US20040154010A1 - Control-quasi-independent-points guided speculative multithreading - Google Patents

Control-quasi-independent-points guided speculative multithreading

Info

Publication number
US20040154010A1
US20040154010A1
Authority
US
United States
Prior art keywords
instructions
speculative
thread
spawning
speculative thread
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US10/356,435
Inventor
Pedro Marcuello
Antonio Gonzalez
Hong Wang
John Shen
Per Hammarlund
Gerolf Hoflehner
Perry Wang
Steve Liao
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tahoe Research Ltd
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to US10/356,435 priority Critical patent/US20040154010A1/en
Priority to US10/423,633 priority patent/US7814469B2/en
Priority to US10/422,528 priority patent/US7523465B2/en
Assigned to INTEL CORPORATION reassignment INTEL CORPORATION ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: HOFLEHNER, GEROLF F., LIAO, STEVE SHIH-WEI, SHEN, JOHN P., WANG, HONG, WANG, PERRY H., GONZALEZ, ANTONIO, MARCUELLO, PEDRO, HAMMARLUND, PER
Priority to US10/633,012 priority patent/US7657880B2/en
Priority to CNB2003101215924A priority patent/CN1302384C/en
Publication of US20040154010A1 publication Critical patent/US20040154010A1/en
Priority to US12/879,898 priority patent/US8719806B2/en
Assigned to TAHOE RESEARCH, LTD. reassignment TAHOE RESEARCH, LTD. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: INTEL CORPORATION
Abandoned legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003Arrangements for executing specific machine instructions
    • G06F9/30076Arrangements for executing specific machine instructions to perform miscellaneous control operations, e.g. NOP
    • G06F9/3009Thread control instructions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3802Instruction prefetching
    • G06F9/3808Instruction prefetching for instruction reuse, e.g. trace cache, branch target cache
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3824Operand accessing
    • G06F9/383Operand prefetching
    • G06F9/3832Value prediction for operands; operand history buffers
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3836Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
    • G06F9/3842Speculative instruction execution
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3836Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
    • G06F9/3851Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution from multiple instruction streams, e.g. multistreaming
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00Arrangements for software engineering
    • G06F8/40Transformation of program code
    • G06F8/41Compilation
    • G06F8/43Checking; Contextual analysis
    • G06F8/433Dependency analysis; Data or control flow analysis

Definitions

  • the present invention relates generally to information processing systems and, more specifically, to spawning of speculative threads for speculative multithreading.
  • In multithreading, an instruction stream is split into multiple instruction streams that can be executed in parallel.
  • In software-only multithreading approaches, such as time-multiplex multithreading or switch-on-event multithreading, the multiple instruction streams are alternately executed on the same shared processor.
  • Processors in a multi-processor system, such as a chip multiprocessor (“CMP”) system, may each act on one of the multiple threads simultaneously.
  • In simultaneous multithreading (“SMT”), a single physical processor is made to appear as multiple logical processors to operating systems and user programs. That is, each logical processor maintains a complete set of the architecture state, but nearly all other resources of the physical processor, such as caches, execution units, branch predictors, control logic, and buses, are shared. The threads execute simultaneously and make better use of shared resources than time-multiplex multithreading or switch-on-event multithreading.
  • one or more threads may be idle during execution of a single-threaded application.
  • Utilizing otherwise idle threads to speculatively parallelize the single-threaded application can increase speed of execution, but it is often-times difficult to determine which sections of the single-threaded application should be speculatively executed by the otherwise idle thread.
  • Speculative thread execution of a portion of code is only beneficial if the application's control-flow ultimately reaches that portion of code.
  • speculative thread execution can be delayed, and rendered less effective, due to latencies associated with data fetching.
  • FIG. 1 is a flowchart illustrating at least one embodiment of a method for generating instructions for control-quasi-independent-points guided speculative multithreading.
  • FIG. 2 is a flowchart illustrating at least one embodiment of a method for identifying control-quasi-independent-points for speculative multithreading.
  • FIG. 3 is a data flow diagram showing at least one embodiment of a method for generating instructions for control-quasi-independent-points guided speculative multithreading.
  • FIG. 4 is a flowchart illustrating at least one embodiment of a software compilation process.
  • FIG. 5 is a flowchart illustrating at least one embodiment of a method for generating instructions to precompute speculative-thread's live-in values for control-quasi-independent-points guided speculative multithreading.
  • FIGS. 6 and 7 are flowcharts illustrating at least one embodiment of a method for performing speculative multithreading using a combination of control-quasi-independent-points guided speculative multithreading and speculative precomputation of live-in values.
  • FIG. 8 is a block diagram of a processing system capable of performing at least one embodiment of control-quasi-independent-points guided speculative multithreading.
  • FIG. 1 is a flowchart illustrating at least one embodiment of a method for generating instructions to facilitate control-quasi-independent-points (“CQIP”) guided speculative multithreading.
  • CQIP control-quasi-independent-points
  • instructions are generated to reduce the execution time in a single-threaded application through the use of one or more simultaneous speculative threads.
  • the method 100 thus facilitates the parallelization of a portion of an application's code through the use of the simultaneous speculative threads.
  • a speculative thread referred to as the spawnee thread, executes instructions that are ahead of the code being executed by the thread that performed the spawn.
  • the thread that performed the spawn is referred to as the spawner thread.
  • the spawnee thread is an SMT thread that is executed by a second logical processor on the same physical processor as the spawner thread.
  • the method 100 may be utilized in any multithreading approach, including SMT, CMP multithreading or other multiprocessor multithreading, or any other known multithreading approach that may encounter idle thread contexts.
  • the method 100 of FIG. 1 determines spawn points based on control independency, yet makes provision for handling data flow dependency among parallel threads.
  • the following discussion explains that the method 100 selects thread spawning points based on an analysis of control independence, in an effort to achieve speculative parallelization with minimal misspeculation in relation to control flow.
  • the method addresses data flow dependency in that live-in values are supplied. For at least one embodiment, live-in values are predicted using a value prediction approach. In at least one other embodiment, live-in values are pre-computed using speculative precomputation based on backward dependency analysis.
  • FIG. 1 illustrates that a method 100 for generating instructions to facilitate CQIP-guided multithreading includes identification 10 of spawning pairs that each include a spawn point and a CQIP.
  • the method 100 provides for calculation of live-in values for data dependences in the helper thread to be spawned.
  • instructions are generated such that, when the instructions are executed by a processor, a speculative thread is spawned and speculatively executes a selected portion of the application's code.
  • FIG. 2 is a flowchart further illustrating at least one embodiment of identification 10 of control-quasi-independent-points for speculative multithreading.
  • FIG. 2 illustrates that the method 10 performs 210 profile analysis.
  • a control flow graph (see, e.g., 330 of FIG. 3) is generated to represent flow of control among the basic blocks associated with the application.
  • the method 10 then computes 220 reaching probabilities. That is, the method 10 computes 220 the probability that a second basic block will be reached during execution of the source program, if a first basic block is executed.
  • Candidate basic blocks are identified 230 as potential spawn pairs based on the reaching probabilities previously computed 220 .
  • the candidates are evaluated according to selected metrics in order to select one or more spawning pairs.
  • Each of blocks 210 (performing profile analysis), 220 (computing reaching probabilities), 230 (identifying candidate basic blocks), and 240 (selecting spawning pair) are described in further detail below in connection with FIG. 3.
  • FIG. 3 is a data flow diagram. The flow of data is represented in relation to an expanded flowchart that incorporates the actions illustrated in both FIGS. 1 and 2.
  • FIG. 3 illustrates that, for at least one embodiment of the method 100 illustrated in FIG. 1, certain data is consulted, and certain other data is generated, during execution of the method 100 .
  • FIG. 3 illustrates that a profile 325 is accessed to aid in profile analysis 210 .
  • a control flow graph 330 (“CFG”) is accessed to aid in computation 220 of reaching probabilities.
  • FIG. 4 illustrates that the profile 325 is typically generated by one or more compilation passes prior to execution of the method.
  • a typical compilation process 400 is represented.
  • the process 400 involves two compiler-performed passes 405 , 410 and also involves a test run 407 that is typically initiated by a user, such as a software programmer.
  • the compiler e.g., 808 in FIG. 8
  • the compiler then generates instrumented binary code 420 that corresponds to the source code 415 .
  • the instrumented binary code 420 includes, in addition to the binary for the source code 415 instructions, extra binary code that causes, during a run of the instrumented code 420 , statistics to be collected and recorded in a profile 325 and a call graph 424 .
  • the profile 325 and call graph 424 are generated.
  • the profile 325 is used as an input into the compiler and a binary code file 340 is generated.
  • the profile 325 may be used, for example, by the compiler during the normal compilation pass 410 to aid with performance enhancements such as speculative branch prediction.
  • each of the passes 405 , 410 , and the test run 407 are optional to the method 100 in that any method of generating the information represented by profile 325 may be utilized. Accordingly, first pass 405 and normal pass 410 , as well as test run 407 , are depicted with broken lines in FIG. 4 to indicate their optional nature.
  • any method of generating the information represented by profile 325 may be utilized, and that the actions 405 , 407 , 410 depicted in FIG. 4 are provided for illustrative purposes only.
  • the method 100 described herein may be applied, in an alternative embodiment, to a binary file. That is, the profile 325 may be generated for a binary file rather than a high-level source code file, and the profile analysis 210 (FIG. 2) may be performed using such binary-based profile as an input.
  • the profile analysis 210 utilizes the profile 325 as an input and generates a control flow graph 330 as an output.
  • the method 100 builds the CFG 330 during the profile analysis 210 such that each node of the CFG 330 represents a basic block of the source program. Edges between nodes of the CFG 330 represent possible control flows among the basic blocks. For at least one embodiment, edges of the CFG 330 are weighted with the frequency that the corresponding control flow has been followed (as reflected in the profile 325 ). Accordingly, the edges are weighted by the probability that one basic block follows the other, without revisiting the latter node. In contrast to other CFG representations, such as “edge profiling” which represents only intra-procedural edges, at least one embodiment of the CFG 330 created during profile analysis 210 includes representation of inter-procedural edges.
  • the CFG 330 is pruned to simplify the CFG 330 and control its size.
  • the least frequently executed basic blocks are pruned from the CFG 330 .
  • the weights of the edges to a block are used to determine the basic block's execution count.
  • the basic blocks are ordered by execution count, and are selected to remain in the CFG 330 according to their execution count.
  • the basic blocks are chosen from highest to lower execution count until a predetermined threshold percentage of the total executed instructions are included in the CFG 330 . Accordingly, after weighting and pruning, the most frequently-executed basic blocks are represented in the CFG 330 .
  • the predetermined threshold percentage of executed instructions chosen to remain in the CFG 330 during profile analysis 210 is ninety (90) percent.
  • the threshold may be varied to numbers higher or lower than ninety percent, based on factors such as application requirements and/or machine resource availability. For instance, if a relatively large number of hardware thread contexts are supported by the machine resources, then a lower threshold may be chosen in order to facilitate more aggressive speculation.
  • an edge from a predecessor to the pruned node is transformed to one or more edges from that predecessor to the node's successor(s).
  • an edge from the pruned node to a successor is transformed to one or more edges from the pruned node's predecessor(s) to the successor. If, during this transformation, an edge is transformed into multiple edges, the weight of the original edge is proportionally apportioned across the new edges.
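The pruning procedure described in the preceding bullets can be sketched as follows. This is a minimal illustrative sketch, not the patent's implementation: the dict-based edge encoding, the block names, and the single-level re-wiring of edges around pruned nodes are all assumptions. Following the text, a block's execution count is derived from the weights of its incoming edges (so an entry block with no predecessors gets a count of zero in this toy model), blocks are kept from highest to lower count until the threshold fraction of executed instructions is covered, and the weight of an edge through a pruned node is proportionally apportioned across the new edges.

```python
def prune_cfg(edges, block_len, threshold=0.90):
    """Prune the least frequently executed basic blocks from a weighted CFG.

    edges:     dict {(src, dst): weight} -- profiled edge frequencies
    block_len: dict {block: static instruction count of the block}
    """
    # A block's execution count is the sum of the weights of its incoming edges.
    exec_count = {}
    for (src, dst), w in edges.items():
        exec_count[dst] = exec_count.get(dst, 0) + w
        exec_count.setdefault(src, 0)

    # Keep the hottest blocks, highest count first, until the threshold
    # percentage of the total executed instructions is covered.
    total = sum(exec_count[b] * block_len[b] for b in exec_count)
    kept, covered = set(), 0
    for b in sorted(exec_count, key=exec_count.get, reverse=True):
        if covered >= threshold * total:
            break
        kept.add(b)
        covered += exec_count[b] * block_len[b]

    # Re-wire edges around each pruned node: predecessor -> pruned -> successor
    # becomes predecessor -> successor, apportioning the original weight in
    # proportion to the pruned node's outgoing weights. (Chains of pruned
    # nodes would need repeated passes; one level suffices for this sketch.)
    new_edges = {e: w for e, w in edges.items() if e[0] in kept and e[1] in kept}
    for x in exec_count:
        if x in kept:
            continue
        ins = [(p, w) for (p, d), w in edges.items() if d == x and p in kept]
        outs = [(s, w) for (o, s), w in edges.items() if o == x and s in kept]
        out_total = sum(w for _, w in outs)
        for p, w_in in ins:
            for s, w_out in outs:
                key = (p, s)
                new_edges[key] = new_edges.get(key, 0) + w_in * w_out / out_total
    return kept, new_edges
```

For example, a cold block X on a rarely taken B-to-C detour is pruned, and its through-traffic is folded back into the direct B-to-C edge.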
  • FIG. 3 illustrates that the CFG 330 produced during profile analysis 210 is utilized to compute 220 reaching probabilities.
  • reaching probability computation 220 utilizes the profile CFG 330 as an input and generates a reaching probability matrix 335 as an output.
  • the “reaching probability” is the probability that a second basic block will be reached after execution of a first basic block, without revisiting the first basic block.
  • the reaching probabilities computed at block 220 are stored in a two-dimensional square matrix 335 that has as many rows and columns as nodes in the CFG 330 . Each element of the matrix represents the probability to execute the basic block represented by the column after execution of the basic block represented by the row.
  • this probability is computed as the sum of the frequencies for all the various sequences of basic blocks that exist from the source node to the destination node.
  • a constraint is imposed such that the source and destination nodes may only appear once in the sequence of nodes as the first and last nodes, respectively, and may not appear again as intermediate nodes. (For determining the probability of reaching a basic block again after it has been executed, the basic block will appear twice—as both the source and destination nodes). Other basic blocks are permitted to appear more than once in the sequence.
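One way to realize this computation, under the stated constraint, is as a fixpoint over the edge probabilities: the destination is made absorbing with probability 1, a return to the source terminates the path with probability 0, and every other block (which may repeat) propagates the probabilities of its successors. This sketch is an illustrative assumption rather than the patent's algorithm; `succ` maps each block to its outgoing edges weighted by branch probability.

```python
def reaching_probability(succ, src, dst, iters=200):
    """Probability that basic block dst is reached after executing src,
    without revisiting src (src and dst appear only as endpoints).

    succ: dict {block: [(next_block, edge_probability), ...]}
    """
    q = {b: 0.0 for b in succ}  # q[b] = P(reach dst from b before re-entering src)
    q[dst] = 1.0
    for _ in range(iters):  # Gauss-Seidel fixpoint; converges when cycle mass < 1
        for b in succ:
            if b in (src, dst):
                continue  # dst is absorbing; re-entering src ends the path
            q[b] = sum(p * q.get(n, 0.0) for n, p in succ[b])
    # The first step leaves src along its outgoing edges.
    return sum(p * q.get(n, 0.0) for n, p in succ[src])
```

For instance, with a loop A→B, B→A (0.5), B→C (0.5), the only admissible path from A to C is A,B,C, so the reaching probability is 0.5: the cyclic paths revisit A and are excluded.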
  • the reaching probability matrix 335 is traversed to evaluate pairs of basic blocks and identify those that are candidates for a spawning pair.
  • spawning pair refers to a pair of instructions associated with the source program.
  • One of the instructions is a spawn point, which is an instruction within a first basic block.
  • the spawn point is the first instruction of the first basic block.
  • the other instruction is a target point and is, more specifically, a control quasi-independent point (“CQIP”).
  • CQIP is an instruction within a second basic block.
  • the CQIP is the first instruction of the second basic block.
  • a spawn point is the instruction in the source program that, when reached, will activate creation of a speculative thread at the CQIP, where the speculative thread will start its execution.
  • the first block includes a potential spawn point
  • the second block includes a potential CQIP.
  • An instruction (such as the first instruction) of the basic block for the row is the potential spawn point.
  • An instruction (such as the first instruction) of the basic block for the column is the potential CQIP.
  • Each element of the reaching probability matrix 335 is evaluated, and those elements that satisfy certain selection criteria are chosen as candidates for spawning pairs. For at least one embodiment, the elements are evaluated to determine those pairs whose probability is higher than a certain predetermined threshold; that is, the probability to reach the control quasi-independent point after execution of the spawn point is higher than a given threshold.
  • This criterion is designed to minimize spawning of speculative threads that are not executed.
  • a pair of basic blocks associated with an element of the reaching probability matrix 335 is considered as a candidate for a spawning pair if its reaching probability is higher than 0.95
  • a second criterion for selection of a candidate spawning pair is the average number of instructions between the spawn point and the CQIP. Ideally, a minimum average number of instructions should exist between the spawning point and the CQIP in order to reduce the relative overhead of thread creation. If the distance is too small, the overhead of thread creation may outweigh the benefit of run-ahead execution because the speculative thread will not run far enough ahead. For at least one embodiment, a pair of basic blocks associated with an element of the reaching probability matrix 335 is considered as a candidate for a spawning pair if the average number of instructions between them is greater than 32 instructions.
  • Distance between the basic blocks may be additionally stored in the matrix 335 and considered in the identification 230 of spawning pair candidates. For at least one embodiment, this additional information may be calculated during profile analysis 210 and included in each element of the reaching probability matrix 335 . The average may be calculated as the sum of the number of instructions executed by each sequence of basic blocks, multiplied by their frequency.
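The two selection criteria above can be combined in a single scan of the matrix. The 0.95 probability and 32-instruction thresholds come from the text; the layout of the matrix as a dict of (probability, average distance) pairs is an assumption of this sketch.

```python
def identify_candidates(matrix, min_prob=0.95, min_distance=32):
    """Scan the reaching probability matrix for candidate spawning pairs.

    matrix: dict {(spawn_block, cqip_block): (reach_prob, avg_distance)}
    Returns the pairs whose reaching probability exceeds min_prob and whose
    average instruction distance exceeds min_distance.
    """
    return [pair for pair, (prob, dist) in matrix.items()
            if prob > min_prob and dist > min_distance]
```

A pair with high probability but a too-short distance (thread-creation overhead dominates), or a long distance but low probability (likely misspeculation), is rejected.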
  • the spawning pair candidates are evaluated based on analysis of one or more selected metrics. These metrics may be prioritized. Based on the evaluation of the candidate spawning pairs in relation to the prioritized metrics, one or more spawning pairs are selected.
  • the metrics utilized at block 240 may include the minimum average distance between the basic blocks of the potential spawning pair (described above), as well as an evaluation of mispredicted branches, load misses and/or instruction cache misses.
  • the metrics may also include additional considerations.
  • One such additional consideration is the maximum average distance between the basic blocks of the potential spawning pair. It should be noted that there are also potential performance penalties involved with having the average number of instructions between the spawn point and CQIP be too large. Accordingly, the selection of spawning pairs may also impose a maximum average distance. If the distance between the pair is too large, the speculative thread may incur stalls in a scheme where the speculative thread has limited storage for speculative values.
  • speculative threads may incur stalls in a scheme where the speculative thread cannot commit its states until it becomes the non-speculative thread (see discussion of “join point” in connection with FIGS. 6 and 7, below). Such stalls are likely to result in ineffective holding of critical resources that otherwise would be used by non-speculative threads to make forward progress.
  • other metrics concern the instructions that the speculative thread includes in relation to the application code between the spawning point and the CQIP.
  • the average number of speculative thread instructions dependent on values generated by a previous thread should be relatively low.
  • a smaller number of dependent instructions allow for more timely computation of the live-in values for the speculative thread.
  • a relatively high number of the live-in values for the speculative thread are value-predictable.
  • value-predictability of the live-in values facilitates faster communication of live-in values, thus minimizing overhead of spawning while also allowing correctness and accuracy of speculative thread computation.
  • the candidate spawning pairs identified at block 230 may include several good candidates for CQIP's associated with a given spawn point. That is, for a given row of the reaching probability matrix 335 , more than one element may be selected as a candidate spawning pair. In such case, during the metrics evaluation at block 240 , the best CQIP for the spawn point is selected because, for a given spawn point, a speculative thread will be spawned at only one CQIP. In order to choose the best CQIP for a given spawn point, the potential CQIP's identified at block 230 are prioritized according to the expected benefit.
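The per-spawn-point selection just described reduces to keeping, for each spawn point, the candidate CQIP with the highest expected benefit. In this sketch the prioritized metrics are assumed to have been collapsed into a single scalar `benefit` score per candidate; that scoring function is not specified by the text.

```python
def select_best_cqips(candidates):
    """For each spawn point, keep only the CQIP with the highest expected
    benefit, since a speculative thread is spawned at only one CQIP.

    candidates: list of (spawn_point, cqip, benefit) tuples, where benefit
    is an assumed scalar summarizing the prioritized metrics.
    """
    best = {}
    for spawn, cqip, benefit in candidates:
        if spawn not in best or benefit > best[spawn][1]:
            best[spawn] = (cqip, benefit)
    return {spawn: cqip for spawn, (cqip, _) in best.items()}
```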
  • more than one CQIP can be chosen for a corresponding spawn point.
  • multiple concurrent, albeit mutually exclusive, speculative threads may be spawned and executed simultaneously to perform “eager” execution of speculative threads.
  • the spawning condition for these multiple CQIPs can be examined and verified, after the speculative threads have been executed, to determine the effectiveness of the speculation. If one of these multiple speculative threads proves to be good speculation, and another bad, then the results of the former can be reused by the main thread while the results of the latter may be discarded.
  • At least one embodiment of the method 100 selects 240 CALL return point pairs (pairs of subroutine calls and the return points) if they satisfy the minimum size constraint. These pairs might not otherwise be selected at block 240 because the reaching probability for such pairs is sometimes too low to satisfy the selection criteria discussed above in connection with candidate identification 230 .
  • if a subroutine is called from multiple locations, it will have multiple predecessors and multiple successors in the CFG 330 . If all the calls are executed a similar number of times, the reaching probability of any return point pair will be low since the graph 330 will have multiple paths with similar weights.
  • the method 100 provides for calculation of live-in values for the speculative thread to be executed at the CQIP.
  • by “provide for” it is meant that instructions are generated, wherein execution of the generated instructions, possibly in conjunction with some special hardware support, will result in calculation of a predicted live-in value to be used as an input by the spawnee thread.
  • block 50 might determine that no live-in values are necessary. In such case, “providing for” calculation of live-in values simply entails determining that no live-in values are necessary.
  • Predicting thread input values allows the processor to execute speculative threads as if they were independent.
  • At least one embodiment of block 50 generates instructions to perform or trigger value prediction. Any known manner of value prediction, including hardware value prediction, may be implemented. For example, instructions may be generated 50 such that the register values of the spawned thread are predicted to be the same as those of the spawning thread at spawn time.
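The register-copy prediction mentioned above is the simplest case and can be modeled in a few lines. This sketch treats the register file as a plain dict; in the embodiments described, the copy may instead be performed by value prediction hardware at spawn time.

```python
def predict_live_ins(spawner_registers):
    """Model of the simplest value prediction described in the text: the
    spawned thread's register values are predicted to be the same as the
    spawner thread's values at spawn time. The copy keeps the speculative
    thread's state independent of later updates by the spawner.
    """
    return dict(spawner_registers)
```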
  • Another embodiment of the method 100 identifies, at block 50 , a slice of instructions from the application's code that may be used for speculative precomputation of one or more live-in values. While value prediction is a promising approach, it often requires rather complex hardware support. In contrast, no additional hardware support is necessary for speculative precomputation. Speculative precomputation can be performed at the beginning of the speculative thread execution in an otherwise idle thread context, providing the advantage of minimizing misspeculations of live-in values without requiring additional value prediction hardware support. Speculative precomputation is discussed in further detail below in connection with FIG. 5.
  • FIG. 5 illustrates an embodiment of the method 100 wherein block 50 is further specified to identify 502 precomputation instructions to be used for speculative precomputation of one or more live-in values.
  • a set of instructions called a slice
  • the slice is computed at block 502 to include only those instructions identified from the original application code that are necessary to compute the live-in value.
  • the slice therefore is a subset of instructions from the original application code.
  • the slice is computed by following the dependence edges backward from the instruction including the live-in value until all instructions necessary for calculation of the live-in value have been identified.
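The backward walk over dependence edges can be modeled over an explicit dependence graph. The `deps` mapping and the instruction labels are illustrative assumptions; in a compiler the edges would come from def-use analysis of the original application code.

```python
def compute_slice(deps, live_in_instr):
    """Follow dependence edges backward from the instruction that produces
    the live-in value, collecting every instruction necessary to compute
    it. The resulting slice is a subset of the original application code.

    deps: dict {instr: [instrs it directly depends on]}
    """
    slice_set, worklist = set(), [live_in_instr]
    while worklist:
        instr = worklist.pop()
        if instr in slice_set:
            continue  # already visited; dependence graphs may share nodes
        slice_set.add(instr)
        worklist.extend(deps.get(instr, []))
    return slice_set
```

Instructions unrelated to the live-in value (such as "i9" below) are excluded, which is what keeps speculative precomputation cheap relative to re-executing the whole region.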
  • a copy of the identified slice instructions is generated for insertion 60 into an enhanced binary file 350 (FIG. 3).
  • FIGS. 3 and 5 illustrate that the methods 100 , 500 for generating instructions for CQIP-guided multithreading generate an enhanced binary file 350 at block 60 .
  • the enhanced binary file 350 includes the binary code 340 for the original single-threaded application, as well as additional instructions.
  • a trigger instruction to cause the speculative thread to be spawned is inserted into the enhanced binary file 350 at the spawning point(s) selected at block 240 .
  • the trigger instruction can be a conventional instruction in the existing instruction set of a processor, denoted with special marks. Alternatively, the trigger instruction can be a special instruction such as a fork or spawn instruction. Trigger instructions can be executed by any thread.
  • the instructions to be performed by the speculative thread are included in the enhanced binary file 350 .
  • These instructions may include instructions added to the original code binary file 340 for live-in calculation, and also some instructions already in the original code binary file 340 , beginning at the CQIP, that the speculative thread is to execute. That is, regarding the speculative-thread instructions in the enhanced binary file 350 , two groups of instructions may be distinguished for each spawning pair, if the speculative thread is to perform speculative precomputation for live-in values. In contrast, for a speculative thread that is to utilize value prediction for its live-in values, only the latter group of instructions described immediately below appears in the enhanced binary file 350 .
  • the first group of instructions are generated at block 50 (or 502 , see FIG. 5) and are incorporated 60 into the enhanced binary code file 350 in order to provide for the speculative thread's calculation of live-in values.
  • the instructions to be performed by the speculative thread to pre-compute live-in values are appended at the end of the file 350 , after those instructions associated with the original code binary file 340 .
  • Such instructions do not appear for speculative threads that use value prediction. Instead, specialized value prediction hardware may be used for value prediction.
  • the value prediction hardware is fired by the spawn instruction. When the processor executes a spawn instruction, the hardware initializes the speculative thread registers with the predicted live-in value.
  • the speculative thread is associated with the second group of instructions alluded to above.
  • the second set of instructions are instructions that already exist in the original code binary file 340 .
  • the subset of such instructions that are associated with the speculative thread are those instructions in the original code binary file 340 starting at the CQIP.
  • the precomputation slice (which may be appended at the end of the enhanced binary file) terminates with a branch to the corresponding CQIP, which causes the speculative thread to begin executing the application code instructions at the CQIP.
  • the spawnee thread begins execution of the application code instructions beginning at the CQIP.
  • the enhanced binary file 350 includes, for the speculative thread, a copy of the relevant subset of instructions from the original application, rather than providing for the speculative thread to branch to the CQIP instruction of the original code.
  • the inventors have found that the non-copy approach discussed in the immediately preceding paragraph, which is implemented with appropriate branch instructions, allows for reduced code size.
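The layout described above (the original application code first, with each live-in precomputation slice appended at the end of the enhanced binary and terminated by a branch to its CQIP) can be sketched as follows. This is an illustrative assumption, not the disclosed implementation; the `Instr` type and the `build_enhanced_binary` name are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Instr:
    op: str
    target: Optional[int] = None  # branch target: an index into the binary

def build_enhanced_binary(original, slices):
    """original: list of Instr; slices: dict mapping CQIP index -> slice instrs."""
    enhanced = list(original)            # application code comes first, unchanged
    slice_entry = {}                     # CQIP index -> where its slice begins
    for cqip, slice_instrs in slices.items():
        slice_entry[cqip] = len(enhanced)
        enhanced.extend(slice_instrs)    # appended live-in precomputation slice
        enhanced.append(Instr("br", target=cqip))  # terminating branch to the CQIP
    return enhanced, slice_entry
```

A spawned thread would begin at `slice_entry[cqip]`, run the slice, and fall through the terminating branch into the application code at the CQIP.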
  • method 100 is performed by a compiler 808 (FIG. 8).
  • the method 100 represents an automated process in which a compiler identifies a spawn point and an associated control-quasi-independent point (“CQIP”) target for a speculative thread, generates the instructions to pre-compute its live-ins, and embeds a trigger at the spawn point in the binary.
  • the pre-computation instructions for the speculative thread are incorporated (such as, for example, by appending) into an enhanced binary file 350 .
  • the method 100 may be performed manually such that one or more of 1) identifying CQIP spawning pairs 10 , 2) providing for calculation of live-in values 50 , and 3) modification of the main thread binary 60 may be performed interactively with human intervention.
  • FIGS. 6 and 7 illustrate at least one embodiment of a method 600 for performing speculative multithreading using a combination of control-quasi-independent-points guided speculative multithreading and speculative precomputation of live-in values.
  • the method 600 is performed by a processor (e.g. 804 of FIG. 8) executing the instructions in an enhanced binary code file (e.g., 350 of FIG. 3).
  • the enhanced binary code file has been generated according to the method illustrated in FIG. 5, such that instructions to perform speculative precomputation of live-in values have been identified 502 and inserted into the enhanced binary file.
  • FIGS. 6 and 7 illustrate that, during execution of the enhanced binary code file, multiple threads T 0 , T 1 , . . . T x may be executing simultaneously.
  • the flow of control associated with each of these multiple threads is indicated by the notations T 0 , T 1 , and T x on the edges between the blocks illustrated in FIGS. 6 and 7.
  • the multiple threads may be spawned from a non-speculative thread.
  • a speculative thread may spawn one or more additional speculative successor threads.
  • FIG. 6 illustrates that processing begins at 601 , where the thread T 0 begins execution.
  • a check is made to determine whether the thread T 0 previously encountered a join point while it (T 0 ) was still speculative. Block 602 is discussed in further detail below. One skilled in the art will understand that block 602 will, of course, evaluate to “false” if the thread T 0 was never previously speculative.
  • If block 602 evaluates to “false”, then an instruction for the thread T 0 is executed at block 604 . If a trigger instruction associated with a spawn point is encountered 606 , then processing continues to block 608 . Otherwise, the thread T 0 continues execution at block 607 . At block 607 , it is determined whether a join point has been encountered in the thread T 0 . If neither a trigger instruction nor a join point is encountered, then the thread T 0 continues to execute instructions 604 until it reaches 603 the end of its instructions.
  • If a trigger instruction is detected at block 606 , then a speculative thread T 1 is spawned in a free thread context at block 608 . If slice instructions are encountered by the speculative thread T 1 at block 610 , then processing continues at block 612 . If not, then processing continues at 702 (FIG. 7).
  • slice instructions for speculative precomputation are iteratively executed until the speculative precomputation of the live-in value is complete 614 .
  • the spawner thread T 0 continues to execute 604 its instructions.
  • FIG. 6 illustrates that, while the speculative thread T 1 executes 612 the slice instructions, the spawner thread continues execution 604 of its instructions until another spawn point is encountered 606 , a join point is encountered 607 , or the instruction stream ends 603 . Accordingly, the spawner thread T 0 and the spawnee thread T 1 execute in parallel during speculative precomputation.
  • FIG. 7 illustrates that, at block 702 , the speculative thread T 1 executes instructions from the original code.
  • the CQIP instruction is executed.
  • the execution 702 of spawnee thread instructions is performed in parallel with the execution of the spawner thread code until a terminating condition is reached.
  • the speculative thread T 1 checks for a terminating condition.
  • the check 708 evaluates to “true” when the spawnee thread T 1 has encountered a CQIP of an active, more speculative thread or has encountered the end of the program. As long as neither condition is true, the spawnee thread T 1 proceeds to block 710 .
  • If the speculative thread T 1 determines 708 that a join point has been reached, then it is theoretically ready to perform processing to switch thread contexts with the more speculative thread (as discussed below in connection with block 720 ). However, at least one embodiment of the method 600 limits such processing to non-speculative threads. Accordingly, when speculative thread T 1 determines 708 that it has reached the join point of a more speculative, active thread, T 1 waits 706 to continue processing until it (T 1 ) becomes non-speculative.
  • At block 710 , the speculative thread T 1 determines whether a spawning point has been reached. If the condition at block 710 evaluates to “false”, then T 1 continues execution 702 of its instructions.
  • thread T 1 creates 712 a new speculative thread T 2 .
  • Thread T 1 then continues execution 702 of its instructions, while the new speculative thread T 2 proceeds to continue speculative thread operation at block 610 , as described above in connection with speculative thread T 1 .
  • each thread follows the logic described above in connection with T 1 (blocks 610 through 614 and blocks 702 through 710 of FIGS. 6 and 7).
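The spawner/spawnee control flow of FIGS. 6 and 7 can be approximated with a toy, sequentialized trace. In this sketch the threads do not actually run in parallel, and the `run` function, instruction strings, and trace format are all hypothetical illustrations rather than the disclosed mechanism:

```python
def run(program, spawn_point, cqip, slice_instrs):
    """Trace a spawner thread that triggers one speculative thread.

    program: list of instruction names; spawn_point/cqip: indices into program;
    slice_instrs: the live-in precomputation slice for the speculative thread.
    """
    trace = []
    speculative = None
    pc = 0
    while pc < len(program):
        if pc == spawn_point and speculative is None:
            # trigger: spawnee first runs its slice, then code from the CQIP on
            speculative = (["slice:" + s for s in slice_instrs]
                           + ["spec:" + i for i in program[cqip:]])
        trace.append("main:" + program[pc])
        pc += 1
        if speculative is not None and pc == cqip:
            # spawner reaches the CQIP of the active speculative thread
            trace.append("join@%d" % cqip)
            break
    trace.extend(speculative or [])   # sequentialized view of the spawnee's work
    return trace
```

The trace shows the division of labor: the spawner executes up to the CQIP while the spawnee precomputes live-ins and runs ahead from the CQIP.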
  • When the spawner thread T 0 reaches a CQIP of an active, more speculative thread, a join point is said to have been encountered.
  • the join point of a thread is the control quasi-independent point at which an on-going speculative thread began execution. It should be understood that multiple speculative threads may be active at one time.
  • a “more speculative” thread is a thread that is a spawnee of the reference thread (in this case, thread T 0 ) and includes any subsequently-spawned speculative thread in the spawnee's spawning chain.
  • the join point check 607 evaluates to true when the thread T 0 reaches the CQIP at which any on-going speculative thread began execution.
  • FIG. 7 assumes that, if multiple speculative threads are simultaneously active, then any one of the multiple CQIP's for the active speculative threads could be reached at block 607 .
  • FIG. 7 assumes that when T 0 hits a join point at block 607 , the join point is associated with T 1 , the next thread in program order, which is the speculative thread whose CQIP has been reached by the non-speculative thread T 0 .
  • Upon reaching the join point at block 607 (FIG. 6), the thread T 0 proceeds to block 703 .
  • the thread T 0 determines 703 if it is the non-speculative active thread and, if not, waits until it becomes the non-speculative thread.
  • When T 0 becomes non-speculative, it initiates 704 a verification of the speculation performed by the spawnee thread T 1 .
  • verification 704 includes determining whether the speculative live-in values utilized by the spawnee thread T 1 reflect the actual values computed by the spawner thread.
  • If the verification 704 fails, the speculative work of thread T 1 is discarded, and thread T 0 then proceeds to C (FIG. 6) to continue execution of its instructions. Otherwise, if the verification 704 succeeds, then thread T 0 and thread T 1 proceed to block 720 .
  • the thread context where the thread T 0 has been executing becomes free and is relinquished. Also, the speculative thread T 1 that started at the CQIP becomes the non-speculative thread and continues execution at C (FIG. 6).
  • Reference to FIG. 6 illustrates that the newly non-speculative thread T 1 checks at block 602 to determine whether it encountered a CQIP at block 708 (FIG. 7) while it was still speculative. If so, then the thread proceeds to B in order to begin join point processing as described above.
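The verification at block 704 described above can be sketched as a comparison of the live-in values the spawnee actually used against the values the spawner ultimately computed. The `verify_and_join` name and the register-to-value dictionaries are illustrative assumptions, not the disclosed hardware or software mechanism:

```python
def verify_and_join(actual_live_ins, speculative_live_ins):
    """Return 'commit' if every live-in the spawnee used matches the value
    actually computed by the spawner, else 'squash'."""
    for reg, value in speculative_live_ins.items():
        if actual_live_ins.get(reg) != value:
            return "squash"   # misspeculation: the spawnee's work is discarded
    return "commit"           # speculation correct: thread contexts may switch
```

On a "commit", the spawner's context can be relinquished and the spawnee becomes the non-speculative thread, as in block 720.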
  • Embodiments of the method may be implemented in hardware, software, firmware, or a combination of such implementation approaches.
  • Embodiments of the invention may be implemented as computer programs executing on programmable systems comprising at least one processor, a data storage system (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device.
  • Program code may be applied to input data to perform the functions described herein and generate output information.
  • the output information may be applied to one or more output devices, in known fashion.
  • a processing system includes any system that has a processor, such as, for example, a digital signal processor (DSP), a microcontroller, an application specific integrated circuit (ASIC), or a microprocessor.
  • the programs may be implemented in a high level procedural or object oriented programming language to communicate with a processing system.
  • the programs may also be implemented in assembly or machine language, if desired.
  • the method described herein is not limited in scope to any particular programming language. In any case, the language may be a compiled or interpreted language.
  • the programs may be stored on a storage media or device (e.g., hard disk drive, floppy disk drive, read only memory (ROM), CD-ROM device, flash memory device, digital versatile disk (DVD), or other storage device) readable by a general or special purpose programmable processing system.
  • the instructions, accessible to a processor in a processing system, provide for configuring and operating the processing system when the storage media or device is read by the processing system to perform the procedures described herein.
  • Embodiments of the invention may also be considered to be implemented as a machine-readable storage medium, configured for use with a processing system, where the storage medium so configured causes the processing system to operate in a specific and predefined manner to perform the functions described herein.
  • System 800 may be used, for example, to execute the processing for a method of performing control-quasi-independent-points guided speculative multithreading, such as the embodiments described herein.
  • System 800 may also execute enhanced binary files generated in accordance with at least one embodiment of the methods described herein.
  • System 800 is representative of processing systems based on the Pentium®, Pentium® Pro, Pentium® II, Pentium® III, Pentium® 4, and Itanium® and Itanium® II microprocessors available from Intel Corporation, although other systems (including personal computers (PCs) having other microprocessors, engineering workstations, set-top boxes and the like) may also be used.
  • sample system 800 may be executing a version of the Windows™ operating system available from Microsoft Corporation, although other operating systems and graphical user interfaces, for example, may also be used.
  • processing system 800 includes a memory system 802 and a processor 804 .
  • Memory system 802 may store instructions 810 and data 812 for controlling the operation of the processor 804 .
  • instructions 810 may include a compiler program 808 that, when executed, causes the processor 804 to compile a program 415 (FIG. 4) that resides in the memory system 802 .
  • Memory 802 holds the program to be compiled, intermediate forms of the program, and a resulting compiled program.
  • the compiler program 808 includes instructions to select spawning pairs and generate instructions to implement CQIP-guided multithreading.
  • instructions 810 may also include an enhanced binary file 350 (FIG. 3) generated in accordance with at least one embodiment of the present invention.
  • Memory system 802 is intended as a generalized representation of memory and may include a variety of forms of memory, such as a hard drive, CD-ROM, random access memory (RAM), dynamic random access memory (DRAM), static random access memory (SRAM) and related circuitry.
  • Memory system 802 may store instructions 810 and/or data 812 represented by data signals that may be executed by processor 804 .
  • the instructions 810 and/or data 812 may include code for performing any or all of the techniques discussed herein.
  • At least one embodiment of CQIP-guided speculative multithreading is related to the use of the compiler 808 in system 800 to select spawning pairs and generate instructions as discussed above.
  • compiler 808 may include a profile analyzer module 820 that, when executed by the processor 804 , analyzes a profile to generate a control flow graph as described above in connection with FIG. 3.
  • the compiler 808 may also include a matrix builder module 824 that, when executed by the processor 804 , computes 220 reaching probabilities and generates a reaching probabilities matrix 335 as discussed above.
  • the compiler 808 may also include a spawning pair selector module 826 that, when executed by the processor 804 , identifies 230 candidate basic blocks and selects 240 one or more spawning pairs.
  • the compiler 808 may include a slicer module 822 that identifies 502 (FIG. 5) the instructions to pre-compute the speculative thread's live-in values.
  • the compiler 808 may further include a code generator module 828 that, when executed by the processor 804 , generates 60 an enhanced binary file 350 (FIG. 3).

Abstract

A method for generating instructions to facilitate control-quasi-independent-point multithreading is provided. A spawn point and control-quasi-independent-point are determined. An instruction stream is generated to partition a program so that portions of the program are parallelized by speculative threads. A method of performing control-quasi-independent-point guided speculative multithreading includes spawning a speculative thread when the spawn point is encountered. An embodiment of the method further includes performing speculative precomputation to determine a live-in value for the speculative thread.

Description

    BACKGROUND
  • 1. Technical Field [0001]
  • The present invention relates generally to information processing systems and, more specifically, to spawning of speculative threads for speculative multithreading. [0002]
  • 2. Background Art [0003]
  • In order to increase performance of information processing systems, such as those that include microprocessors, both hardware and software techniques have been employed. One software approach that has been employed to improve processor performance is known as “multithreading.” In multithreading, an instruction stream is split into multiple instruction streams that can be executed in parallel. In software-only multithreading approaches, such as time-multiplex multithreading or switch-on-event multithreading, the multiple instruction streams are alternately executed on the same shared processor. [0004]
  • Increasingly, multithreading is supported in hardware. For instance, in one approach, processors in a multi-processor system, such as a chip multiprocessor (“CMP”) system, may each act on one of the multiple threads simultaneously. In another approach, referred to as simultaneous multithreading (“SMT”), a single physical processor is made to appear as multiple logical processors to operating systems and user programs. That is, each logical processor maintains a complete set of the architecture state, but nearly all other resources of the physical processor, such as caches, execution units, branch predictors, control logic, and buses, are shared. The threads execute simultaneously and make better use of shared resources than time-multiplex multithreading or switch-on-event multithreading. [0005]
  • For those systems, such as CMP and SMT multithreading systems, that provide hardware support for multiple threads, one or more threads may be idle during execution of a single-threaded application. Utilizing otherwise idle threads to speculatively parallelize the single-threaded application can increase speed of execution, but it is oftentimes difficult to determine which sections of the single-threaded application should be speculatively executed by the otherwise idle thread. Speculative thread execution of a portion of code is only beneficial if the application's control-flow ultimately reaches that portion of code. In addition, speculative thread execution can be delayed, and rendered less effective, due to latencies associated with data fetching. Embodiments of the method and apparatus disclosed herein address these and other concerns related to speculative multithreading. [0006]
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • The present invention may be understood with reference to the following drawings in which like elements are indicated by like numbers. These drawings are not intended to be limiting but are instead provided to illustrate selected embodiments of a method and apparatus for facilitating control-quasi-independent-points guided speculative multithreading. [0007]
  • FIG. 1 is a flowchart illustrating at least one embodiment of a method for generating instructions for control-quasi-independent-points guided speculative multithreading. [0008]
  • FIG. 2 is a flowchart illustrating at least one embodiment of a method for identifying control-quasi-independent-points for speculative multithreading. [0009]
  • FIG. 3 is a data flow diagram showing at least one embodiment of a method for generating instructions for control-quasi-independent-points guided speculative multi threading. [0010]
  • FIG. 4 is a flowchart illustrating at least one embodiment of a software compilation process. [0011]
  • FIG. 5 is a flowchart illustrating at least one embodiment of a method for generating instructions to precompute speculative-thread's live-in values for control-quasi-independent-points guided speculative multithreading. [0012]
  • FIGS. 6 and 7 are flowcharts illustrating at least one embodiment of a method for performing speculative multithreading using a combination of control-quasi-independent-points guided speculative multithreading and speculative precomputation of live-in values. [0013]
  • FIG. 8 is a block diagram of a processing system capable of performing at least one embodiment of control-quasi-independent-points guided speculative multithreading.[0014]
  • DETAILED DISCUSSION
  • FIG. 1 is a flowchart illustrating at least one embodiment of a method for generating instructions to facilitate control-quasi-independent-points (“CQIP”) guided speculative multithreading. For at least one embodiment of the method 100 , instructions are generated to reduce the execution time of a single-threaded application through the use of one or more simultaneous speculative threads. The method 100 thus facilitates the parallelization of a portion of an application's code through the use of the simultaneous speculative threads. A speculative thread, referred to as the spawnee thread, executes instructions that are ahead of the code being executed by the thread that performed the spawn. The thread that performed the spawn is referred to as the spawner thread. For at least one embodiment, the spawnee thread is an SMT thread that is executed by a second logical processor on the same physical processor as the spawner thread. One skilled in the art will recognize that the method 100 may be utilized in any multithreading approach, including SMT, CMP multithreading or other multiprocessor multithreading, or any other known multithreading approach that may encounter idle thread contexts. [0015]
  • Traditional software program parallelization techniques are usually applied to numerical and regular applications. However, traditional automated compiler parallelization techniques do not perform well for irregular or non-numerical applications such as those that require accesses to memory based on linked data structures. Nonetheless, various studies have demonstrated that these irregular and integer applications still have large amounts of thread-level parallelism that could be exploited through judicious speculative multithreading. The method 100 illustrated in FIG. 1 provides a mechanism to partition a single-threaded application into sub-tasks that can be speculatively executed using additional threads. [0016]
  • In contrast to some types of traditional speculative multithreading techniques, which spawn speculative threads based on known control dependent structures such as calls or loops, the method 100 of FIG. 1 determines spawn points based on control independency, yet makes provision for handling data flow dependency among parallel threads. The following discussion explains that the method 100 selects thread spawning points based on an analysis of control independence, in an effort to achieve speculative parallelization with minimal misspeculation in relation to control flow. In addition, the method addresses data flow dependency in that live-in values are supplied. For at least one embodiment, live-in values are predicted using a value prediction approach. In at least one other embodiment, live-in values are pre-computed using speculative precomputation based on backward dependency analysis. [0017]
  • FIG. 1 illustrates that a method 100 for generating instructions to facilitate CQIP-guided multithreading includes identification 10 of spawning pairs that each include a spawn point and a CQIP. At block 50, the method 100 provides for calculation of live-in values for data dependences in the helper thread to be spawned. At block 60, instructions are generated such that, when the instructions are executed by a processor, a speculative thread is spawned and speculatively executes a selected portion of the application's code. [0018]
  • FIG. 2 is a flowchart further illustrating at least one embodiment of identification 10 of control-quasi-independent-points for speculative multithreading. FIG. 2 illustrates that the method 10 performs 210 profile analysis. During the analysis 210, a control flow graph (see, e.g., 330 of FIG. 3) is generated to represent flow of control among the basic blocks associated with the application. The method 10 then computes 220 reaching probabilities. That is, the method 10 computes 220 the probability that a second basic block will be reached during execution of the source program, if a first basic block is executed. Candidate basic blocks are identified 230 as potential spawn pairs based on the reaching probabilities previously computed 220. At block 240, the candidates are evaluated according to selected metrics in order to select one or more spawning pairs. Each of blocks 210 (performing profile analysis), 220 (computing reaching probabilities), 230 (identifying candidate basic blocks), and 240 (selecting spawning pair) is described in further detail below in connection with FIG. 3. [0019]
  • FIG. 3 is a data flow diagram. The flow of data is represented in relation to an expanded flowchart that incorporates the actions illustrated in both FIGS. 1 and 2. FIG. 3 illustrates that, for at least one embodiment of the method 100 illustrated in FIG. 1, certain data is consulted, and certain other data is generated, during execution of the method 100. FIG. 3 illustrates that a profile 325 is accessed to aid in profile analysis 210. Also, a control flow graph 330 (“CFG”) is accessed to aid in computation 220 of reaching probabilities. [0020]
  • Brief reference to FIG. 4 illustrates that the profile 325 is typically generated by one or more compilation passes prior to execution of the method. In FIG. 4, a typical compilation process 400 is represented. The process 400 involves two compiler-performed passes 405, 410 and also involves a test run 407 that is typically initiated by a user, such as a software programmer. During a first pass 405, the compiler (e.g., 808 in FIG. 8) receives as an input the source code 415 for which compilation is desired. The compiler then generates instrumented binary code 420 that corresponds to the source code 415. The instrumented binary code 420 includes, in addition to the binary for the source code 415 instructions, extra binary code that causes, during a run of the instrumented code 420, statistics to be collected and recorded in a profile 325 and a call graph 424. When a user initiates a test run 407 of the instrumented binary code 420, the profile 325 and call graph 424 are generated. During the normal compilation pass 410, the profile 325 is used as an input into the compiler and a binary code file 340 is generated. The profile 325 may be used, for example, by the compiler during the normal compilation pass 410 to aid with performance enhancements such as speculative branch prediction. [0021]
  • Each of the passes 405, 410, and the test run 407, is optional to the method 100 in that any method of generating the information represented by profile 325 may be utilized. Accordingly, first pass 405 and normal pass 410, as well as test run 407, are depicted with broken lines in FIG. 4 to indicate their optional nature. One skilled in the art will recognize that any method of generating the information represented by profile 325 may be utilized, and that the actions 405, 407, 410 depicted in FIG. 4 are provided for illustrative purposes only. One skilled in the art will also recognize that the method 100 described herein may be applied, in an alternative embodiment, to a binary file. That is, the profile 325 may be generated for a binary file rather than a high-level source code file, and the profile analysis 210 (FIG. 2) may be performed using such a binary-based profile as an input. [0022]
  • Returning to FIG. 3, one can see that the profile analysis 210 utilizes the profile 325 as an input and generates a control flow graph 330 as an output. The method 100 builds the CFG 330 during the profile analysis 210 such that each node of the CFG 330 represents a basic block of the source program. Edges between nodes of the CFG 330 represent possible control flows among the basic blocks. For at least one embodiment, edges of the CFG 330 are weighted with the frequency that the corresponding control flow has been followed (as reflected in the profile 325). Accordingly, the edges are weighted by the probability that one basic block follows the other, without revisiting the latter node. In contrast to other CFG representations, such as “edge profiling” which represents only intra-procedural edges, at least one embodiment of the CFG 330 created during profile analysis 210 includes representation of inter-procedural edges. [0023]
  • For at least one embodiment, the CFG 330 is pruned to simplify the CFG 330 and control its size. The least frequently executed basic blocks are pruned from the CFG 330. To determine which nodes should remain in the CFG 330, and which should be pruned, the weights of the edges to a block are used to determine the basic block's execution count. The basic blocks are ordered by execution count, and are selected to remain in the CFG 330 according to their execution count. For at least one embodiment, the basic blocks are chosen from highest to lowest execution count until a predetermined threshold percentage of the total executed instructions are included in the CFG 330. Accordingly, after weighting and pruning, the most frequently-executed basic blocks are represented in the CFG 330. [0024]
  • For at least one embodiment, the predetermined threshold percentage of executed instructions chosen to remain in the CFG 330 during profile analysis 210 is ninety (90) percent. For selected embodiments, the threshold may be varied to numbers higher or lower than ninety percent, based on factors such as application requirements and/or machine resource availability. For instance, if a relatively large number of hardware thread contexts are supported by the machine resources, then a lower threshold may be chosen in order to facilitate more aggressive speculation. [0025]
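The execution-count-based pruning policy described above can be sketched as follows. The block representation (a dictionary mapping block names to an execution count and a size in instructions) and the `prune_cfg` name are assumptions for illustration:

```python
def prune_cfg(blocks, threshold=0.90):
    """blocks: dict name -> (exec_count, instrs_in_block). Return the names of
    the hottest blocks covering `threshold` of all executed instructions."""
    total = sum(count * size for count, size in blocks.values())
    kept, covered = set(), 0
    # visit blocks from highest to lowest execution count
    for name, (count, size) in sorted(
            blocks.items(), key=lambda kv: kv[1][0], reverse=True):
        if covered >= threshold * total:
            break                      # threshold coverage reached: prune the rest
        kept.add(name)
        covered += count * size        # executed instructions this block covers
    return kept
```

With the ninety-percent default, cold blocks fall outside the kept set and would be pruned from the CFG.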
  • In order to retain control flow information about pruned basic blocks, the following processing may also occur during profile analysis 210. When a node is pruned from the CFG 330, an edge from a predecessor to the pruned node is transformed to one or more edges from that predecessor to the node's successor(s). Also, an edge from the pruned node to a successor is transformed to one or more edges from the pruned node's predecessor(s) to the successor. If, during this transformation, an edge is transformed into multiple edges, the weight of the original edge is proportionally apportioned across the new edges. [0026]
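The edge transformation just described can be sketched with edges kept as a dictionary from (source, destination) pairs to weights. This representation and the `prune_node` name are illustrative, not from the disclosure:

```python
def prune_node(edges, node):
    """edges: dict mapping (src, dst) -> weight. Return edges with `node`
    removed and its predecessor edges rerouted to its successors."""
    preds = {s: w for (s, d), w in edges.items() if d == node and s != node}
    succs = {d: w for (s, d), w in edges.items() if s == node and d != node}
    new_edges = {e: w for e, w in edges.items() if node not in e}
    out_total = sum(succs.values())
    if out_total == 0:
        return new_edges               # pruned node had no successors
    for p, pw in preds.items():
        for s, sw in succs.items():
            # apportion the predecessor edge's weight across the new edges
            # in proportion to the pruned node's outgoing weights
            new_edges[(p, s)] = new_edges.get((p, s), 0) + pw * sw / out_total
    return new_edges
```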
  • FIG. 3 illustrates that the CFG 330 produced during profile analysis 210 is utilized to compute 220 reaching probabilities. At least one embodiment of reaching probability computation 220 utilizes the profile CFG 330 as an input and generates a reaching probability matrix 335 as an output. As stated above, as used herein the “reaching probability” is the probability that a second basic block will be reached after execution of a first basic block, without revisiting the first basic block. For at least one embodiment, the reaching probabilities computed at block 220 are stored in a two-dimensional square matrix 335 that has as many rows and columns as nodes in the CFG 330. Each element of the matrix represents the probability to execute the basic block represented by the column after execution of the basic block represented by the row. [0027]
  • For at least one embodiment, this probability is computed as the sum of the frequencies for all the various sequences of basic blocks that exist from the source node to the destination node. In order to simplify the computation, a constraint is imposed such that the source and destination nodes may only appear once in the sequence of nodes as the first and last nodes, respectively, and may not appear again as intermediate nodes. (For determining the probability of reaching a basic block again after it has been executed, the basic block will appear twice—as both the source and destination nodes). Other basic blocks are permitted to appear more than once in the sequence. [0028]
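The path summation just described can be illustrated with a toy enumeration. A real compiler would likely solve for reaching probabilities analytically rather than enumerating paths, and the depth bound here exists only to keep the illustrative recursion finite; the `reaching_probability` name and the adjacency representation are assumptions:

```python
def reaching_probability(edges, src, dst, depth=10):
    """edges: dict src -> list of (successor, branch_probability).
    Sum the probabilities of paths from src to dst in which src and dst
    appear only as the endpoints (other blocks may repeat)."""
    def walk(node, p, steps):
        if steps > depth:
            return 0.0
        total = 0.0
        for succ, q in edges.get(node, []):
            if succ == dst:
                total += p * q          # path reaches dst: count it and stop
            elif succ != src:           # the source may not reappear mid-path
                total += walk(succ, p * q, steps + 1)
        return total
    return walk(src, 1.0, 0)
```

Note that calling this with `src == dst` counts paths that return to the same block, matching the parenthetical case above where the block appears as both endpoints.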
  • At block 230, the reaching probability matrix 335 is traversed to evaluate pairs of basic blocks and identify those that are candidates for a spawning pair. As used herein, the term “spawning pair” refers to a pair of instructions associated with the source program. One of the instructions is a spawn point, which is an instruction within a first basic block. For at least one embodiment, the spawn point is the first instruction of the first basic block. [0029]
  • The other instruction is a target point and is, more specifically, a control quasi-independent point (“CQIP”). The CQIP is an instruction within a second basic block. For at least one embodiment, the CQIP is the first instruction of the second basic block. A spawn point is the instruction in the source program that, when reached, will activate creation of a speculative thread at the CQIP, where the speculative thread will start its execution. [0030]
  • For each element in the reaching probability matrix 335, two basic blocks are represented. The first block includes a potential spawn point, and the second block includes a potential CQIP. An instruction (such as the first instruction) of the basic block for the row is the potential spawn point. An instruction (such as the first instruction) of the basic block for the column is the potential CQIP. Each element of the reaching probability matrix 335 is evaluated, and those elements that satisfy certain selection criteria are chosen as candidates for spawning pairs. For at least one embodiment, the elements are evaluated to determine those pairs whose probability is higher than a certain predetermined threshold; that is, the probability to reach the control quasi-independent point after execution of the spawn point is higher than a given threshold. This criterion is designed to minimize spawning of speculative threads that are not executed. For at least one embodiment, a pair of basic blocks associated with an element of the reaching probability matrix 335 is considered as a candidate for a spawning pair if its reaching probability is higher than 0.95. [0031]
  • A second criterion for selection of a candidate spawning pair is the average number of instructions between the spawn point and the CQIP. Ideally, a minimum average number of instructions should exist between the spawn point and the CQIP in order to reduce the relative overhead of thread creation. If the distance is too small, the overhead of thread creation may outweigh the benefit of run-ahead execution because the speculative thread will not run far enough ahead. For at least one embodiment, a pair of basic blocks associated with an element of the reaching probability matrix 335 is considered as a candidate for a spawning pair if the average number of instructions between them is greater than 32 instructions. [0032]
  • Distance between the basic blocks may be additionally stored in the matrix 335 and considered in the identification 230 of spawning pair candidates. For at least one embodiment, this additional information may be calculated during profile analysis 210 and included in each element of the reaching probability matrix 335. The average may be calculated as the sum of the number of instructions executed by each sequence of basic blocks, weighted by the frequency of that sequence. [0033]
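  The two selection criteria above reduce to a simple filter over the matrix. The following is a minimal sketch under assumed representations: each matrix element is taken to be a (reaching probability, average distance) pair, and the block names and thresholds are illustrative (the 0.95 and 32 values come from the embodiment described above).

```python
PROB_THRESHOLD = 0.95   # minimize spawning of threads that never execute
MIN_DISTANCE = 32       # amortize the overhead of thread creation

def candidate_spawning_pairs(matrix, blocks):
    """Return (spawn_block, cqip_block) pairs whose reaching probability
    and average inter-block distance both clear the thresholds.
    Row index = block containing the potential spawn point;
    column index = block containing the potential CQIP."""
    candidates = []
    for i, row in enumerate(matrix):
        for j, (prob, avg_dist) in enumerate(row):
            if prob > PROB_THRESHOLD and avg_dist > MIN_DISTANCE:
                candidates.append((blocks[i], blocks[j]))
    return candidates

# Hypothetical 2x2 matrix for blocks B0 and B1: only the (B0, B1) element
# clears both thresholds; (B1, B0) is likely reached but far too close.
matrix = [[(0.10, 500), (0.99, 120)],
          [(0.97, 10), (0.50, 200)]]
print(candidate_spawning_pairs(matrix, ["B0", "B1"]))  # [('B0', 'B1')]
```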
  • At block 240, the spawning pair candidates are evaluated based on analysis of one or more selected metrics. These metrics may be prioritized. Based on the evaluation of the candidate spawning pairs in relation to the prioritized metrics, one or more spawning pairs are selected. [0034]
  • The metrics utilized at block 240 may include the minimum average distance between the basic blocks of the potential spawning pair (described above), as well as an evaluation of mispredicted branches, load misses and/or instruction cache misses. The metrics may also include additional considerations. One such additional consideration is the maximum average distance between the basic blocks of the potential spawning pair. It should be noted that there are also potential performance penalties involved with having the average number of instructions between the spawn point and CQIP be too large. Accordingly, the selection of spawning pairs may also impose a maximum average distance. If the distance between the pair is too large, the speculative thread may incur stalls in a scheme where the speculative thread has limited storage for speculative values. In addition, if the sizes of speculative threads are sufficiently dissimilar, speculative threads may incur stalls in a scheme where a speculative thread cannot commit its state until it becomes the non-speculative thread (see discussion of “join point” in connection with FIGS. 6 and 7, below). Such stalls are likely to result in ineffective holding of critical resources that otherwise would be used by non-speculative threads to make forward progress. [0035]
  • Another additional consideration is the number of dependent instructions that the speculative thread includes in relation to the application code between the spawn point and the CQIP. Preferably, the average number of speculative thread instructions dependent on values generated by a previous thread (also referred to as “live-ins”) should be relatively low. A smaller number of dependent instructions allows for more timely computation of the live-in values for the speculative thread. [0036]
  • In addition, for selected embodiments it is preferable that a relatively high number of the live-in values for the speculative thread be value-predictable. For those embodiments that use value prediction to provide for calculation 50 of live-in values (discussed further below), value-predictability of the live-in values facilitates faster communication of live-in values, thus minimizing spawning overhead while preserving the correctness and accuracy of speculative thread computation. [0037]
  • It is possible that the candidate spawning pairs identified at block 230 may include several good candidates for CQIP's associated with a given spawn point. That is, for a given row of the reaching probability matrix 335, more than one element may be selected as a candidate spawning pair. In such case, during the metrics evaluation at block 240, the best CQIP for the spawn point is selected because, for a given spawn point, a speculative thread will be spawned at only one CQIP. In order to choose the best CQIP for a given spawn point, the potential CQIP's identified at block 230 are prioritized according to the expected benefit. [0038]
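  Choosing one CQIP per spawn point amounts to a per-row maximum over an expected-benefit score. In the sketch below, the score is a placeholder (here, reaching probability times average distance); the patent's block 240 weighs several prioritized metrics, so the scoring function and all names are illustrative assumptions only.

```python
def best_cqip_per_spawn_point(candidates):
    """Given candidates as (spawn_point, cqip, expected_benefit) tuples,
    keep only the highest-benefit CQIP for each spawn point, since a given
    spawn point spawns a speculative thread at exactly one CQIP."""
    best = {}
    for spawn, cqip, benefit in candidates:
        if spawn not in best or benefit > best[spawn][1]:
            best[spawn] = (cqip, benefit)
    return {spawn: cqip for spawn, (cqip, _) in best.items()}

# Two candidate CQIPs compete for spawn point B0; the higher expected
# benefit (probability x average distance, a stand-in metric) wins.
pairs = [("B0", "B3", 0.99 * 120),
         ("B0", "B7", 0.96 * 300),
         ("B2", "B5", 0.98 * 64)]
print(best_cqip_per_spawn_point(pairs))  # {'B0': 'B7', 'B2': 'B5'}
```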
  • In at least one alternative embodiment, if there are sufficient hardware thread resources, more than one CQIP can be chosen for a corresponding spawn point. In such case, multiple concurrent, albeit mutually exclusive, speculative threads may be spawned and executed simultaneously to perform “eager” execution of speculative threads. The spawning condition for these multiple CQIPs can be examined and verified, after the speculative threads have been executed, to determine the effectiveness of the speculation. If one of these multiple speculative threads proves to be good speculation, and another bad, then the results of the former can be reused by the main thread while the results of the latter may be discarded. [0039]
  • In addition to those spawning pairs selected according to the metrics evaluation, at least one embodiment of the method 100 selects 240 CALL return point pairs (pairs of subroutine calls and their return points) if they satisfy the minimum size constraint. These pairs might not otherwise be selected at block 240 because the reaching probability for such pairs is sometimes too low to satisfy the selection criteria discussed above in connection with candidate identification 230. In particular, if a subroutine is called from multiple locations, it will have multiple predecessors and multiple successors in the CFG 330. If all the calls are executed a similar number of times, the reaching probability of any return point pair will be low since the graph 330 will have multiple paths with similar weights. [0040]
  • At block 50, the method 100 provides for calculation of live-in values for the speculative thread to be executed at the CQIP. By “provides for” it is meant that instructions are generated, wherein execution of the generated instructions, possibly in conjunction with some special hardware support, will result in calculation of a predicted live-in value to be used as an input by the spawnee thread. Of course, block 50 might determine that no live-in values are necessary. In such case, “providing for” calculation of live-in values simply entails determining that no live-in values are necessary. [0041]
  • Predicting thread input values allows the processor to execute speculative threads as if they were independent. At least one embodiment of block 50 generates instructions to perform or trigger value prediction. Any known manner of value prediction, including hardware value prediction, may be implemented. For example, instructions may be generated 50 such that the register values of the spawned thread are predicted to be the same as those of the spawning thread at spawn time. [0042]
  • Another embodiment of the method 100 identifies, at block 50, a slice of instructions from the application's code that may be used for speculative precomputation of one or more live-in values. While value prediction is a promising approach, it often requires rather complex hardware support. In contrast, no additional hardware support is necessary for speculative precomputation. Speculative precomputation can be performed at the beginning of the speculative thread execution in an otherwise idle thread context, providing the advantage of minimizing misspeculations of live-in values without requiring additional value prediction hardware support. Speculative precomputation is discussed in further detail below in connection with FIG. 5. [0043]
  • FIG. 5 illustrates an embodiment of the method 100 wherein block 50 is further specified to identify 502 precomputation instructions to be used for speculative precomputation of one or more live-in values. For at least one embodiment, a set of instructions, called a slice, is computed at block 502 to include only those instructions identified from the original application code that are necessary to compute the live-in value. The slice therefore is a subset of instructions from the original application code. The slice is computed by following the dependence edges backward from the instruction including the live-in value until all instructions necessary for calculation of the live-in value have been identified. A copy of the identified slice instructions is generated for insertion 60 into an enhanced binary file 350 (FIG. 3). [0044]
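  The backward walk over dependence edges can be sketched as a worklist algorithm. This is a simplified illustration, not the patent's slicer: instructions are modeled as (destination, sources, text) tuples, there is no control dependence or memory disambiguation, and all instruction names are invented.

```python
def compute_slice(instructions, live_in_target):
    """Follow dependence edges backward from the instruction that defines
    the live-in value, collecting every instruction needed to compute it.
    Returns the slice in original program order."""
    # Map each value to the index of the instruction that last defines it.
    def_site = {dest: i for i, (dest, _, _) in enumerate(instructions)}
    needed, worklist = set(), [live_in_target]
    while worklist:
        value = worklist.pop()
        i = def_site.get(value)
        if i is not None and i not in needed:
            needed.add(i)
            worklist.extend(instructions[i][1])   # walk back to the producers
    return [instructions[i][2] for i in sorted(needed)]

# Hypothetical code region: only the instructions feeding `p` belong in the slice;
# the unrelated call is correctly excluded.
code = [("a", [], "a = load base"),
        ("b", ["a"], "b = a + 4"),
        ("c", [], "c = call helper()"),
        ("p", ["b"], "p = b * 8")]
print(compute_slice(code, "p"))  # ['a = load base', 'b = a + 4', 'p = b * 8']
```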
  • FIGS. 3 and 5 illustrate that the methods 100, 500 for generating instructions for CQIP-guided multithreading generate an enhanced binary file 350 at block 60. The enhanced binary file 350 includes the binary code 340 for the original single-threaded application, as well as additional instructions. A trigger instruction to cause the speculative thread to be spawned is inserted into the enhanced binary file 350 at the spawn point(s) selected at block 240. The trigger instruction can be a conventional instruction in the existing instruction set of a processor, denoted with special marks. Alternatively, the trigger instruction can be a special instruction such as a fork or spawn instruction. Trigger instructions can be executed by any thread. [0045]
  • In addition, the instructions to be performed by the speculative thread are included in the enhanced binary file 350. These instructions may include instructions added to the original code binary file 340 for live-in calculation, and also some instructions already in the original code binary file 340, beginning at the CQIP, that the speculative thread is to execute. That is, regarding the speculative-thread instructions in the enhanced binary file 350, two groups of instructions may be distinguished for each spawning pair, if the speculative thread is to perform speculative precomputation for live-in values. In contrast, for a speculative thread that is to utilize value prediction for its live-in values, only the latter group of instructions described immediately below appears in the enhanced binary file 350. [0046]
  • The first group of instructions are generated at block 50 (or 502, see FIG. 5) and are incorporated 60 into the enhanced binary code file 350 in order to provide for the speculative thread's calculation of live-in values. For at least one embodiment, the instructions to be performed by the speculative thread to pre-compute live-in values are appended at the end of the file 350, after those instructions associated with the original code binary file 340. [0047]
  • Such instructions do not appear for speculative threads that use value prediction. Instead, specialized value prediction hardware may be used for value prediction. The value prediction hardware is triggered by the spawn instruction: when the processor executes a spawn instruction, the hardware initializes the speculative thread's registers with the predicted live-in values. [0048]
  • Regardless of whether the speculative thread utilizes value prediction (no additional instructions in the enhanced binary file 350) or speculative precomputation (slice instructions in the enhanced binary file 350), the speculative thread is associated with the second group of instructions alluded to above. The second group consists of instructions that already exist in the original code binary file 340. The subset of such instructions that are associated with the speculative thread are those instructions in the original code binary file 340 starting at the CQIP. For speculative threads that utilize speculative pre-computation for live-ins, the precomputation slice (which may be appended at the end of the enhanced binary file) terminates with a branch to the corresponding CQIP, which causes the speculative thread to begin executing the application code instructions at the CQIP. For speculative threads that utilize value prediction for live-in values, the spawnee thread begins execution of the application code instructions directly at the CQIP. [0049]
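  The layout just described (original code with an embedded trigger, plus an appended slice ending in a branch back to the CQIP) can be modeled concretely. The sketch below is a toy model under stated assumptions: instructions are plain strings, "addresses" are list indices, the spawn point is assumed to precede the CQIP, and a real implementation would of course operate on machine code with relocated addresses.

```python
def build_enhanced_binary(original, spawn_index, cqip_index, slice_instrs):
    """Original code with a spawn trigger inserted at the spawn point and
    the precomputation slice appended at the end, terminated by a branch
    back to the CQIP (the non-copy approach, which keeps code size small)."""
    slice_start = len(original) + 1          # +1 for the inserted trigger
    enhanced = list(original)
    enhanced.insert(spawn_index, f"spawn {slice_start}")
    enhanced += slice_instrs
    # The CQIP's index shifts by one because the trigger was inserted
    # ahead of it (spawn point precedes CQIP by assumption).
    enhanced.append(f"branch {cqip_index + 1}")
    return enhanced

original = ["i0", "i1", "i2", "i3", "i4"]
enhanced = build_enhanced_binary(original, spawn_index=1, cqip_index=3,
                                 slice_instrs=["slice0", "slice1"])
print(enhanced)
# ['i0', 'spawn 6', 'i1', 'i2', 'i3', 'i4', 'slice0', 'slice1', 'branch 4']
```

  Reading the result: the main thread executes i0, hits the trigger `spawn 6`, and continues; the spawnee starts at index 6 (the slice), then branches to index 4, the CQIP (originally i3), and runs ahead from there.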
  • In an alternative embodiment, the enhanced binary file 350 includes, for the speculative thread, a copy of the relevant subset of instructions from the original application, rather than providing for the speculative thread to branch to the CQIP instruction of the original code. However, the inventors have found that the non-copy approach discussed in the immediately preceding paragraph, which is implemented with appropriate branch instructions, efficiently allows for reduced code size. [0050]
  • Accordingly, the foregoing discussion illustrates that, for at least one embodiment, method 100 is performed by a compiler 808 (FIG. 8). In such embodiment, the method 100 represents an automated process in which a compiler identifies a spawn point and an associated control-quasi-independent point (“CQIP”) target for a speculative thread, generates the instructions to pre-compute its live-ins, and embeds a trigger at the spawn point in the binary. The pre-computation instructions for the speculative thread are incorporated (such as, for example, by appending) into an enhanced binary file 350. One skilled in the art will recognize that, in alternative embodiments, the method 100 may be performed manually such that one or more of 1) identifying CQIP spawning pairs 10, 2) providing for calculation of live-in values 50, and 3) modification of the main thread binary 60 may be performed interactively with human intervention. [0051]
  • In sum, a method for identifying spawning pairs and adapting a binary file to perform control-quasi-independent points guided speculative multithreading has been described. An embodiment of the method is performed by a compiler, which identifies proper spawn points and CQIPs, provides for calculation of live-in values in speculative threads, and generates an enhanced binary file. [0052]
  • FIGS. 6 and 7 illustrate at least one embodiment of a method 600 for performing speculative multithreading using a combination of control-quasi-independent-points guided speculative multithreading and speculative precomputation of live-in values. For at least one embodiment, the method 600 is performed by a processor (e.g., 804 of FIG. 8) executing the instructions in an enhanced binary code file (e.g., 350 of FIG. 3). For the method 600 illustrated in FIGS. 6 and 7, it is assumed that the enhanced binary code file has been generated according to the method illustrated in FIG. 5, such that instructions to perform speculative precomputation of live-in values have been identified 502 and inserted into the enhanced binary file. [0053]
  • FIGS. 6 and 7 illustrate that, during execution of the enhanced binary code file, multiple threads T0, T1, . . . Tx may be executing simultaneously. The flow of control associated with each of these multiple threads is indicated by the notations T0, T1, and Tx on the edges between the blocks illustrated in FIGS. 6 and 7. One skilled in the art will recognize that the multiple threads may be spawned from a non-speculative thread. Also, in at least one embodiment, a speculative thread may spawn one or more additional speculative successor threads. [0054]
  • FIG. 6 illustrates that processing begins at 601, where the thread T0 begins execution. At block 602, a check is made to determine whether the thread T0 previously encountered a join point while it (T0) was still speculative. Block 602 is discussed in further detail below. One skilled in the art will understand that block 602 will, of course, evaluate to “false” if the thread T0 was never previously speculative. [0055]
  • If block 602 evaluates to “false”, then an instruction for the thread T0 is executed at block 604. If a trigger instruction associated with a spawn point is encountered 606, then processing continues to block 608. Otherwise, the thread T0 continues execution at block 607. At block 607, it is determined whether a join point has been encountered in the thread T0. If neither a trigger instruction nor a join point is encountered, then the thread T0 continues to execute instructions 604 until it reaches 603 the end of its instructions. [0056]
  • If a trigger instruction is detected at block 606, then a speculative thread T1 is spawned in a free thread context at block 608. If slice instructions are encountered by the speculative thread T1 at block 610, then processing continues at block 612. If not, then processing continues at 702 (FIG. 7). [0057]
  • At block 612, slice instructions for speculative precomputation are iteratively executed until the speculative precomputation of the live-in value is complete 614. In the meantime, after spawning the speculative thread T1 at block 608, the spawner thread T0 continues to execute 604 its instructions. FIG. 6 illustrates that, while the speculative thread T1 executes 612 the slice instructions, the spawner thread continues execution 604 of its instructions until another spawn point is encountered 606, a join point is encountered 607, or the instruction stream ends 603. Accordingly, the spawner thread T0 and the spawnee thread T1 execute in parallel during speculative precomputation. [0058]
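  The parallel spawner/spawnee relationship can be mimicked with ordinary software threads. The sketch below is purely illustrative: Python threads stand in for hardware thread contexts, and trivial lambdas stand in for the precomputation slice and the post-CQIP code; none of this is the patent's mechanism.

```python
import threading

def run_speculative(slice_fn, cqip_fn, results):
    live_in = slice_fn()                 # block 612: precompute the live-in
    results["spec"] = cqip_fn(live_in)   # block 702: run ahead from the CQIP

def main_thread():
    results = {}
    x = 10                               # spawner work before the spawn point
    # Trigger instruction (block 608): spawn the spawnee in a free context.
    t1 = threading.Thread(target=run_speculative,
                          args=(lambda: x * 2, lambda v: v + 1, results))
    t1.start()
    y = x + 5                            # block 604: spawner keeps executing
    t1.join()                            # join point: wait for the spawnee
    return y, results["spec"]

print(main_thread())  # (15, 21)
```

  Both threads make progress between `start()` and `join()`, which is the overlap the spawning-pair distance criteria are designed to maximize.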
  • When live-in computation is determined complete 614, or if no slice instructions for speculative precomputation are available to the speculative thread T1 610, then processing continues at A in FIG. 7. [0059]
  • FIG. 7 illustrates that, at block 702, the speculative thread T1 executes instructions from the original code. At the first iteration of block 702, the CQIP instruction is executed. The execution 702 of spawnee thread instructions is performed in parallel with the execution of the spawner thread code until a terminating condition is reached. [0060]
  • At block 708, the speculative thread T1 checks for a terminating condition. The check 708 evaluates to “true” when the spawnee thread T1 has encountered a CQIP of an active, more speculative thread or has encountered the end of the program. As long as neither condition is true, the spawnee thread T1 proceeds to block 710. [0061]
  • If the speculative thread T1 determines 708 that a join point has been reached, then it is theoretically ready to perform processing to switch thread contexts with the more speculative thread (as discussed below in connection with block 720). However, at least one embodiment of the method 600 limits such processing to non-speculative threads. Accordingly, when speculative thread T1 determines 708 that it has reached the join point of a more speculative, active thread, T1 waits 706 to continue processing until it (T1) becomes non-speculative. [0062]
  • At block 710, the speculative thread T1 determines whether a spawn point has been reached. If the condition at block 710 evaluates to “false”, then T1 continues execution 702 of its instructions. [0063]
  • If a spawn point is encountered at block 710, then thread T1 creates 712 a new speculative thread T2. Thread T1 then continues execution 702 of its instructions, while the new speculative thread T2 proceeds to continue speculative thread operation at block 610, as described above in connection with speculative thread T1. One skilled in the art will recognize that, while multiple speculative threads are active, each thread follows the logic described above in connection with T1 (blocks 610 through 614 and blocks 702 through 710 of FIGS. 6 and 7). [0064]
  • When the spawner thread T0 reaches a CQIP of an active, more speculative thread, then we say that a join point has been encountered. The join point of a thread is the control quasi-independent point at which an on-going speculative thread began execution. It should be understood that multiple speculative threads may be active at one time; hence the terminology “more speculative.” A “more speculative” thread is a thread that is a spawnee of the reference thread (in this case, thread T0) and includes any subsequently-spawned speculative thread in the spawnee's spawning chain. [0065]
  • Thus, the join point check 607 (FIG. 6) evaluates to true when the thread T0 reaches the CQIP at which any on-going speculative thread began execution. One skilled in the art will recognize that, if multiple speculative threads are simultaneously active, then any one of the multiple CQIP's for the active speculative threads could be reached at block 607. For simplicity of illustration, FIG. 7 assumes that when T0 hits a join point at block 607, the join point is associated with T1, the next thread in program order, which is the speculative thread whose CQIP has been reached by the non-speculative thread T0. [0066]
  • Upon reaching the join point at block 607 (FIG. 6), a thread T0 proceeds to block 703. The thread T0 determines 703 whether it is the non-speculative active thread and, if not, waits until it becomes the non-speculative thread. [0067]
  • When T0 becomes non-speculative, it initiates 704 a verification of the speculation performed by the spawnee thread T1. For at least one embodiment, verification 704 includes determining whether the speculative live-in values utilized by the spawnee thread T1 reflect the actual values computed by the spawner thread. [0068]
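  The join-point verification reduces to comparing the live-in values the spawnee consumed against the values the spawner actually produced. The following is a minimal sketch with invented register names; on a mismatch, the spawnee and all more speculative threads would be squashed 730 as described below.

```python
def verify_speculation(predicted_live_ins, actual_live_ins):
    """Speculation succeeds only if every live-in the spawnee consumed
    matches the value actually computed by the spawner thread."""
    return all(predicted_live_ins.get(name) == value
               for name, value in actual_live_ins.items())

# Hypothetical register state: the spawnee predicted r2 correctly
# but mispredicted r5, so this speculation would be squashed.
predicted = {"r2": 40, "r5": 7}
actual = {"r2": 40, "r5": 9}
print(verify_speculation(predicted, actual))  # False
```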
  • If the verification 704 fails, then T1 and any other thread more speculative than T1 are squashed 730. Thread T0 then proceeds to C (FIG. 6) to continue execution of its instructions. Otherwise, if the verification 704 succeeds, then thread T0 and thread T1 proceed to block 720. At block 720, the thread context where the thread T0 has been executing becomes free and is relinquished. Also, the speculative thread T1 that started at the CQIP becomes the non-speculative thread and continues execution at C (FIG. 6). [0069]
  • Reference to FIG. 6 illustrates that the newly non-speculative thread T0 checks at block 602 to determine whether it encountered a CQIP at block 708 (FIG. 7) while it was still speculative. If so, then the thread T0 proceeds to B in order to begin join point processing as described above. [0070]
  • The combination of both CQIP-based spawning point selection and speculative precomputation of live-in values illustrated in FIGS. 5, 6 and 7 provides a multithreading method that helps improve the efficacy and accuracy of speculative multithreading. Such improvements are achieved because data dependencies among speculative threads are minimized, since the values of live-ins are computed before execution of the speculative thread. [0071]
  • In the preceding description, various aspects of a method and apparatus for facilitating control-quasi-independent-points guided speculative multithreading have been described. For purposes of explanation, specific numbers, examples, systems and configurations were set forth in order to provide a more thorough understanding. However, it is apparent to one skilled in the art that the described method may be practiced without the specific details. In other instances, well-known features were omitted or simplified in order not to obscure the method. [0072]
  • Embodiments of the method may be implemented in hardware, software, firmware, or a combination of such implementation approaches. Embodiments of the invention may be implemented as computer programs executing on programmable systems comprising at least one processor, a data storage system (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device. Program code may be applied to input data to perform the functions described herein and generate output information. The output information may be applied to one or more output devices, in known fashion. For purposes of this application, a processing system includes any system that has a processor, such as, for example, a digital signal processor (DSP), a microcontroller, an application specific integrated circuit (ASIC), or a microprocessor. [0073]
  • The programs may be implemented in a high level procedural or object oriented programming language to communicate with a processing system. The programs may also be implemented in assembly or machine language, if desired. In fact, the method described herein is not limited in scope to any particular programming language. In any case, the language may be a compiled or interpreted language. [0074]
  • The programs may be stored on a storage media or device (e.g., hard disk drive, floppy disk drive, read only memory (ROM), CD-ROM device, flash memory device, digital versatile disk (DVD), or other storage device) readable by a general or special purpose programmable processing system. The instructions, accessible to a processor in a processing system, provide for configuring and operating the processing system when the storage media or device is read by the processing system to perform the procedures described herein. Embodiments of the invention may also be considered to be implemented as a machine-readable storage medium, configured for use with a processing system, where the storage medium so configured causes the processing system to operate in a specific and predefined manner to perform the functions described herein. [0075]
  • An example of one such type of processing system is shown in FIG. 8. System 800 may be used, for example, to execute the processing for a method of performing control-quasi-independent-points guided speculative multithreading, such as the embodiments described herein. System 800 may also execute enhanced binary files generated in accordance with at least one embodiment of the methods described herein. System 800 is representative of processing systems based on the Pentium®, Pentium® Pro, Pentium® II, Pentium® III, Pentium® 4, Itanium® and Itanium® II microprocessors available from Intel Corporation, although other systems (including personal computers (PCs) having other microprocessors, engineering workstations, set-top boxes and the like) may also be used. In one embodiment, sample system 800 may be executing a version of the Windows™ operating system available from Microsoft Corporation, although other operating systems and graphical user interfaces, for example, may also be used. [0076]
  • Referring to FIG. 8, processing system 800 includes a memory system 802 and a processor 804. Memory system 802 may store instructions 810 and data 812 for controlling the operation of the processor 804. For example, instructions 810 may include a compiler program 808 that, when executed, causes the processor 804 to compile a program 415 (FIG. 4) that resides in the memory system 802. Memory 802 holds the program to be compiled, intermediate forms of the program, and a resulting compiled program. For at least one embodiment, the compiler program 808 includes instructions to select spawning pairs and generate instructions to implement CQIP-guided multithreading. For such embodiment, instructions 810 may also include an enhanced binary file 350 (FIG. 3) generated in accordance with at least one embodiment of the present invention. [0077]
  • [0078] Memory system 802 is intended as a generalized representation of memory and may include a variety of forms of memory, such as a hard drive, CD-ROM, random access memory (RAM), dynamic random access memory (DRAM), static random access memory (SRAM) and related circuitry. Memory system 802 may store instructions 810 and/or data 812 represented by data signals that may be executed by processor 804. The instructions 810 and/or data 812 may include code for performing any or all of the techniques discussed herein. At least one embodiment of CQIP-guided speculative multithreading is related to the use of the compiler 808 in system 800 to select spawning pairs and generate instructions as discussed above.
  • Specifically, FIG. 8 illustrates that compiler 808 may include a profile analyzer module 820 that, when executed by the processor 804, analyzes a profile to generate a control flow graph as described above in connection with FIG. 3. The compiler 808 may also include a matrix builder module 824 that, when executed by the processor 804, computes 220 reaching probabilities and generates a reaching probability matrix 335 as discussed above. The compiler 808 may also include a spawning pair selector module 826 that, when executed by the processor 804, identifies 230 candidate basic blocks and selects 240 one or more spawning pairs. Also, the compiler 808 may include a slicer module 822 that identifies 502 (FIG. 5) instructions for a slice to be executed by a speculative thread in order to perform speculative precomputation of live-in values. The compiler 808 may further include a code generator module 828 that, when executed by the processor 804, generates 60 an enhanced binary file 350 (FIG. 3). [0079]
  • While particular embodiments of the present invention have been shown and described, it will be obvious to those skilled in the art that changes and modifications can be made without departing from the present invention in its broader aspects. The appended claims are to encompass within their scope all such changes and modifications that fall within the true scope of the present invention. [0080]

Claims (48)

What is claimed is:
1. A method of compiling a software program, comprising:
selecting a spawning pair that includes a spawn point and a control-quasi-independent point (CQIP);
providing for calculation of a live-in value for a speculative thread; and
generating an enhanced binary file that includes instructions, the instructions including a trigger instruction to cause spawning of the speculative thread at the CQIP.
2. The method of claim 1, further comprising:
performing profile analysis.
3. The method of claim 1, further comprising:
computing a plurality of reaching probabilities.
4. The method of claim 1, further comprising:
identifying a plurality of candidate basic blocks.
5. The method of claim 4, wherein:
selecting a spawning pair further comprises selecting the spawning pair from the plurality of candidate basic blocks.
6. The method of claim 1, wherein:
generating the enhanced binary file further comprises embedding a trigger at a spawn point associated with the spawning pair.
7. The method of claim 1, wherein selecting the spawning pair further comprises:
selecting a spawning pair having at least a minimum average number of instructions between the spawn point and the CQIP of the spawning pair.
8. The method of claim 3, wherein selecting the spawning pair further comprises:
selecting a spawning pair having at least a minimum reaching probability.
9. The method of claim 1, wherein providing for calculation of the live-in value further comprises:
providing an instruction to invoke hardware prediction of the live-in value.
10. The method of claim 1, wherein providing for calculation of the live-in value further comprises:
generating one or more instructions to perform speculative precomputation of the live-in value.
11. The method of claim 1, wherein:
selecting a spawning pair further comprises selecting a first spawning pair and a second spawning pair; and
generating an enhanced binary file that includes instructions further comprises generating an enhanced binary file that includes a trigger instruction for each spawning pair.
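The selection criteria of claims 7 and 8 (a minimum average instruction distance between spawn point and CQIP, and a minimum reaching probability) can be illustrated with a small filter. All names and threshold values below are hypothetical, chosen only to make the example concrete.

```python
# Illustrative sketch of claims 7-8: keep only candidate spawning pairs whose
# reaching probability and average spawn-point-to-CQIP instruction count both
# meet configurable minimums. Thresholds here are invented for the example.

def select_spawning_pairs(candidates, min_prob=0.95, min_distance=32):
    """candidates: iterable of (spawn_point, cqip, reach_prob, avg_instrs)."""
    return [
        (spawn, cqip)
        for spawn, cqip, reach_prob, avg_instrs in candidates
        if reach_prob >= min_prob and avg_instrs >= min_distance
    ]

candidates = [
    ("bb3", "bb9", 0.98, 120),  # kept: CQIP very likely reached, far enough ahead
    ("bb3", "bb5", 0.99, 8),    # rejected: too close to repay spawn overhead
    ("bb2", "bb9", 0.60, 200),  # rejected: CQIP too unlikely to be reached
]
print(select_spawning_pairs(candidates))  # → [('bb3', 'bb9')]
```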
12. An article comprising:
a machine-readable storage medium having a plurality of machine accessible instructions;
wherein, when the instructions are executed by a processor, the instructions provide for
selecting a spawning pair that includes a spawn point and a control-quasi-independent point (CQIP);
providing for calculation of a live-in value for a speculative thread; and
generating an enhanced binary file that includes instructions, the instructions including a trigger instruction to cause spawning of the speculative thread at the control-quasi-independent point (CQIP).
13. The article of claim 12, wherein the instructions further comprise:
instructions that provide for performing profile analysis.
14. The article of claim 12, wherein the instructions further comprise:
instructions that provide for computing a plurality of reaching probabilities.
15. The article of claim 12, wherein the instructions further comprise:
instructions that provide for identifying a plurality of candidate basic blocks.
16. The article of claim 15, wherein:
the instructions that provide for selecting a spawning pair further comprise instructions that provide for selecting the spawning pair from the plurality of candidate basic blocks.
17. The article of claim 12, wherein:
the instructions that provide for generating the enhanced binary file further comprise instructions that provide for embedding a trigger at a spawn point associated with the spawning pair.
18. The article of claim 12, wherein the instructions that provide for selecting the spawning pair further comprise:
instructions that provide for selecting a spawning pair having at least a minimum average number of instructions between the spawn point and the CQIP of the spawning pair.
19. The article of claim 14, wherein the instructions that provide for selecting the spawning pair further comprise:
instructions that provide for selecting a spawning pair having at least a minimum reaching probability.
20. The article of claim 12, wherein the instructions that provide for providing for calculation of the live-in value further comprise:
instructions that provide for providing an instruction to invoke hardware prediction of the live-in value.
21. The article of claim 12, wherein the instructions that provide for providing for calculation of the live-in value further comprise:
instructions that provide for generating one or more instructions to perform speculative precomputation of the live-in value.
22. A method, comprising:
executing one or more instructions in a first instruction stream in a non-speculative thread;
spawning a speculative thread at a spawn point in the first instruction stream, wherein the computed probability of reaching a control-quasi-independent point (CQIP) during execution of the first instruction stream, after execution of the spawn point, is higher than a predetermined threshold; and
simultaneously:
executing in the speculative thread a speculative thread instruction stream that includes a subset of the instructions in the first instruction stream, the speculative thread instruction stream including the control-quasi-independent point; and
executing one or more instructions in the first instruction stream following the spawn point.
23. The method of claim 22, wherein:
executing one or more instructions in the first instruction stream following the spawn point further comprises executing instructions until the CQIP is reached.
24. The method of claim 23, further comprising:
determining, responsive to reaching the CQIP, whether speculative execution performed in the speculative thread is correct.
25. The method of claim 24, further comprising:
responsive to determining the speculative execution performed in the speculative thread is correct, relinquishing the non-speculative thread.
26. The method of claim 24, further comprising:
responsive to determining that the speculative execution performed in the speculative thread is not correct, squashing the speculative thread.
27. The method of claim 26, further comprising:
responsive to determining that the speculative execution performed in the speculative thread is not correct, squashing all active successor threads, if any, of the speculative thread.
28. The method of claim 22, wherein:
the speculative thread instruction stream includes a precomputation slice for the speculative computation of a live-in value.
29. The method of claim 22, wherein:
spawning the speculative thread triggers hardware prediction of a live-in value.
30. The method of claim 28, wherein:
the speculative thread instruction stream includes, after the precomputation slice, a branch instruction to the CQIP.
31. The method of claim 22, further comprising:
spawning a second speculative thread at a spawn point in the speculative thread instruction stream.
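The runtime behavior of claims 22 through 27 — the non-speculative thread runs up to the CQIP, then either relinquishes control to a speculative thread whose execution proved correct or squashes an incorrect one along with its active successors — can be modeled in a few lines. This is a behavioral sketch under invented names, not the patent's implementation.

```python
# Hypothetical model of claims 24-27: at the CQIP, the live-in values the
# speculative thread used are compared with the actual values; a mismatch
# squashes the speculative thread and all of its active successor threads.

class SpeculativeThread:
    def __init__(self, cqip, predicted_live_ins):
        self.cqip = cqip
        self.predicted_live_ins = predicted_live_ins
        self.successors = []
        self.squashed = False

    def squash(self):
        # Claim 27: squashing propagates to every active successor thread.
        self.squashed = True
        for t in self.successors:
            t.squash()

def validate_at_cqip(spec_thread, actual_live_ins):
    if spec_thread.predicted_live_ins == actual_live_ins:
        # Claim 25: correct speculation; the non-speculative thread is
        # relinquished and the speculative thread becomes non-speculative.
        return "relinquish non-speculative thread"
    spec_thread.squash()
    return "squashed"

t = SpeculativeThread("bb9", {"x": 42})
child = SpeculativeThread("bb15", {"y": 7})
t.successors.append(child)
print(validate_at_cqip(t, {"x": 41}))  # mismatch: t and child are squashed
print(t.squashed, child.squashed)
```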
32. An article comprising:
a machine-readable storage medium having a plurality of machine accessible instructions;
wherein, when the instructions are executed by a processor, the instructions provide for
executing one or more instructions in a first instruction stream in a non-speculative thread;
spawning a speculative thread at a spawn point in the first instruction stream, wherein the computed probability of reaching a control-quasi-independent point (CQIP) during execution of the first instruction stream, after execution of the spawn point, is higher than a predetermined threshold; and
simultaneously:
executing in the speculative thread a speculative thread instruction stream that includes a subset of the instructions in the first instruction stream, the speculative thread instruction stream including the control-quasi-independent point; and
executing one or more instructions in the first instruction stream following the spawn point.
33. The article of claim 32, wherein:
the instructions that provide for executing one or more instructions in the first instruction stream following the spawn point further comprise instructions that provide for executing instructions until the CQIP is reached.
34. The article of claim 33, wherein the instructions further comprise:
instructions that provide for determining, responsive to reaching the CQIP, whether speculative execution performed in the speculative thread is correct.
35. The article of claim 34, wherein the instructions further comprise:
instructions that provide for, responsive to determining that the speculative execution performed in the speculative thread is correct, relinquishing the non-speculative thread.
36. The article of claim 34, wherein the instructions further comprise:
instructions that provide for, responsive to determining that the speculative execution performed in the speculative thread is not correct, squashing the speculative thread.
37. The article of claim 36, wherein the instructions further comprise:
instructions that provide for, responsive to determining that the speculative execution performed in the speculative thread is not correct, squashing all active successor threads, if any, of the speculative thread.
38. The article of claim 32, wherein:
the speculative thread instruction stream includes a precomputation slice for the speculative computation of a live-in value.
39. The article of claim 32, wherein:
the instruction that provides for spawning the speculative thread triggers hardware prediction of a live-in value.
40. The article of claim 38, wherein:
the speculative thread instruction stream includes, after the precomputation slice, a branch instruction to the CQIP.
41. A compiler comprising:
a spawning pair selector module to select a spawning pair that includes a control-quasi-independent point (“CQIP”) and a spawn point; and
a code generator to generate an enhanced binary file that includes a trigger instruction at the spawn point.
42. The compiler of claim 41, wherein:
the trigger instruction is to spawn a speculative thread to begin execution at the CQIP.
43. The compiler of claim 41, further comprising:
a slicer to generate a slice for precomputation of a live-in value;
wherein the code generator is further to include the precomputation slice in the enhanced binary file.
44. The compiler of claim 41, wherein:
the spawning pair selector module is further to select the spawning pair such that a computed probability of reaching the control-quasi-independent point after execution of the spawn point is higher than a predetermined threshold.
45. The compiler of claim 44, further comprising:
a matrix builder to compute the reaching probability for the spawning pair.
46. The compiler of claim 41, further comprising:
a profile analyzer to build a control flow graph.
47. The compiler of claim 41, wherein:
the trigger instruction is to trigger hardware value prediction for a live-in value.
48. The compiler of claim 41, further comprising:
a matrix builder to compute the reaching probability for the spawning pair.
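The matrix builder of claims 45 and 48 computes a reaching probability for each spawning pair. For an acyclic control flow graph with profile-derived branch probabilities, that probability can be accumulated over all paths in topological order. The sketch below is illustrative only; the graph shape, names, and probabilities are invented for the example.

```python
# Hypothetical matrix-builder sketch: matrix[i][j] is the probability that
# execution starting at block i reaches block j, summed over all paths of an
# acyclic CFG whose edges carry profile-derived branch probabilities.

def reaching_matrix(cfg, topo_order):
    """cfg: block -> list of (successor, branch_probability).
    topo_order: blocks in topological order."""
    matrix = {}
    for i, src in enumerate(topo_order):
        reach = {src: 1.0}
        for node in topo_order[i:]:
            p_node = reach.get(node, 0.0)
            for succ, p_edge in cfg.get(node, []):
                reach[succ] = reach.get(succ, 0.0) + p_node * p_edge
        matrix[src] = reach
    return matrix

# Diamond CFG: A branches to B (0.9) or C (0.1); both paths rejoin at D.
cfg = {"A": [("B", 0.9), ("C", 0.1)], "B": [("D", 1.0)], "C": [("D", 1.0)]}
m = reaching_matrix(cfg, ["A", "B", "C", "D"])
print(m["A"]["D"])  # → 1.0  (D is always reached from A: a CQIP candidate)
print(m["A"]["B"])  # → 0.9  (B is control dependent on the branch at A)
```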
US10/356,435 2003-01-31 2003-01-31 Control-quasi-independent-points guided speculative multithreading Abandoned US20040154010A1 (en)

Priority Applications (6)

Application Number Priority Date Filing Date Title
US10/356,435 US20040154010A1 (en) 2003-01-31 2003-01-31 Control-quasi-independent-points guided speculative multithreading
US10/423,633 US7814469B2 (en) 2003-01-31 2003-04-24 Speculative multi-threading for instruction prefetch and/or trace pre-build
US10/422,528 US7523465B2 (en) 2003-01-31 2003-04-24 Methods and apparatus for generating speculative helper thread spawn-target points
US10/633,012 US7657880B2 (en) 2003-01-31 2003-08-01 Safe store for speculative helper threads
CNB2003101215924A CN1302384C (en) 2003-01-31 2003-12-29 Recckoning multiroute operation quided by controlling quasi-independent point
US12/879,898 US8719806B2 (en) 2003-01-31 2010-09-10 Speculative multi-threading for instruction prefetch and/or trace pre-build

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US10/356,435 US20040154010A1 (en) 2003-01-31 2003-01-31 Control-quasi-independent-points guided speculative multithreading

Related Child Applications (2)

Application Number Title Priority Date Filing Date
US10/423,633 Continuation-In-Part US7814469B2 (en) 2003-01-31 2003-04-24 Speculative multi-threading for instruction prefetch and/or trace pre-build
US10/422,528 Continuation-In-Part US7523465B2 (en) 2003-01-31 2003-04-24 Methods and apparatus for generating speculative helper thread spawn-target points

Publications (1)

Publication Number Publication Date
US20040154010A1 true US20040154010A1 (en) 2004-08-05

Family

ID=32770808

Family Applications (4)

Application Number Title Priority Date Filing Date
US10/356,435 Abandoned US20040154010A1 (en) 2003-01-31 2003-01-31 Control-quasi-independent-points guided speculative multithreading
US10/423,633 Expired - Fee Related US7814469B2 (en) 2003-01-31 2003-04-24 Speculative multi-threading for instruction prefetch and/or trace pre-build
US10/422,528 Expired - Fee Related US7523465B2 (en) 2003-01-31 2003-04-24 Methods and apparatus for generating speculative helper thread spawn-target points
US12/879,898 Expired - Lifetime US8719806B2 (en) 2003-01-31 2010-09-10 Speculative multi-threading for instruction prefetch and/or trace pre-build

Family Applications After (3)

Application Number Title Priority Date Filing Date
US10/423,633 Expired - Fee Related US7814469B2 (en) 2003-01-31 2003-04-24 Speculative multi-threading for instruction prefetch and/or trace pre-build
US10/422,528 Expired - Fee Related US7523465B2 (en) 2003-01-31 2003-04-24 Methods and apparatus for generating speculative helper thread spawn-target points
US12/879,898 Expired - Lifetime US8719806B2 (en) 2003-01-31 2010-09-10 Speculative multi-threading for instruction prefetch and/or trace pre-build

Country Status (2)

Country Link
US (4) US20040154010A1 (en)
CN (1) CN1302384C (en)

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050182602A1 (en) * 2004-02-17 2005-08-18 Intel Corporation Computation of all-pairs reaching probabilities in software systems
US20060047495A1 (en) * 2004-09-01 2006-03-02 Jesus Sanchez Analyzer for spawning pairs in speculative multithreaded processor
US20060212689A1 (en) * 2005-03-18 2006-09-21 Shailender Chaudhry Method and apparatus for simultaneous speculative threading
WO2006122990A2 (en) * 2005-05-19 2006-11-23 Intel Corporation Storage-deployment apparatus, system and method for multiple sets of speculative-type instructions
US20070011684A1 (en) * 2005-06-27 2007-01-11 Du Zhao H Mechanism to optimize speculative parallel threading
US20080244223A1 (en) * 2007-03-31 2008-10-02 Carlos Garcia Quinones Branch pruning in architectures with speculation support
US20090083488A1 (en) * 2006-05-30 2009-03-26 Carlos Madriles Gimeno Enabling Speculative State Information in a Cache Coherency Protocol
CN101826014A (en) * 2010-04-20 2010-09-08 北京邮电大学 Dividing method of source code in software engineering
US20110119660A1 (en) * 2008-07-31 2011-05-19 Panasonic Corporation Program conversion apparatus and program conversion method
US20120204065A1 (en) * 2011-02-03 2012-08-09 International Business Machines Corporation Method for guaranteeing program correctness using fine-grained hardware speculative execution
US20140317629A1 (en) * 2013-04-23 2014-10-23 Ab Initio Technology Llc Controlling tasks performed by a computing system
US8904118B2 (en) 2011-01-07 2014-12-02 International Business Machines Corporation Mechanisms for efficient intra-die/intra-chip collective messaging
US9135015B1 (en) 2014-12-25 2015-09-15 Centipede Semi Ltd. Run-time code parallelization with monitoring of repetitive instruction sequences during branch mis-prediction
US9208066B1 (en) 2015-03-04 2015-12-08 Centipede Semi Ltd. Run-time code parallelization with approximate monitoring of instruction sequences
US9286090B2 (en) * 2014-01-20 2016-03-15 Sony Corporation Method and system for compiler identification of code for parallel execution
US9286067B2 (en) 2011-01-10 2016-03-15 International Business Machines Corporation Method and apparatus for a hierarchical synchronization barrier in a multi-node system
US9348595B1 (en) 2014-12-22 2016-05-24 Centipede Semi Ltd. Run-time code parallelization with continuous monitoring of repetitive instruction sequences
US9715390B2 (en) 2015-04-19 2017-07-25 Centipede Semi Ltd. Run-time parallelization of code execution based on an approximate register-access specification
US10296346B2 (en) 2015-03-31 2019-05-21 Centipede Semi Ltd. Parallelized execution of instruction sequences based on pre-monitoring
US10296350B2 (en) 2015-03-31 2019-05-21 Centipede Semi Ltd. Parallelized execution of instruction sequences
US10379863B2 (en) * 2017-09-21 2019-08-13 Qualcomm Incorporated Slice construction for pre-executing data dependent loads
US10606727B2 (en) 2016-09-06 2020-03-31 Soroco Private Limited Techniques for generating a graphical user interface to display documentation for computer programs
US11755484B2 (en) * 2015-06-26 2023-09-12 Microsoft Technology Licensing, Llc Instruction block allocation

Families Citing this family (138)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2002326378A1 (en) * 2001-07-13 2003-01-29 Sun Microsystems, Inc. Facilitating efficient join operations between a head thread and a speculative thread
US7493607B2 (en) 2002-07-09 2009-02-17 Bluerisc Inc. Statically speculative compilation and execution
JP3862652B2 (en) * 2002-12-10 2006-12-27 キヤノン株式会社 Printing control method and information processing apparatus
US7647585B2 (en) * 2003-04-28 2010-01-12 Intel Corporation Methods and apparatus to detect patterns in programs
US7774759B2 (en) 2003-04-28 2010-08-10 Intel Corporation Methods and apparatus to detect a macroscopic transaction boundary in a program
US20040243767A1 (en) * 2003-06-02 2004-12-02 Cierniak Michal J. Method and apparatus for prefetching based upon type identifier tags
US8266379B2 (en) * 2003-06-02 2012-09-11 Infineon Technologies Ag Multithreaded processor with multiple caches
US7844801B2 (en) * 2003-07-31 2010-11-30 Intel Corporation Method and apparatus for affinity-guided speculative helper threads in chip multiprocessors
JP4042972B2 (en) * 2003-09-30 2008-02-06 インターナショナル・ビジネス・マシーンズ・コーポレーション Optimized compiler, compiler program, and recording medium
US20050071438A1 (en) * 2003-09-30 2005-03-31 Shih-Wei Liao Methods and apparatuses for compiler-creating helper threads for multi-threading
US20050114850A1 (en) * 2003-10-29 2005-05-26 Saurabh Chheda Energy-focused re-compilation of executables and hardware mechanisms based on compiler-architecture interaction and compiler-inserted control
US7996671B2 (en) 2003-11-17 2011-08-09 Bluerisc Inc. Security of program executables and microprocessors based on compiler-architecture interaction
US7206795B2 (en) * 2003-12-22 2007-04-17 Jean-Pierre Bono Prefetching and multithreading for improved file read performance
US7756968B1 (en) 2003-12-30 2010-07-13 Sap Ag Method and system for employing a hierarchical monitor tree for monitoring system resources in a data processing environment
US7941521B1 (en) 2003-12-30 2011-05-10 Sap Ag Multi-service management architecture employed within a clustered node configuration
US7725572B1 (en) 2003-12-30 2010-05-25 Sap Ag Notification architecture and method employed within a clustered node configuration
US7822826B1 (en) 2003-12-30 2010-10-26 Sap Ag Deployment of a web service
US8607209B2 (en) 2004-02-04 2013-12-10 Bluerisc Inc. Energy-focused compiler-assisted branch prediction
US7721266B2 (en) * 2004-03-26 2010-05-18 Sap Ag Unified logging service with a logging formatter
US20050216585A1 (en) * 2004-03-26 2005-09-29 Tsvetelina Todorova Monitor viewer for an enterprise network monitoring system
US7526550B2 (en) * 2004-03-26 2009-04-28 Sap Ag Unified logging service with a log viewer
US7168070B2 (en) * 2004-05-25 2007-01-23 International Business Machines Corporation Aggregate bandwidth through management using insertion of reset instructions for cache-to-cache data transfer
US7434004B1 (en) * 2004-06-17 2008-10-07 Sun Microsystems, Inc. Prefetch prediction
US7200734B2 (en) * 2004-07-31 2007-04-03 Hewlett-Packard Development Company, L.P. Operating-system-transparent distributed memory
US7669194B2 (en) * 2004-08-26 2010-02-23 International Business Machines Corporation Fine-grained software-directed data prefetching using integrated high-level and low-level code analysis optimizations
US8719819B2 (en) 2005-06-30 2014-05-06 Intel Corporation Mechanism for instruction set based thread execution on a plurality of instruction sequencers
WO2006069494A1 (en) * 2004-12-31 2006-07-06 Intel Corporation Parallelization of bayesian network structure learning
US20060157115A1 (en) * 2005-01-11 2006-07-20 Andrew Dorogi Regulator with belleville springs
US7849453B2 (en) * 2005-03-16 2010-12-07 Oracle America, Inc. Method and apparatus for software scouting regions of a program
US7950012B2 (en) * 2005-03-16 2011-05-24 Oracle America, Inc. Facilitating communication and synchronization between main and scout threads
US7472256B1 (en) 2005-04-12 2008-12-30 Sun Microsystems, Inc. Software value prediction using pendency records of predicted prefetch values
US7810075B2 (en) * 2005-04-29 2010-10-05 Sap Ag Common trace files
US20070094213A1 (en) * 2005-07-14 2007-04-26 Chunrong Lai Data partitioning and critical section reduction for Bayesian network structure learning
US20070094214A1 (en) * 2005-07-15 2007-04-26 Li Eric Q Parallelization of bayesian network structure learning
US8037285B1 (en) 2005-09-28 2011-10-11 Oracle America, Inc. Trace unit
US8024522B1 (en) 2005-09-28 2011-09-20 Oracle America, Inc. Memory ordering queue/versioning cache circuit
US7987342B1 (en) 2005-09-28 2011-07-26 Oracle America, Inc. Trace unit with a decoder, a basic-block cache, a multi-block cache, and sequencer
US7966479B1 (en) 2005-09-28 2011-06-21 Oracle America, Inc. Concurrent vs. low power branch prediction
US7870369B1 (en) 2005-09-28 2011-01-11 Oracle America, Inc. Abort prioritization in a trace-based processor
US8032710B1 (en) 2005-09-28 2011-10-04 Oracle America, Inc. System and method for ensuring coherency in trace execution
US7937564B1 (en) 2005-09-28 2011-05-03 Oracle America, Inc. Emit vector optimization of a trace
US7676634B1 (en) 2005-09-28 2010-03-09 Sun Microsystems, Inc. Selective trace cache invalidation for self-modifying code via memory aging
US8015359B1 (en) 2005-09-28 2011-09-06 Oracle America, Inc. Method and system for utilizing a common structure for trace verification and maintaining coherency in an instruction processing circuit
US8499293B1 (en) 2005-09-28 2013-07-30 Oracle America, Inc. Symbolic renaming optimization of a trace
US7877630B1 (en) * 2005-09-28 2011-01-25 Oracle America, Inc. Trace based rollback of a speculatively updated cache
US7949854B1 (en) 2005-09-28 2011-05-24 Oracle America, Inc. Trace unit with a trace builder
US8051247B1 (en) 2005-09-28 2011-11-01 Oracle America, Inc. Trace based deallocation of entries in a versioning cache circuit
US7953961B1 (en) 2005-09-28 2011-05-31 Oracle America, Inc. Trace unit with an op path from a decoder (bypass mode) and from a basic-block builder
US8370576B1 (en) 2005-09-28 2013-02-05 Oracle America, Inc. Cache rollback acceleration via a bank based versioning cache ciruit
US8019944B1 (en) 2005-09-28 2011-09-13 Oracle America, Inc. Checking for a memory ordering violation after a speculative cache write
US20070113056A1 (en) * 2005-11-15 2007-05-17 Dale Jason N Apparatus and method for using multiple thread contexts to improve single thread performance
US20070113055A1 (en) * 2005-11-15 2007-05-17 Dale Jason N Apparatus and method for improving single thread performance through speculative processing
US7739662B2 (en) * 2005-12-30 2010-06-15 Intel Corporation Methods and apparatus to analyze processor systems
US7730263B2 (en) * 2006-01-20 2010-06-01 Cornell Research Foundation, Inc. Future execution prefetching technique and architecture
US20070234014A1 (en) * 2006-03-28 2007-10-04 Ryotaro Kobayashi Processor apparatus for executing instructions with local slack prediction of instructions and processing method therefor
US20080016325A1 (en) * 2006-07-12 2008-01-17 Laudon James P Using windowed register file to checkpoint register state
US8010745B1 (en) 2006-09-27 2011-08-30 Oracle America, Inc. Rolling back a speculative update of a non-modifiable cache line
US8370609B1 (en) 2006-09-27 2013-02-05 Oracle America, Inc. Data cache rollbacks for failed speculative traces with memory operations
US20080126766A1 (en) 2006-11-03 2008-05-29 Saurabh Chheda Securing microprocessors against information leakage and physical tampering
US20080141268A1 (en) * 2006-12-12 2008-06-12 Tirumalai Partha P Utility function execution using scout threads
US7765242B2 (en) * 2007-05-10 2010-07-27 Hewlett-Packard Development Company, L.P. Methods and apparatus for structure layout optimization for multi-threaded programs
US8321840B2 (en) * 2007-12-27 2012-11-27 Intel Corporation Software flow tracking using multiple threads
US8706979B2 (en) * 2007-12-30 2014-04-22 Intel Corporation Code reuse and locality hinting
US8316218B2 (en) * 2008-02-01 2012-11-20 International Business Machines Corporation Look-ahead wake-and-go engine with speculative execution
US8341635B2 (en) * 2008-02-01 2012-12-25 International Business Machines Corporation Hardware wake-and-go mechanism with look-ahead polling
US8775778B2 (en) * 2008-02-01 2014-07-08 International Business Machines Corporation Use of a helper thread to asynchronously compute incoming data
US8725992B2 (en) 2008-02-01 2014-05-13 International Business Machines Corporation Programming language exposing idiom calls to a programming idiom accelerator
US8732683B2 (en) 2008-02-01 2014-05-20 International Business Machines Corporation Compiler providing idiom to idiom accelerator
US8386822B2 (en) 2008-02-01 2013-02-26 International Business Machines Corporation Wake-and-go mechanism with data monitoring
US8145849B2 (en) 2008-02-01 2012-03-27 International Business Machines Corporation Wake-and-go mechanism with system bus response
US8707016B2 (en) * 2008-02-01 2014-04-22 International Business Machines Corporation Thread partitioning in a multi-core environment
US8601241B2 (en) * 2008-02-01 2013-12-03 International Business Machines Corporation General purpose register cloning
US8516484B2 (en) * 2008-02-01 2013-08-20 International Business Machines Corporation Wake-and-go mechanism for a data processing system
US8312458B2 (en) * 2008-02-01 2012-11-13 International Business Machines Corporation Central repository for wake-and-go mechanism
US8880853B2 (en) * 2008-02-01 2014-11-04 International Business Machines Corporation CAM-based wake-and-go snooping engine for waking a thread put to sleep for spinning on a target address lock
US8788795B2 (en) 2008-02-01 2014-07-22 International Business Machines Corporation Programming idiom accelerator to examine pre-fetched instruction streams for multiple processors
US8359589B2 (en) * 2008-02-01 2013-01-22 International Business Machines Corporation Helper thread for pre-fetching data
US8250396B2 (en) 2008-02-01 2012-08-21 International Business Machines Corporation Hardware wake-and-go mechanism for a data processing system
US8612977B2 (en) 2008-02-01 2013-12-17 International Business Machines Corporation Wake-and-go mechanism with software save of thread state
US8452947B2 (en) 2008-02-01 2013-05-28 International Business Machines Corporation Hardware wake-and-go mechanism and content addressable memory with instruction pre-fetch look-ahead to detect programming idioms
US8171476B2 (en) * 2008-02-01 2012-05-01 International Business Machines Corporation Wake-and-go mechanism with prioritization of threads
US8640141B2 (en) 2008-02-01 2014-01-28 International Business Machines Corporation Wake-and-go mechanism with hardware private array
US8127080B2 (en) 2008-02-01 2012-02-28 International Business Machines Corporation Wake-and-go mechanism with system address bus transaction master
US8225120B2 (en) * 2008-02-01 2012-07-17 International Business Machines Corporation Wake-and-go mechanism with data exclusivity
US8739145B2 (en) * 2008-03-26 2014-05-27 Avaya Inc. Super nested block method to minimize coverage testing overhead
US8752007B2 (en) 2008-03-26 2014-06-10 Avaya Inc. Automatic generation of run-time instrumenter
US8195896B2 (en) * 2008-06-10 2012-06-05 International Business Machines Corporation Resource sharing techniques in a parallel processing computing system utilizing locks by replicating or shadowing execution contexts
US8914781B2 (en) * 2008-10-24 2014-12-16 Microsoft Corporation Scalability analysis for server systems
KR101579589B1 (en) * 2009-02-12 2015-12-22 삼성전자 주식회사 Static branch prediction method for pipeline processor and compile method therefor
US9940138B2 (en) 2009-04-08 2018-04-10 Intel Corporation Utilization of register checkpointing mechanism with pointer swapping to resolve multithreading mis-speculations
US8145723B2 (en) 2009-04-16 2012-03-27 International Business Machines Corporation Complex remote update programming idiom accelerator
US8886919B2 (en) * 2009-04-16 2014-11-11 International Business Machines Corporation Remote update programming idiom accelerator with allocated processor resources
US8230201B2 (en) 2009-04-16 2012-07-24 International Business Machines Corporation Migrating sleeping and waking threads between wake-and-go mechanisms in a multiple processor data processing system
US8397052B2 (en) * 2009-08-19 2013-03-12 International Business Machines Corporation Version pressure feedback mechanisms for speculative versioning caches
US8521961B2 (en) * 2009-08-20 2013-08-27 International Business Machines Corporation Checkpointing in speculative versioning caches
CA2680597C (en) * 2009-10-16 2011-06-07 Ibm Canada Limited - Ibm Canada Limitee Managing speculative assist threads
US8429179B1 (en) * 2009-12-16 2013-04-23 Board Of Regents, The University Of Texas System Method and system for ontology driven data collection and processing
EP2519876A1 (en) 2009-12-28 2012-11-07 Hyperion Core, Inc. Optimisation of loops and data flow sections
US9086889B2 (en) * 2010-04-27 2015-07-21 Oracle International Corporation Reducing pipeline restart penalty
US8990802B1 (en) * 2010-05-24 2015-03-24 Thinking Software, Inc. Pinball virtual machine (PVM) implementing computing process within a structural space using PVM atoms and PVM atomic threads
US8856767B2 (en) * 2011-04-29 2014-10-07 Yahoo! Inc. System and method for analyzing dynamic performance of complex applications
US10061618B2 (en) * 2011-06-16 2018-08-28 Imagination Technologies Limited Scheduling heterogenous computation on multithreaded processors
US8739186B2 (en) * 2011-10-26 2014-05-27 Autodesk, Inc. Application level speculative processing
WO2013113595A1 (en) * 2012-01-31 2013-08-08 International Business Machines Corporation Major branch instructions with transactional memory
US9009734B2 (en) 2012-03-06 2015-04-14 Autodesk, Inc. Application level speculative processing
US10558437B1 (en) * 2013-01-22 2020-02-11 Altera Corporation Method and apparatus for performing profile guided optimization for high-level synthesis
US8954546B2 (en) 2013-01-25 2015-02-10 Concurix Corporation Tracing with a workload distributor
US9135145B2 (en) * 2013-01-28 2015-09-15 Rackspace Us, Inc. Methods and systems of distributed tracing
US9483334B2 (en) 2013-01-28 2016-11-01 Rackspace Us, Inc. Methods and systems of predictive monitoring of objects in a distributed network system
US9397902B2 (en) 2013-01-28 2016-07-19 Rackspace Us, Inc. Methods and systems of tracking and verifying records of system change events in a distributed network system
US9813307B2 (en) 2013-01-28 2017-11-07 Rackspace Us, Inc. Methods and systems of monitoring failures in a distributed network system
US8924941B2 (en) 2013-02-12 2014-12-30 Concurix Corporation Optimization analysis using similar frequencies
US8997063B2 (en) 2013-02-12 2015-03-31 Concurix Corporation Periodicity optimization in an automated tracing system
US20130283281A1 (en) 2013-02-12 2013-10-24 Concurix Corporation Deploying Trace Objectives using Cost Analyses
WO2014142704A1 (en) 2013-03-15 2014-09-18 Intel Corporation Methods and apparatus to compile instructions for a vector of instruction pointers processor architecture
US9665474B2 (en) 2013-03-15 2017-05-30 Microsoft Technology Licensing, Llc Relationships derived from trace data
US9575874B2 (en) 2013-04-20 2017-02-21 Microsoft Technology Licensing, Llc Error list and bug report analysis for configuring an application tracer
US9292415B2 (en) 2013-09-04 2016-03-22 Microsoft Technology Licensing, Llc Module specific tracing in a shared module environment
US9772927B2 (en) 2013-11-13 2017-09-26 Microsoft Technology Licensing, Llc User interface for selecting tracing origins for aggregating classes of trace data
US9921848B2 (en) 2014-03-27 2018-03-20 International Business Machines Corporation Address expansion and contraction in a multithreading computer system
US9804846B2 (en) 2014-03-27 2017-10-31 International Business Machines Corporation Thread context preservation in a multithreading computer system
US9594660B2 (en) 2014-03-27 2017-03-14 International Business Machines Corporation Multithreading computer system and program product for executing a query instruction for idle time accumulation among cores
US9218185B2 (en) 2014-03-27 2015-12-22 International Business Machines Corporation Multithreading capability information retrieval
US10102004B2 (en) 2014-03-27 2018-10-16 International Business Machines Corporation Hardware counters to track utilization in a multithreading computer system
US9354883B2 (en) 2014-03-27 2016-05-31 International Business Machines Corporation Dynamic enablement of multithreading
US9417876B2 (en) 2014-03-27 2016-08-16 International Business Machines Corporation Thread context restoration in a multithreading computer system
US10142353B2 (en) 2015-06-05 2018-11-27 Cisco Technology, Inc. System for monitoring and managing datacenters
US10536357B2 (en) 2015-06-05 2020-01-14 Cisco Technology, Inc. Late data detection in data center
US10222995B2 (en) 2016-04-13 2019-03-05 Samsung Electronics Co., Ltd. System and method for providing a zero contention parallel data stack
US10761854B2 (en) * 2016-04-19 2020-09-01 International Business Machines Corporation Preventing hazard flushes in an instruction sequencing unit of a multi-slice processor
US10664377B2 (en) * 2016-07-15 2020-05-26 Blackberry Limited Automation of software verification
US10896130B2 (en) 2016-10-19 2021-01-19 International Business Machines Corporation Response times in asynchronous I/O-based software using thread pairing and co-execution
US10459825B2 (en) * 2017-08-18 2019-10-29 Red Hat, Inc. Intelligent expansion of system information collection
US10503626B2 (en) * 2018-01-29 2019-12-10 Oracle International Corporation Hybrid instrumentation framework for multicore low power processors
US10657057B2 (en) * 2018-04-04 2020-05-19 Nxp B.V. Secure speculative instruction execution in a data processing system
US10896044B2 (en) * 2018-06-21 2021-01-19 Advanced Micro Devices, Inc. Low latency synchronization for operation cache and instruction cache fetching and decoding instructions
US11157283B2 (en) * 2019-01-09 2021-10-26 Intel Corporation Instruction prefetch based on thread dispatch commands
US11556374B2 (en) 2019-02-15 2023-01-17 International Business Machines Corporation Compiler-optimized context switching with compiler-inserted data table for in-use register identification at a preferred preemption point

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6389446B1 (en) * 1996-07-12 2002-05-14 Nec Corporation Multi-processor system executing a plurality of threads simultaneously and an execution method therefor
US6574725B1 (en) * 1999-11-01 2003-06-03 Advanced Micro Devices, Inc. Method and mechanism for speculatively executing threads of instructions
US20040073906A1 (en) * 2002-10-15 2004-04-15 Sun Microsystems, Inc. Processor with speculative multithreading and hardware to support multithreading software (including global registers and busy bit memory elements)

Family Cites Families (36)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5860017A (en) * 1996-06-28 1999-01-12 Intel Corporation Processor and method for speculatively executing instructions from multiple instruction streams indicated by a branch instruction
US6212542B1 (en) * 1996-12-16 2001-04-03 International Business Machines Corporation Method and system for executing a program within a multiscalar processor by processing linked thread descriptors
AU6586898A (en) * 1997-03-21 1998-10-20 University Of Maryland Spawn-join instruction set architecture for providing explicit multithreading
US6263404B1 (en) * 1997-11-21 2001-07-17 International Business Machines Corporation Accessing data from a multiple entry fully associative cache buffer in a multithread data processing system
US6182210B1 (en) * 1997-12-16 2001-01-30 Intel Corporation Processor having multiple program counters and trace buffers outside an execution pipeline
US6301705B1 (en) * 1998-10-01 2001-10-09 Institute For The Development Of Emerging Architectures, L.L.C. System and method for deferring exceptions generated during speculative execution
EP0992916A1 (en) * 1998-10-06 2000-04-12 Texas Instruments Inc. Digital signal processor
US6622155B1 (en) * 1998-11-24 2003-09-16 Sun Microsystems, Inc. Distributed monitor concurrency control
US6317816B1 (en) * 1999-01-29 2001-11-13 International Business Machines Corporation Multiprocessor scaleable system and method for allocating memory from a heap
KR100308211B1 (en) * 1999-03-27 2001-10-29 윤종용 Micro computer system with compressed instruction
WO2000068784A1 (en) * 1999-05-06 2000-11-16 Koninklijke Philips Electronics N.V. Data processing device, method for executing load or store instructions and method for compiling programs
US6341347B1 (en) * 1999-05-11 2002-01-22 Sun Microsystems, Inc. Thread switch logic in a multiple-thread processor
US6542991B1 (en) * 1999-05-11 2003-04-01 Sun Microsystems, Inc. Multiple-thread processor with single-thread interface shared among threads
US6351808B1 (en) * 1999-05-11 2002-02-26 Sun Microsystems, Inc. Vertically and horizontally threaded processor with multidimensional storage for storing thread data
US6463526B1 (en) * 1999-06-07 2002-10-08 Sun Microsystems, Inc. Supporting multi-dimensional space-time computing through object versioning
US6532521B1 (en) * 1999-06-30 2003-03-11 International Business Machines Corporation Mechanism for high performance transfer of speculative request data between levels of cache hierarchy
US6484254B1 (en) * 1999-12-30 2002-11-19 Intel Corporation Method, apparatus, and system for maintaining processor ordering by checking load addresses of unretired load instructions against snooping store addresses
US6711671B1 (en) * 2000-02-18 2004-03-23 Hewlett-Packard Development Company, L.P. Non-speculative instruction fetch in speculative processing
US7343602B2 (en) * 2000-04-19 2008-03-11 Hewlett-Packard Development Company, L.P. Software controlled pre-execution in a multithreaded processor
US6684375B2 (en) * 2000-11-22 2004-01-27 Matsushita Electric Industrial Co., Ltd. Delay distribution calculation method, circuit evaluation method and false path extraction method
JP3969009B2 (en) * 2001-03-29 2007-08-29 株式会社日立製作所 Hardware prefetch system
US6928645B2 (en) * 2001-03-30 2005-08-09 Intel Corporation Software-based speculative pre-computation and multithreading
US20020199179A1 (en) * 2001-06-21 2002-12-26 Lavery Daniel M. Method and apparatus for compiler-generated triggering of auxiliary codes
JP3661614B2 (en) * 2001-07-12 2005-06-15 日本電気株式会社 Cache memory control method and multiprocessor system
JP3702814B2 (en) * 2001-07-12 2005-10-05 日本電気株式会社 Multi-thread execution method and parallel processor system
JP3632635B2 (en) * 2001-07-18 2005-03-23 日本電気株式会社 Multi-thread execution method and parallel processor system
SE0102564D0 (en) * 2001-07-19 2001-07-19 Ericsson Telefon Ab L M Arrangement and method in computer system
US6959435B2 (en) * 2001-09-28 2005-10-25 Intel Corporation Compiler-directed speculative approach to resolve performance-degrading long latency events in an application
US7137111B2 (en) * 2001-11-28 2006-11-14 Sun Microsystems, Inc. Aggressive prefetch of address chains
US20030145314A1 (en) * 2002-01-31 2003-07-31 Khoa Nguyen Method of efficient dynamic data cache prefetch insertion
US6959372B1 (en) * 2002-02-19 2005-10-25 Cogent Chipware Inc. Processor cluster architecture and associated parallel processing methods
US6883086B2 (en) * 2002-03-06 2005-04-19 Intel Corporation Repair of mis-predicted load values
US8095920B2 (en) * 2002-09-17 2012-01-10 Intel Corporation Post-pass binary adaptation for software-based speculative precomputation
US7062606B2 (en) * 2002-11-01 2006-06-13 Infineon Technologies Ag Multi-threaded embedded processor using deterministic instruction memory to guarantee execution of pre-selected threads during blocking events
US20040123081A1 (en) * 2002-12-20 2004-06-24 Allan Knies Mechanism to increase performance of control speculation
AU2003303438A1 (en) * 2002-12-24 2004-07-22 Sun Microsystems, Inc. Performing hardware scout threading in a system that supports simultaneous multithreading


Cited By (36)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7379858B2 (en) * 2004-02-17 2008-05-27 Intel Corporation Computation of all-pairs reaching probabilities in software systems
US20050182602A1 (en) * 2004-02-17 2005-08-18 Intel Corporation Computation of all-pairs reaching probabilities in software systems
US20060047495A1 (en) * 2004-09-01 2006-03-02 Jesus Sanchez Analyzer for spawning pairs in speculative multithreaded processor
US20060212689A1 (en) * 2005-03-18 2006-09-21 Shailender Chaudhry Method and apparatus for simultaneous speculative threading
US7634641B2 (en) * 2005-03-18 2009-12-15 Sun Microsystems, Inc. Method and apparatus for using multiple threads to speculatively execute instructions
WO2006122990A2 (en) * 2005-05-19 2006-11-23 Intel Corporation Storage-deployment apparatus, system and method for multiple sets of speculative-type instructions
US20080134196A1 (en) * 2005-05-19 2008-06-05 Intel Corporation Apparatus, System, and Method of a Memory Arrangement for Speculative Multithreading
WO2006122990A3 (en) * 2005-05-19 2008-07-03 Intel Corp Storage-deployment apparatus, system and method for multiple sets of speculative-type instructions
US20070011684A1 (en) * 2005-06-27 2007-01-11 Du Zhao H Mechanism to optimize speculative parallel threading
US7627864B2 (en) * 2005-06-27 2009-12-01 Intel Corporation Mechanism to optimize speculative parallel threading
US8185700B2 (en) 2006-05-30 2012-05-22 Intel Corporation Enabling speculative state information in a cache coherency protocol
US20090083488A1 (en) * 2006-05-30 2009-03-26 Carlos Madriles Gimeno Enabling Speculative State Information in a Cache Coherency Protocol
US20080244223A1 (en) * 2007-03-31 2008-10-02 Carlos Garcia Quinones Branch pruning in architectures with speculation support
US8813057B2 (en) 2007-03-31 2014-08-19 Intel Corporation Branch pruning in architectures with speculation support
US20110119660A1 (en) * 2008-07-31 2011-05-19 Panasonic Corporation Program conversion apparatus and program conversion method
CN101826014A (en) * 2010-04-20 2010-09-08 北京邮电大学 Dividing method of source code in software engineering
US8904118B2 (en) 2011-01-07 2014-12-02 International Business Machines Corporation Mechanisms for efficient intra-die/intra-chip collective messaging
US8990514B2 (en) 2011-01-07 2015-03-24 International Business Machines Corporation Mechanisms for efficient intra-die/intra-chip collective messaging
US9971635B2 (en) 2011-01-10 2018-05-15 International Business Machines Corporation Method and apparatus for a hierarchical synchronization barrier in a multi-node system
US9286067B2 (en) 2011-01-10 2016-03-15 International Business Machines Corporation Method and apparatus for a hierarchical synchronization barrier in a multi-node system
US20120204065A1 (en) * 2011-02-03 2012-08-09 International Business Machines Corporation Method for guaranteeing program correctness using fine-grained hardware speculative execution
US9195550B2 (en) * 2011-02-03 2015-11-24 International Business Machines Corporation Method for guaranteeing program correctness using fine-grained hardware speculative execution
US10565005B2 (en) 2013-04-23 2020-02-18 Ab Initio Technology Llc Controlling tasks performed by a computing system
US10489191B2 (en) 2013-04-23 2019-11-26 Ab Initio Technology Llc Controlling tasks performed by a computing system using controlled process spawning
US20140317629A1 (en) * 2013-04-23 2014-10-23 Ab Initio Technology Llc Controlling tasks performed by a computing system
US9665396B2 (en) * 2013-04-23 2017-05-30 Ab Initio Technology Llc Controlling tasks performed by a computing system using instructions generated to control initiation of subroutine execution
US9286090B2 (en) * 2014-01-20 2016-03-15 Sony Corporation Method and system for compiler identification of code for parallel execution
US9348595B1 (en) 2014-12-22 2016-05-24 Centipede Semi Ltd. Run-time code parallelization with continuous monitoring of repetitive instruction sequences
US9135015B1 (en) 2014-12-25 2015-09-15 Centipede Semi Ltd. Run-time code parallelization with monitoring of repetitive instruction sequences during branch mis-prediction
US9208066B1 (en) 2015-03-04 2015-12-08 Centipede Semi Ltd. Run-time code parallelization with approximate monitoring of instruction sequences
US10296346B2 (en) 2015-03-31 2019-05-21 Centipede Semi Ltd. Parallelized execution of instruction sequences based on pre-monitoring
US10296350B2 (en) 2015-03-31 2019-05-21 Centipede Semi Ltd. Parallelized execution of instruction sequences
US9715390B2 (en) 2015-04-19 2017-07-25 Centipede Semi Ltd. Run-time parallelization of code execution based on an approximate register-access specification
US11755484B2 (en) * 2015-06-26 2023-09-12 Microsoft Technology Licensing, Llc Instruction block allocation
US10606727B2 (en) 2016-09-06 2020-03-31 Soroco Private Limited Techniques for generating a graphical user interface to display documentation for computer programs
US10379863B2 (en) * 2017-09-21 2019-08-13 Qualcomm Incorporated Slice construction for pre-executing data dependent loads

Also Published As

Publication number Publication date
US8719806B2 (en) 2014-05-06
US20040154019A1 (en) 2004-08-05
US20100332811A1 (en) 2010-12-30
US7814469B2 (en) 2010-10-12
CN1302384C (en) 2007-02-28
US20040154011A1 (en) 2004-08-05
CN1519718A (en) 2004-08-11
US7523465B2 (en) 2009-04-21

Similar Documents

Publication Publication Date Title
US20040154010A1 (en) Control-quasi-independent-points guided speculative multithreading
JP4042604B2 (en) Program parallelization apparatus, program parallelization method, and program parallelization program
Du et al. A cost-driven compilation framework for speculative parallelization of sequential programs
US7458065B2 (en) Selection of spawning pairs for a speculative multithreaded processor
US6487715B1 (en) Dynamic code motion optimization and path tracing
US6754893B2 (en) Method for collapsing the prolog and epilog of software pipelined loops
US8522220B2 (en) Post-pass binary adaptation for software-based speculative precomputation
US5887174A (en) System, method, and program product for instruction scheduling in the presence of hardware lookahead accomplished by the rescheduling of idle slots
US20110119660A1 (en) Program conversion apparatus and program conversion method
US20050144602A1 (en) Methods and apparatus to compile programs to use speculative parallel threads
US6892380B2 (en) Method for software pipelining of irregular conditional control loops
KR102379894B1 (en) Apparatus and method for managing address conflicts when performing vector operations
Packirisamy et al. Exploring speculative parallelism in SPEC2006
US7712091B2 (en) Method for predicate promotion in a software loop
KR20230058662A (en) Intra-Core Parallelism in Data Processing Apparatus and Methods
Kazi et al. Coarse-grained speculative execution in shared-memory multiprocessors
US20060047495A1 (en) Analyzer for spawning pairs in speculative multithreaded processor
Kazi et al. Coarse-grained thread pipelining: A speculative parallel execution model for shared-memory multiprocessors
JP2001243070A (en) Processor and branch predicting method and compile method
US6637026B1 (en) Instruction reducing predicate copy
KR20150040663A (en) Method and Apparatus for instruction scheduling using software pipelining
US20070074186A1 (en) Method and system for performing reassociation in software loops
Wang et al. Exploiting speculative thread-level parallelism in data compression applications
Samuelsson A Comparison of List Scheduling Heuristics in LLVM Targeting POWER8
Lu et al. Branch penalty reduction on IBM cell SPUs via software branch hinting

Legal Events

Date Code Title Description
AS Assignment

Owner name: INTEL CORPORATION, CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:MARCUELLO, PEDRO;GONZALEZ, ANTONIO;WANG, HONG;AND OTHERS;REEL/FRAME:014151/0112;SIGNING DATES FROM 20030508 TO 20030603

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION

AS Assignment

Owner name: TAHOE RESEARCH, LTD., IRELAND

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:INTEL CORPORATION;REEL/FRAME:061827/0686

Effective date: 20220718