US20090178051A1 - Method for implementing dynamic lifetime reliability extension for microprocessor architectures - Google Patents

Method for implementing dynamic lifetime reliability extension for microprocessor architectures

Info

Publication number
US20090178051A1
Authority
US
United States
Prior art keywords
resource
operational mode
primary
resources
lifetime
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US12/118,050
Inventor
Victor Zyuban
Jeonghee Shin
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
International Business Machines Corp
Original Assignee
International Business Machines Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by International Business Machines Corp filed Critical International Business Machines Corp
Priority to US12/118,050
Publication of US20090178051A1
Legal status: Abandoned (current)

Classifications

    • G06F 9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 11/2025: Failover techniques using centralised failover control functionality
    • G06F 11/203: Failover techniques using migration
    • G06F 11/2051: Error detection or correction of the data by redundancy in hardware using active fault-masking, where processing functionality is redundant in regular structures
    • G06F 12/0846: Cache with multiple tag or data arrays being simultaneously accessible
    • G06F 11/004: Error avoidance
    • G06F 11/1482: Generic software techniques for error detection or fault masking by means of middleware or OS functionality
    • G06F 11/2028: Failover techniques eliminating a faulty processor or activating a spare
    • G06F 2209/5011: Indexing scheme relating to G06F 9/50 (Pool)
    • G06F 2209/503: Indexing scheme relating to G06F 9/50 (Resource availability)


Abstract

A method for implementing dynamic lifetime reliability extension for microprocessor architectures having a plurality of primary resources and a secondary resource pool of one or more secondary resources includes configuring a resource operational mode controller to selectively switch each of the primary and secondary resources between an operational mode and a non-operational mode, wherein the non-operational mode corresponds to a lifetime extension process; configuring a resource mapper associated with the secondary resource pool and in communication with the resource operational mode controller to map a secondary resource placed into the operational mode to a corresponding primary resource placed into the non-operational mode; and configuring a transaction decoder to receive incoming transaction requests and direct the requests to one of a primary resource in the operational mode and a secondary resource in the operational mode, the secondary resource mapped to an associated primary resource placed in the non-operational mode.

Description

    CROSS-REFERENCE TO RELATED APPLICATIONS
  • This non-provisional U.S. patent application is a continuation of pending U.S. patent application Ser. No. 11/969,413, which was filed Jan. 4, 2008, and is assigned to the present assignee.
  • BACKGROUND
  • The present invention relates generally to improvements in lifetime reliability of semiconductor devices and, more particularly, to a system and method for implementing dynamic lifetime reliability extension for microprocessor architectures.
  • Lifetime reliability has become one of the major concerns in microprocessor architectures implemented with deep submicron technologies. In particular, extreme scaling resulting in atomic-range dimensions, inter- and intra-device variability, and escalating power densities have all contributed to this concern. At the device and circuit levels, many reliability models have been proposed and empirically validated by academia and industry. As a result, the basic mechanisms of failure at a low level are fairly well understood, and the models at that level have gained widespread acceptance. In particular, lifetime reliability models for use with single-core, architecture-level, cycle-accurate simulators have been introduced. Such models have focused on certain major failure mechanisms including, for example, electromigration (EM), negative bias temperature instability (NBTI), positive bias temperature instability (PBTI), and time dependent dielectric breakdown (TDDB).
  • With respect to improving the lifetime reliability of semiconductor devices, existing efforts may be grouped into three general categories: sparing techniques, graceful degradation techniques, and voltage/frequency scaling techniques. In sparing techniques, spare resources are provided for one or more primary resources and are deactivated at system deployment. When primary resources fail later during the system lifetime, the spare resources are activated and replace the failed resources in order to extend the system lifetime. Sparing techniques thus cause less performance degradation due to failed resources; however, the high area overhead of the spare resources is the primary drawback of this approach.
  • In graceful degradation techniques, spare resources are not essential in order to extend system lifetime. Instead, when a resource failure occurs, the system is reconfigured in such a way as to isolate the failed resources and remain functional. As a result, graceful degradation techniques save the overhead cost of spare resources; however, system performance degrades throughout the lifetime. Accordingly, graceful degradation techniques are limited to applications and businesses where the degradation of performance over time is acceptable, which unfortunately excludes most high-end computing.
  • Thirdly, voltage/frequency scaling techniques are often used for power and temperature reduction and have thus also been proposed for lifetime extension. The system lifetime is predicted based on the applied workloads, and the voltage/frequency of the system is scaled with respect to the lifetime prediction. While voltage/frequency scaling enables the aging of a system to be slowed down as needed, these techniques also result in performance degradation of significant parts of the system, or of the entire system. In addition, although reduced voltage/frequency diminishes the degree of the stress conditions, these techniques are unable to actually remove the stress conditions of aging mechanisms from the semiconductor devices.
  • Still another existing technique, directed to reducing the leakage power during inactive intervals, is to use so-called “sleep” or “power down” modes for logic devices that are complemented with transistors that serve as a footer or a header to cut leakage during the quiescence intervals. During a normal operation mode, the circuits achieve high performance, resulting from the use of faster transistors which typically have higher leakage. The headers and/or footers are activated so as to couple the circuits to Vdd and/or ground (more generally logic high and low voltage supply rails). In contrast, during the sleep mode, the high threshold footer or header transistors are deactivated to cut off leakage paths, thereby reducing the leakage currents by orders of magnitude. This technique, also known as “power gating,” has been successfully used in embedded devices, such as systems on a chip (SOC). However, although power gating diminishes current flow and electric field across semiconductor devices (which results in a certain degree of stress reduction and increase in the lifetime of devices), it is unable to completely eliminate such stress conditions and/or stimulate the recovery effects of aging mechanisms.
  • SUMMARY
  • The foregoing discussed drawbacks and deficiencies of the prior art are overcome or alleviated, in an exemplary embodiment, by a method for implementing dynamic lifetime reliability extension for microprocessor architectures, including configuring a plurality of primary resources; configuring a secondary resource pool having one or more secondary resources; configuring a resource operational mode controller to selectively switch each of the primary and secondary resources between an operational mode and a non-operational mode, wherein the operational mode corresponds to performance of one or more tasks for which a given resource is designed to execute with respect to a microprocessor system and wherein the non-operational mode corresponds to a temporary lifetime extension process for at least one of suspending the aging of resources and reversing the aging of resources; configuring a resource mapper associated with the secondary resource pool and in communication with the resource operational mode controller to map a secondary resource placed into the operational mode to a corresponding primary resource placed into the non-operational mode, and to un-map a secondary resource from a corresponding primary resource in the event the corresponding primary resource is placed back into the operational mode; and configuring a transaction decoder to receive incoming transaction requests and, responsive to the resource operational mode controller, direct the requests to one of a primary resource in the operational mode and a secondary resource in the operational mode, the secondary resource mapped to an associated primary resource placed in the non-operational mode.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • Referring to the exemplary drawings wherein like elements are numbered alike in the several Figures:
  • FIG. 1 is a schematic block diagram of a system for implementing dynamic lifetime reliability extension for microprocessor architectures, in accordance with an embodiment of the invention;
  • FIG. 2 is a flow diagram illustrating an exemplary method of implementing dynamic lifetime reliability extension for microprocessor architectures, as executed by the system of FIG. 1; and
  • FIG. 3 is a schematic block diagram illustrating a specific example of an application of the system and method of FIGS. 1 and 2 as applied to L2 caches in multiple on-chip processor core systems.
  • DETAILED DESCRIPTION
  • Disclosed herein is a method for implementing dynamic lifetime reliability extension for microprocessor architectures, wherein various microprocessor components and circuits are selectively placed in “non-operational” modes. Such non-operational modes may include, for example, special lifetime extension methods for suspending and/or reversing the aging of resources. That is, rather than using the resources for their intended purpose, components (e.g., transistors) of such resources are temporarily subjected to a process in which stress conditions of aging mechanisms, such as electromigration, negative bias temperature instability (NBTI), positive bias temperature instability (PBTI), and time dependent dielectric breakdown (TDDB), are removed and/or reversed with respect to the semiconductor devices comprising the resources. In contrast, an “operational mode” as used herein generally refers to a task or tasks for which a microprocessor component is designed to execute with respect to a microprocessor system. Additional information regarding aging mechanism removal (termed “wearout gating”) and aging mechanism reversal (termed “intense recovery”) may be found in co-pending application Ser. Nos. 11/928,232 and 11/928,205, respectively, both filed on Oct. 30, 2007, assigned to the assignee of the present application, and the contents of which are incorporated herein by reference.
  • In addition to lifetime extension, the overall system performance during a lifetime extension non-operational mode (such as described above) for one or more components can also be improved by having spare resources take over the responsibility of the resources placed in the non-operational modes. As used herein, “original” resources are also referred to as “primary” resources and those resources replacing primary resources in non-operational modes are also referred to as “secondary” resources. The secondary resources are dismissed from replacement duty when the primary resources mapped thereto are placed back in a normal operational mode. Further, the secondary resources may themselves enter a non-operational mode (e.g., wearout gating, intense recovery) until such time as they are needed once again to replace the same or different primary resources. The secondary resources may be statically allocated to corresponding primary resources by design or, alternatively, be shared by more than one primary resource. In the latter case, shared secondary resources logically create a pool, also referred to herein as a “secondary resource pool,” but are not necessarily physically located together. Also, the secondary resources in the pool are optionally reconfigurable or micro-code programmable for one or more functions.
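  • As an illustration only, the bookkeeping implied by this mapping and un-mapping behavior can be modeled in software as a small pool object; the names (Mode, Resource, SecondaryResourcePool) and the Python modeling are assumptions of this sketch, not structures defined by the patent.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Dict, Optional


class Mode(Enum):
    OPERATIONAL = auto()       # resource performs its designed task
    WEAROUT_GATING = auto()    # non-operational: aging stress suspended
    INTENSE_RECOVERY = auto()  # non-operational: aging effects reversed


@dataclass
class Resource:
    name: str
    mode: Mode = Mode.OPERATIONAL


class SecondaryResourcePool:
    """Logical pool of secondary resources shared by several primaries."""

    def __init__(self, secondaries):
        self.secondaries = list(secondaries)
        self.mapping: Dict[str, Resource] = {}  # primary name -> secondary

    def map(self, primary: Resource) -> Optional[Resource]:
        """Map a free secondary to a primary entering a non-operational mode."""
        for s in self.secondaries:
            if s not in self.mapping.values():
                s.mode = Mode.OPERATIONAL          # secondary takes over the work
                self.mapping[primary.name] = s
                return s
        return None                                # no free secondary available

    def unmap(self, primary: Resource) -> None:
        """Dismiss the secondary once the primary returns to operational mode."""
        s = self.mapping.pop(primary.name, None)
        if s is not None:
            s.mode = Mode.WEAROUT_GATING           # idle secondary may itself recover


if __name__ == "__main__":
    pool = SecondaryResourcePool([Resource("spare0"), Resource("spare1")])
    alu0 = Resource("alu0")
    spare = pool.map(alu0)                 # alu0 enters lifetime-extension treatment
    alu0.mode = Mode.INTENSE_RECOVERY
    print(spare.name, spare.mode)          # spare0 Mode.OPERATIONAL
    alu0.mode = Mode.OPERATIONAL           # treatment finished
    pool.unmap(alu0)
```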
  • For further performance improvement, the disclosed embodiments herein also enable transactions along “critical paths” (in terms of performance) to be executed on primary resources by interrupting a non-operational mode of the primary resource if needed. Such an interruption of a non-operational, lifetime extension mode may occur whenever critical transactions are detected or predicted, or when overall system performance degradation exceeds a predefined threshold performance. With respect to “critical transactions,” the additional execution latency of instructions behind load instructions stalled due to a cache miss has little effect on the overall performance. On the other hand, load instructions incurring a cache miss (leading to a mispredicted branch, or slowing down the issue rate while waiting for data to arrive) are along the critical path of performance. In one exemplary embodiment, the criticality of transactions may be predicted in accordance with known techniques, such as based on their history, or may be detected by known techniques, such as by using a data dependence graph or a heuristic approach.
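  • The criticality mechanism is left open above (history, data dependence graph, or heuristic). Purely to illustrate the history-based option, and not as the patent's mechanism, a per-instruction-address saturating-counter predictor might look like the following sketch (all names hypothetical):

```python
from collections import defaultdict


class CriticalityPredictor:
    """Toy history-based predictor: a per-PC saturating counter learns whether
    transactions issued from that PC tend to be on the critical path."""

    def __init__(self, threshold=2, max_count=3):
        self.counters = defaultdict(int)   # pc -> saturating counter
        self.threshold = threshold
        self.max_count = max_count

    def predict_critical(self, pc: int) -> bool:
        return self.counters[pc] >= self.threshold

    def train(self, pc: int, was_critical: bool) -> None:
        c = self.counters[pc]
        if was_critical:
            self.counters[pc] = min(c + 1, self.max_count)
        else:
            self.counters[pc] = max(c - 1, 0)


if __name__ == "__main__":
    p = CriticalityPredictor()
    load_pc = 0x400123
    # A load at this PC repeatedly misses the cache and stalls issue:
    for _ in range(3):
        p.train(load_pc, was_critical=True)
    print(p.predict_critical(load_pc))   # True -> interrupt the non-operational mode
```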
  • Referring now to FIG. 1, there is shown a schematic block diagram of a system 100 for implementing dynamic lifetime reliability extension for microprocessor architectures (while also alleviating performance degradation), in accordance with an embodiment of the invention. As shown, the system 100 includes among other aspects, a plurality of primary resources 102 a, 102 b, 102 c, etc., (depicted with transaction queues 104 associated therewith), lifetime predictor 106, a secondary resource pool having one or more secondary resources 108 a, 108 b, etc., (also depicted with transaction queues 104 associated therewith), a resource mapper 110 associated with the secondary resource pool for tracking the mapping of secondary resources 108 to primary resources 102 in non-operational modes, a resource operational mode controller 112, and a transaction decoder 114.
  • The lifetime predictor 106 monitors access patterns (such as events and states, for example) of primary resources 102 and predicts their remaining lifetime. It is contemplated that any suitable methodologies for lifetime prediction may be used in the system 100 such as, for example, those disclosed in U.S. Patent Application Publication Nos. 20050257078 and 20060080062, and in U.S. application Ser. No. 11/735,533 filed Apr. 16, 2007, each assigned to the assignee of the present application and the contents of which are incorporated by reference herein in their entirety. In the event that any of the primary resources 102 are predicted as having a lifetime shorter than a defined threshold lifetime, the lifetime predictor 106 alerts the resource operational mode controller 112 and resource mapper 110 so that the identified primary resource(s) may be placed in a non-operational mode (e.g., wearout gating, intense recovery, etc.).
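  • The actual prediction methodologies are those of the incorporated applications; the toy model below only illustrates the interface implied by this paragraph, in which monitored accesses are turned into a remaining-lifetime estimate that is compared against a threshold (the linear wear model and all names are assumptions):

```python
class LifetimePredictor:
    """Toy model: each monitored access consumes a fixed wear budget, and an
    alert is raised when the predicted remaining lifetime drops below a
    threshold."""

    def __init__(self, wear_budget=1_000_000, threshold=100_000):
        self.remaining = {}            # resource name -> remaining budget
        self.wear_budget = wear_budget
        self.threshold = threshold

    def record_access(self, resource: str, stress: int = 1) -> None:
        left = self.remaining.setdefault(resource, self.wear_budget)
        self.remaining[resource] = max(left - stress, 0)

    def needs_lifetime_extension(self, resource: str) -> bool:
        return self.remaining.get(resource, self.wear_budget) < self.threshold


if __name__ == "__main__":
    lp = LifetimePredictor(wear_budget=10, threshold=5)
    for _ in range(7):
        lp.record_access("l2_cache0_way1")
    if lp.needs_lifetime_extension("l2_cache0_way1"):
        print("alert controller and mapper: place way 1 in a non-operational mode")
```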
  • In particular, the resource mapper 110 is responsible for finding and allocating available secondary resources in the pool upon request, and for mapping the allocated secondary resource (e.g., secondary resource 108 b) to the identified primary resource for lifetime extension treatment (e.g., primary resource 102 c) such that decoded transactions from the decoder 114 originally intended for the identified primary resource are thereafter executed by the allocated secondary resource. If there is more than one available secondary resource found in the pool, the mapper 110 may select a secondary resource based on one or more of: round-robin order, the remaining-lifetime order of the secondary resources themselves, and maximum-recovery-based order. On the other hand, if there is no available secondary resource found, the resource mapper 110 may either reject the request or commandeer a secondary resource already mapped to another primary resource.
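  • A minimal sketch of the three selection policies named above (round-robin order, remaining-lifetime order, and maximum-recovery order); the data layout and function names are assumptions rather than the patent's implementation:

```python
def select_round_robin(pool, available, start):
    """Scan the pool in rotating order starting at index `start`."""
    n = len(pool)
    for i in range(n):
        candidate = pool[(start + i) % n]
        if candidate in available:
            return candidate
    return None


def select_by_remaining_lifetime(available, remaining_lifetime):
    """Prefer the secondary with the most remaining lifetime."""
    return max(available, key=lambda s: remaining_lifetime[s], default=None)


def select_by_maximum_recovery(available, recovery_benefit):
    """Prefer the secondary that has recovered the most while idle."""
    return max(available, key=lambda s: recovery_benefit[s], default=None)


if __name__ == "__main__":
    pool = ["spare0", "spare1", "spare2"]
    print(select_round_robin(pool, {"spare1", "spare2"}, start=2))   # spare2
    print(select_by_remaining_lifetime({"spare1", "spare2"},
                                       {"spare1": 7, "spare2": 9}))  # spare2
```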
  • In the former case, the resource mapper 110 sends a rejection message to the requester (e.g., through the resource operational mode controller 112), optionally with a ticket number for retry. In addition, the resource mapper 110 may optionally include a reservation table to keep a record of rejected requests and, if any secondary resource subsequently becomes free, the resource mapper 110 allocates it to one of the pending requests in the reservation table. In the latter case, the resource mapper keeps priority information for the primary resources mapped to secondary resources in the pool in a certain order, such as by remaining lifetime or by maximum recovery, for example.
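  • The reject-with-ticket and reservation-table behavior described above could be modeled as follows; the ticket numbering and oldest-first replay order are illustrative assumptions:

```python
from collections import OrderedDict
from typing import Optional


class ReservationTable:
    """Records rejected requests so that a secondary resource freed later can
    be handed to one of the pending requesters."""

    def __init__(self):
        self.pending = OrderedDict()   # ticket number -> requesting primary
        self.next_ticket = 0

    def reject(self, primary: str) -> int:
        """Reject a request and hand back a ticket number for retry."""
        ticket = self.next_ticket
        self.next_ticket += 1
        self.pending[ticket] = primary
        return ticket

    def secondary_freed(self) -> Optional[str]:
        """A secondary became free: serve the oldest pending request, if any."""
        if not self.pending:
            return None
        _, primary = self.pending.popitem(last=False)
        return primary


if __name__ == "__main__":
    table = ReservationTable()
    ticket = table.reject("fpu0")                              # pool was fully busy
    print("retry ticket:", ticket)                             # retry ticket: 0
    print("map freed secondary to:", table.secondary_freed())  # fpu0
```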
  • In addition, the resource mapper 110 is further configured to evaluate the priority of a new request. For example, if lower priority primary resources (with respect to the requesting primary resource) are found, then the resource mapper 110 de-allocates the secondary resource mapped to the primary resource having the lowest priority and also notifies the resource operational mode controller 112 to interrupt the non-operational mode for that lowest priority primary resource. Once the non-operational mode for the lowest priority primary resource has been interrupted, the resource mapper 110 then maps the commandeered secondary resource to the higher priority requesting primary resource and notifies the resource operational mode controller 112 of the new request so as to commence placement of the higher priority requesting primary resource into the non-operational mode for lifetime extension treatment.
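  • One purely illustrative rendering of the commandeering logic in this paragraph: if the requester outranks the lowest-priority primary currently holding a secondary, that mapping is broken and the secondary is re-assigned; otherwise the request is rejected. The numeric priority values and the function signature are assumptions:

```python
from typing import Dict, Optional


def handle_request(request_primary: str,
                   priority: Dict[str, int],
                   mapping: Dict[str, str]) -> Optional[str]:
    """Try to give `request_primary` a secondary by preempting the lowest-
    priority primary currently mapped.  Higher number = higher priority.
    Returns the commandeered secondary, or None if the request is rejected."""
    if not mapping:
        return None
    victim = min(mapping, key=lambda p: priority[p])
    if priority[victim] >= priority[request_primary]:
        return None                       # no lower-priority mapping to preempt
    secondary = mapping.pop(victim)       # victim's non-operational mode is interrupted
    mapping[request_primary] = secondary  # re-map to the higher-priority requester
    return secondary


if __name__ == "__main__":
    mapping = {"alu0": "spare0", "alu1": "spare1"}    # existing mappings
    priority = {"alu0": 1, "alu1": 5, "lsu0": 3}      # e.g. derived from remaining lifetime
    print(handle_request("lsu0", priority, mapping))  # spare0 (alu0 preempted)
    print(mapping)                                    # {'alu1': 'spare1', 'lsu0': 'spare0'}
```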
  • Conversely, if lower priority primary resources (with respect to the requesting primary resource) are not found, the resource mapper 110 sends a rejection to the requester and optionally adds the request to a reservation table as described above. The resource operational mode controller 112, as indicated, controls whether the primary/secondary resources are in an operational or a non-operational mode. The controller 112 is also responsible for ensuring that primary resource states are safely migrated or stored before switching from operational to non-operational if needed, and that secondary resource states are safely migrated or stored before switching from non-operational to operational if needed. Further, the controller 112 directs the decoder 114 to communicate transactions to either primary resources in an operational mode or to secondary resources that are mapped to primary resources in a non-operational mode.
  • In an exemplary embodiment, the resource operational mode controller 112 places a primary resource in a non-operational mode upon receiving notification from the resource mapper 110 that a secondary resource has been mapped to the primary resource, resulting from one or more of: a request from the lifetime predictor 106 upon a determination that the predicted remaining lifetime of the primary resource is shorter than a defined threshold; a notification from the decoder when the primary resource is predicted as idle, not anticipating critical transactions, or not causing performance degradation that exceeds a threshold for a sufficient amount of time; and a determination by the resource mapper 110 of regularly scheduled lifetime extension for primary resources.
  • Further, the resource operational mode controller 112 interrupts or terminates a non-operational mode in one or more of the following cases: when the scheduled time for lifetime extension techniques in the non-operational mode is up; when transactions along critical paths are detected or performance degradation exceeds a threshold; or when the resource mapper 110 requests the termination of the non-operational mode in order to allocate the secondary resource to another primary resource or to have the secondary resource enter the non-operational mode.
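  • Taken together, the entry and exit conditions of the two preceding paragraphs amount to two predicates over the controller's inputs; the sketch below expresses them over hypothetical status flags (none of these field names appear in the patent):

```python
from dataclasses import dataclass


@dataclass
class PrimaryStatus:
    secondary_mapped: bool = False
    lifetime_below_threshold: bool = False
    predicted_idle: bool = False
    scheduled_for_extension: bool = False
    schedule_expired: bool = False
    critical_transaction_seen: bool = False
    degradation_over_threshold: bool = False
    mapper_requested_termination: bool = False


def should_enter_non_operational(s: PrimaryStatus) -> bool:
    """A primary enters a non-operational mode only once a secondary has been
    mapped to it, triggered by the lifetime predictor, the decoder, or a
    regular lifetime-extension schedule."""
    trigger = (s.lifetime_below_threshold
               or s.predicted_idle
               or s.scheduled_for_extension)
    return s.secondary_mapped and trigger


def should_leave_non_operational(s: PrimaryStatus) -> bool:
    """Interrupt or terminate the non-operational mode when the scheduled time
    is up, critical traffic or excessive degradation is seen, or the mapper
    needs the secondary back."""
    return (s.schedule_expired
            or s.critical_transaction_seen
            or s.degradation_over_threshold
            or s.mapper_requested_termination)


if __name__ == "__main__":
    st = PrimaryStatus(secondary_mapped=True, lifetime_below_threshold=True)
    print(should_enter_non_operational(st))   # True
    st.critical_transaction_seen = True
    print(should_leave_non_operational(st))   # True
```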
  • Referring now to FIG. 2, there is shown a flow diagram illustrating an exemplary method 200 of implementing dynamic lifetime reliability extension for microprocessor architectures (while alleviating performance degradation), as may be executed by the system of FIG. 1.
  • Beginning in block 202, the decoder(s) (e.g., decoder 114 of FIG. 1) await requested transactions, and then decode the transactions in block 204 to determine which primary resources are needed to execute the transactions, as well as whether or not such transactions are along the critical path in terms of system performance. At decision block 206, it is then determined whether the primary resources needed to execute the requested transactions are presently in a non-operational mode (e.g., wearout gating, intense recovery, etc. for lifetime extension). If they are not (i.e., they are in an operational mode), the method proceeds to block 208 where the transactions are sent to (and executed by) the primary resources. In this case, the method then simply loops back to block 202 to await additionally requested transactions.
  • However, if at decision block 206 it is determined that the primary resources needed to execute the requested transactions are in fact presently in a non-operational mode, a further inquiry is made in decision block 210 as to whether the requested transactions are critical. If the transactions are not critical, then the transactions are sent to (and executed by) the secondary resources as indicated in block 212. The method then loops back to block 202 to await additionally requested transactions.
  • Conversely, if it is determined at decision block 210 that the transactions are critical, then the method proceeds to block 214 where the non-operational mode of the primary resources is interrupted. Then, in block 216, the transactions are sent to (and executed by) the primary resources. Once the transactions are completed, the non-operational mode may be resumed for the primary resources as shown in block 218, after which the method loops back to block 202 to await additionally requested transactions. Alternatively, the criticality determination at block 210 may optionally be omitted such that the non-operational mode of primary resources is automatically interrupted for requested transactions, regardless of whether they are critical or not.
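  • The flow of blocks 202 through 218 can be restated compactly as a dispatch function; the resource and controller objects below are hypothetical stand-ins for illustration only:

```python
class _Stub:
    def __init__(self, name, non_operational=False):
        self.name, self.non_operational = name, non_operational

    def execute(self, txn):
        return f"{txn} executed on {self.name}"


class _Controller:
    def interrupt(self, r):
        r.non_operational = False

    def resume(self, r):
        r.non_operational = True


def dispatch(transaction, primary, secondary, is_critical, controller):
    """Route one decoded transaction following blocks 206-218 of FIG. 2."""
    if not primary.non_operational:              # block 206: operational -> block 208
        return primary.execute(transaction)
    if not is_critical:                          # block 210: not critical -> block 212
        return secondary.execute(transaction)
    controller.interrupt(primary)                # block 214: interrupt the mode
    result = primary.execute(transaction)        # block 216: run on the primary
    controller.resume(primary)                   # block 218: resume the mode
    return result


if __name__ == "__main__":
    p = _Stub("primary", non_operational=True)
    s = _Stub("secondary")
    c = _Controller()
    print(dispatch("load A", p, s, is_critical=False, controller=c))  # on secondary
    print(dispatch("load B", p, s, is_critical=True, controller=c))   # on primary
```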
  • As will be further noted from FIG. 2, as transactions are sent to and executed by primary resources, the dashed lines associated with blocks 208 and 210 are directed to another (optional) aspect of the exemplary method 200, in which the lifetime of primary resources is dynamically predicted (e.g., by lifetime predictor 106 of FIG. 1). In this portion of method 200, as primary resources are accessed and used, such access patterns are monitored and used to predict the remaining lifetime of the primary resources, as shown in block 220. At decision block 222, it is determined whether the predicted remaining lifetime of any of the primary resources has fallen below a defined threshold (Tcritical). If not, no action need be taken, and this portion of the process returns to block 220 to continue to monitor access patterns and predict the lifetime of primary resources.
  • However, if any of the primary resources have in fact fallen below the defined remaining-lifetime threshold, the lifetime predictor 106 (FIG. 1) sends a request for secondary resources to the resource mapper 110 (FIG. 1) as shown in block 224. Then, as shown in decision block 226, it is determined (by the resource mapper) whether there are any available resources in the secondary resource pool to accommodate the request of block 224. In the event that all resources in the pool are busy and no other resources are preempted for the requests, the resource mapper sends a rejection to the requesters, along with a recommended retry time. The process then continues to monitor access patterns and predict resource lifetimes in block 220. Optionally, the mapper has a reservation table to keep a record of rejected requests and, if any secondary resource becomes available, the mapper allocates it to one of the pending requests in the table. In addition to the lifetime predictors, the resource operational mode controller is optionally able to request secondary resources to further extend the lifetime of primary resources.
  • However, if there are available resources in the pool as reflected in decision block 226, the resource mapper allocates the available resources to the requesting primary resources in block 228, updates a resource mapping table accordingly, and notifies the resource operational mode controller to place the aging resource in a non-operational mode. In block 230, the resource operational mode controller configures the corresponding primary resource from the operational mode to the non-operational mode. In switching from the operational mode to the non-operational mode, the resource operational mode controller further ensures that the states of the primary resources are safely stored or migrated to the secondary resources mapped in the pool if needed, thus avoiding adverse effects on system integrity.
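  • The state hand-off mentioned at the end of the preceding paragraph could look like the following toy sequence; the dictionary-based state and the function names are assumptions, and the point is only that state is copied or migrated before the mode switch so that system integrity is preserved:

```python
class ModeledResource:
    def __init__(self, name):
        self.name = name
        self.non_operational = False
        self.state = {}                      # architectural state (toy model)


def enter_non_operational(primary: ModeledResource,
                          secondary: ModeledResource) -> None:
    """Block 230: migrate state, then switch the primary out of service."""
    secondary.state = dict(primary.state)    # safe copy/migration of state
    secondary.non_operational = False        # secondary takes over
    primary.non_operational = True           # primary begins lifetime extension


def leave_non_operational(primary: ModeledResource,
                          secondary: ModeledResource) -> None:
    """Reverse hand-off when the primary returns to the operational mode."""
    primary.state = dict(secondary.state)
    primary.non_operational = False
    secondary.non_operational = True


if __name__ == "__main__":
    prim, sec = ModeledResource("reg_file0"), ModeledResource("spare_reg_file")
    prim.state = {"r1": 42}
    enter_non_operational(prim, sec)
    print(sec.state, prim.non_operational)   # {'r1': 42} True
```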
  • Although the system and method embodiments described above are discussed in general terms such as primary and secondary resources within a microprocessor system, such resources may represent any of a number of components associated with a microprocessor system including, but not limited to: static random access memory (SRAM) arrays, embedded dynamic random access memory (EDRAM) arrays, register files, execution units, and processor cores, for example.
  • Accordingly, FIG. 3 is a schematic block diagram illustrating a specific example of an application of the system and method of FIGS. 1 and 2, as applied to L2 caches in multiple on-chip processor core systems. As shown in FIG. 3, the system 300 includes eight on-chip L2 caches 302 (cache 0 through cache 7) connected through a common bus 304, wherein each cache 302 may be private with respect to one on-chip processor core or, alternatively, shared by more than one on-chip processor core. As shown in the exemplary embodiment of FIG. 3, each L2 cache 302 further includes two banks (Bank 0, Bank 1) and has 8-way set associativity.
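For concreteness, the FIG. 3 topology (eight L2 caches on a common bus, two banks per cache, eight associative ways) could be modeled with a trivial data structure; the dictionary layout, and the assumption that ways are addressed per bank to match the later "way 1 of bank 0" example, are illustrative only:

```python
# Assumed model of FIG. 3: eight L2 caches on a common bus, each with two
# banks and eight associative ways (one memory array per way).
NUM_CACHES, NUM_BANKS, NUM_WAYS = 8, 2, 8

system = {
    f"L2-{c}": {
        f"bank{b}": [f"way{w}-array" for w in range(NUM_WAYS)]
        for b in range(NUM_BANKS)
    }
    for c in range(NUM_CACHES)
}

print(len(system), "caches;", "L2-0/bank0 ways:", system["L2-0"]["bank0"][:3], "...")
```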
  • The dynamic lifetime reliability extension techniques described above are capable of being applied to various granularities of L2 caches for the non-operational mode, such as to the entire cache, banks, arrays, columns and rows. In the exemplary L2 cache system 300 of FIG. 3, the eight associative ways may each enter the non-operational mode independently with respect to one another. Each associative way is implemented as one or more memory arrays 306. In addition, each L2 cache 302 has a resource operational mode controller 112 and a lifetime predictor 106 associated therewith. The secondary resource pool 308 includes a plurality of secondary arrays 310 and a resource mapper(s) 110. As described above, these secondary arrays 310 in the secondary resource pool 308 may be physically located together and/or distributed across the chip. The resource mapper(s) 110 may likewise be centralized, distributed, or both, along with the secondary arrays.
  • In one specific operational example, it is assumed that the primary array(s) for way 1 of bank 0 in L2 cache 0 (also marked with an “X” in FIG. 3) is requested to be placed in the non-operational mode. The secondary resource mapper 110 checks to see whether there are any available secondary arrays 310 in the pool 308. If any are available, the mapper 110 allocates one of the available secondary arrays 310 to the requested primary array (in this example, to way 1 of bank 0 in L2 cache 0) and updates the resource mapping table in such a way as to respond to bus transactions associated with L2 cache 0 and the address range of the primary array. The selected secondary array enters the operational mode if it has been in the non-operational mode.
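A sketch of the mapping-table update in this example might look like the following; the table key (cache, bank, way), the free list, and all names are assumptions made for illustration:

```python
# Hypothetical resource mapping table: (cache id, bank, way) -> secondary array.
mapping_table = {}

def allocate(free_secondaries, cache_id, bank, way):
    """Sketch of the FIG. 3 example: take an available secondary array and
    record which primary array (address range) it now stands in for."""
    if not free_secondaries:
        return None                                   # caller must retry or queue the request
    secondary = free_secondaries.pop(0)
    mapping_table[(cache_id, bank, way)] = secondary  # update the resource mapping table
    return secondary

def responds_to(cache_id, bank, way):
    """The mapped secondary array answers bus transactions aimed at the re-mapped primary."""
    return mapping_table.get((cache_id, bank, way))

free_secondaries = ["secondary-array-3"]
allocate(free_secondaries, "L2-0", 0, 1)      # way 1 of bank 0 in L2 cache 0 ("X" in FIG. 3)
print(responds_to("L2-0", 0, 1))              # secondary-array-3 now services this range
```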
  • If no available secondary array 310 presently exists in the pool, the mapper 110 rejects the request with a retry time or adds the rejected request to the reservation table. The term “available” in this example refers to those secondary arrays that are either free and have a remaining lifetime longer than the threshold lifetime, or those secondary arrays that are already mapped to other primary arrays having a longer remaining lifetime than the array for way 1.
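The "available" test described above could be expressed as a small predicate; the dict-based representation of a secondary array's state is an assumption made only for this sketch:

```python
def is_available(secondary, requesting_primary_lifetime, threshold):
    """A secondary array is 'available' if it is free and has a remaining
    lifetime above the threshold, or if it is already mapped to a primary
    array whose remaining lifetime is longer than that of the requester.
    'secondary' is an assumed dict with 'remaining_lifetime' and
    'mapped_to_lifetime' (None when the array is free)."""
    if secondary["mapped_to_lifetime"] is None:
        return secondary["remaining_lifetime"] > threshold
    return secondary["mapped_to_lifetime"] > requesting_primary_lifetime

spare = {"remaining_lifetime": 0.8, "mapped_to_lifetime": None}
print(is_available(spare, requesting_primary_lifetime=0.05, threshold=0.1))  # True
```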
  • If the resource operational mode controller 112 receives a mapping confirmation from the secondary resource mapper 110, the controller 112 commences a drain process of the cache lines in the array of way 1. However, if the secondary array request is rejected, the resource operational mode controller 112 retries after the recommended retry time. In the drain process, cache lines in the dirty or exclusive state are written back to main memory, and the L1 cache lines associated with those cache lines are invalidated. Alternatively, in the drain process, valid cache lines are migrated to the mapped secondary array, optionally through dedicated interconnects.
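One way to picture the drain process (write-back variant) is sketched below; the MESI-style state letters and the callback arguments are assumptions, and the alternative migration path through dedicated interconnects is not shown:

```python
def drain(cache_lines, write_back, l1_invalidate):
    """Illustrative drain of the way-1 array: lines in the dirty ('M') or
    exclusive ('E') state are written back to main memory and the associated
    L1 lines are invalidated; the drained array ends up holding no valid lines."""
    for line in cache_lines:
        if line["state"] in ("M", "E"):        # dirty or exclusive
            write_back(line)                   # write back to main memory
            l1_invalidate(line["addr"])        # invalidate the associated L1 lines
        line["state"] = "I"                    # the primary array ends up empty

lines = [{"addr": 0x100, "state": "M"}, {"addr": 0x140, "state": "S"}]
drain(lines,
      write_back=lambda l: print(f"write back @{l['addr']:#x}"),
      l1_invalidate=lambda a: print(f"invalidate L1 @{a:#x}"))
```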
  • When the drain process is completed, the resource operational mode controller 112 configures the array of way 1 to enter the non-operational mode. In addition, the resource operational mode controller 112 configures the L2 cache controller (not specifically shown) in such a way as to send accesses of cache lines in the array of way 1 to the secondary resource pool 308. Thus, when processor cores request reads hitting in the L2 cache, the cache controller sends the cache line to the processor cores. If reads hit in the secondary array, the controller issues a bus transaction, which is responded to by the secondary array. When read requests miss in the L2 cache, including the secondary array, the controller issues a cache miss request on the bus 304, which is in turn responded to by other L2 caches, L3 caches or memory.
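The read handling just described might be sketched as follows; the callback-style interface and all names are illustrative assumptions:

```python
def handle_read(addr, l2_hit, secondary_hit, send_to_core, issue_bus_txn):
    """Illustrative read path once way 1 is dormant: serve L2 hits directly,
    let the mapped secondary array answer hits that fall in its range, and
    issue a miss on the bus otherwise (answered by other L2s, L3, or memory)."""
    if l2_hit(addr):
        send_to_core(addr)                   # normal L2 hit
    elif secondary_hit(addr):
        issue_bus_txn("read", addr)          # responded to by the secondary array
    else:
        issue_bus_txn("read-miss", addr)     # responded to by other L2 caches, L3, or memory

handle_read(0x200,
            l2_hit=lambda a: False,
            secondary_hit=lambda a: True,
            send_to_core=lambda a: print(f"line @{a:#x} to core"),
            issue_bus_txn=lambda kind, a: print(f"{kind} on bus for @{a:#x}"))
```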
  • When processor cores request writes hitting in the L2 cache, the cache controller updates the cache line. If writes hit in the secondary array, the cache controller sends the data to the secondary array for the writes. When writes miss in the L2 cache, including the secondary array, the controller issues a cache miss for writes on the bus 304. When other L2 caches issue a cache miss for reads on the bus 304, the cache controller snoops the miss. If the controller has to provide data to the requester, it reads the cache and sends the data on the bus, or lets the secondary array provide the data to the requester. When other L2 caches issue a cache miss for writes on the bus 304, the cache controller snoops the miss and invalidates the cache line if necessary. Since invalidation requires only a change of cache line state, the secondary array does nothing even if it holds the cache line targeted for invalidation.
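A corresponding sketch of the write and snoop handling, with the same illustrative caveats (hypothetical names, callback-style interface), might look like this:

```python
def handle_write(addr, data, l2_hit, secondary_hit, update_line, issue_bus_txn):
    """Illustrative write path: update on an L2 hit, forward the data to the
    secondary array when the line lives there, otherwise issue a write miss."""
    if l2_hit(addr):
        update_line(addr, data)
    elif secondary_hit(addr):
        issue_bus_txn("write-to-secondary", addr, data)
    else:
        issue_bus_txn("write-miss", addr, data)

def snoop_write_miss(addr, has_line, invalidate_line):
    """Snooping a write miss from another L2: invalidation only changes the
    cache line state, so the secondary array itself takes no further action."""
    if has_line(addr):
        invalidate_line(addr)

handle_write(0x300, 0xAB,
             l2_hit=lambda a: False, secondary_hit=lambda a: True,
             update_line=lambda a, d: None,
             issue_bus_txn=lambda *args: print("bus:", args))
snoop_write_miss(0x300, has_line=lambda a: True,
                 invalidate_line=lambda a: print(f"invalidate @{a:#x}"))
```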
  • The non-operational mode of the array of way 1 is terminated either by the resource operational mode controller 112, in the event the cache miss rate or cache access time exceeds a threshold, or by the secondary resource mapper 110, in the event the allotted non-operational time expires or the secondary array needs to be either allocated to another primary array or itself placed into the non-operational mode.
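These termination triggers could be collapsed into a simple check; the threshold values and the 'limits' structure are assumptions, and the case where the secondary array is preempted for another primary array is not modeled here:

```python
def should_terminate(miss_rate, access_time, elapsed, limits):
    """Sketch of the termination triggers: excessive miss rate or access time
    (controller-initiated), or the allotted non-operational time expiring
    (mapper-initiated); 'limits' is an assumed dict of thresholds."""
    return (miss_rate > limits["miss_rate"]
            or access_time > limits["access_time"]
            or elapsed >= limits["max_non_operational_time"])

limits = {"miss_rate": 0.15, "access_time": 40, "max_non_operational_time": 10_000}
print(should_terminate(miss_rate=0.20, access_time=25, elapsed=3_000, limits=limits))  # True
```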
  • The resource operational mode controller 112 then configures the primary array of way 1 to enter the operational mode and begins a drain process of the cache lines in the secondary array mapped to way 1. In the drain process, cache lines in the dirty or exclusive state are written back to main memory (or to the next level of the memory cache hierarchy) and the corresponding L1 cache lines are invalidated. Alternatively, in the drain process, valid cache lines are migrated back to the primary array of way 1, optionally through dedicated interconnects. Finally, once the drain process is completed, the resource operational mode controller 112 notifies the secondary resource mapper 110 to un-map the secondary array. The unmapped secondary array either enters the non-operational mode or is allocated to another primary array, namely the array that caused the termination or the array having the highest priority in the reservation table.
  • While the invention has been described with reference to a preferred embodiment or embodiments, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted for elements thereof without departing from the scope of the invention. In addition, many modifications may be made to adapt a particular situation or material to the teachings of the invention without departing from the essential scope thereof. Therefore, it is intended that the invention not be limited to the particular embodiment disclosed as the best mode contemplated for carrying out this invention, but that the invention will include all embodiments falling within the scope of the appended claims.

Claims (5)

1. A method for implementing dynamic lifetime reliability extension for microprocessor architectures, the method comprising:
configuring a plurality of primary resources;
configuring a secondary resource pool having one or more secondary resources;
configuring a resource operational mode controller to selectively switch each of the primary and secondary resources between an operational mode and a non-operational mode, wherein the operational mode corresponds to performance of one or more tasks for which a given resource is designed to execute with respect to a microprocessor system and wherein the non-operational mode corresponds to a temporary lifetime extension process for at least one of suspending the aging of resources and extending the aging of resources;
configuring a resource mapper associated with the secondary resource pool and in communication with the resource operational mode controller to map a secondary resource placed into the operational mode to a corresponding primary resource placed into the non-operational mode, and to un-map a secondary resource from a corresponding primary resource in the event the corresponding primary resource is placed back into the operational mode; and
configuring a transaction decoder to receive incoming transaction requests and, responsive to the resource operational mode controller, direct the requests to one of a primary resource in the operational mode and a secondary resource in the operational mode, the secondary resource mapped to an associated primary resource placed in the non-operational mode.
2. The method of claim 1, further comprising:
configuring a lifetime predictor to monitor usage of the primary resources, wherein the lifetime predictor predicts a remaining lifetime thereof;
wherein, in the event that any of the primary resources are predicted as having a lifetime shorter than a defined threshold lifetime, the lifetime predictor alerts the resource operational mode controller and the resource mapper so that an identified primary resource is designated to be placed in the non-operational mode.
3. The method of claim 1, wherein the resource mapper is configured to:
find and allocate available secondary resources in the pool upon a request, and to map an allocated secondary resource to a corresponding one of the identified primary resources that is designated for lifetime extension treatment, such that decoded transactions by the decoder originally intended for the identified primary resource are thereafter executed by the allocated secondary resource;
wherein, in the event that there is more than one available secondary resource found in the pool, the resource mapper selects a secondary resource based on one or more of: a round-robin order, a remaining lifetime order of the secondary resources themselves, and a maximum recovery based order; and
in the event there is no available secondary resource found, the resource mapper is further configured to implement one of: rejecting the request, and commandeering a secondary resource already mapped to another primary resource.
4. The method of claim 3, wherein the resource mapper is further configured to:
evaluate a priority of a new request with respect to lifetime extension treatment;
de-allocate a secondary resource mapped to a primary resource of lower priority in favor of another primary resource of higher priority;
notify the resource operational mode controller to interrupt the non-operational mode for the lower priority primary resource;
map the de-allocated secondary resource to the other primary resource of higher priority and notify the resource operational mode controller of the new request so as to result in the placement of the higher priority primary resource into the non-operational mode for lifetime extension treatment; and
in the event a lower priority primary resource with respect to the request is not found, the resource mapper is further configured to implement one or more of:
sending a rejection of lifetime extension treatment for the new request, and adding the request to a reservation table.
5. The method of claim 1, wherein the primary and secondary resources comprise one or more of: static random access memory (SRAM) arrays, embedded dynamic random access memory (EDRAM) arrays, register files, execution units, and processor cores.
US12/118,050 2008-01-04 2008-05-09 Method for implementing dynamic lifetime reliability extension for microprocessor architectures Abandoned US20090178051A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US12/118,050 US20090178051A1 (en) 2008-01-04 2008-05-09 Method for implementing dynamic lifetime reliability extension for microprocessor architectures

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US11/969,413 US7386851B1 (en) 2008-01-04 2008-01-04 System and method for implementing dynamic lifetime reliability extension for microprocessor architectures
US12/118,050 US20090178051A1 (en) 2008-01-04 2008-05-09 Method for implementing dynamic lifetime reliability extension for microprocessor architectures

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US11/969,413 Continuation US7386851B1 (en) 2008-01-04 2008-01-04 System and method for implementing dynamic lifetime reliability extension for microprocessor architectures

Publications (1)

Publication Number Publication Date
US20090178051A1 true US20090178051A1 (en) 2009-07-09

Family

ID=39484535

Family Applications (2)

Application Number Title Priority Date Filing Date
US11/969,413 Active US7386851B1 (en) 2008-01-04 2008-01-04 System and method for implementing dynamic lifetime reliability extension for microprocessor architectures
US12/118,050 Abandoned US20090178051A1 (en) 2008-01-04 2008-05-09 Method for implementing dynamic lifetime reliability extension for microprocessor architectures

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US11/969,413 Active US7386851B1 (en) 2008-01-04 2008-01-04 System and method for implementing dynamic lifetime reliability extension for microprocessor architectures

Country Status (1)

Country Link
US (2) US7386851B1 (en)


Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR2954979B1 (en) 2010-01-05 2012-06-01 Commissariat Energie Atomique METHOD FOR SELECTING A RESOURCE AMONG A PLURALITY OF PROCESSING RESOURCES, SO THAT PROBABLE TIMES BEFORE THE RESOURCE FAILURE THEN EVENTUALLY IDENTICAL
US8386859B2 (en) 2010-04-30 2013-02-26 International Business Machines Corporation On-chip non-volatile storage of a test-time profile for efficiency and performance control
US8276018B2 (en) 2010-04-30 2012-09-25 International Business Machines Corporation Non-volatile memory based reliability and availability mechanisms for a computing device
JP2013149108A (en) * 2012-01-19 2013-08-01 Canon Inc Information processing apparatus, control method therefor, and program
US9621425B2 (en) * 2013-03-27 2017-04-11 Telefonaktiebolaget L M Ericsson Method and system to allocate bandwidth for heterogeneous bandwidth request in cloud computing networks
US10963304B1 (en) * 2014-02-10 2021-03-30 Google Llc Omega resource model: returned-resources
US9721646B1 (en) * 2016-06-29 2017-08-01 International Business Machines Corporation Prevention of SRAM burn-in
US10949241B2 (en) * 2019-03-08 2021-03-16 Google Llc Cost-efficient high-availability multi-single-tenant services
US11775444B1 (en) * 2022-03-15 2023-10-03 International Business Machines Corporation Fetch request arbiter

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7222245B2 (en) * 2002-04-26 2007-05-22 Hewlett-Packard Development Company, L.P. Managing system power based on utilization statistics
US7827262B2 (en) * 2005-07-14 2010-11-02 Cisco Technology, Inc. Approach for managing state information by a group of servers that services a group of clients

Patent Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4590554A (en) * 1982-11-23 1986-05-20 Parallel Computers Systems, Inc. Backup fault tolerant computer system
US6308286B1 (en) * 1994-06-30 2001-10-23 Hughes Electronics Corporation Complexity reduction system and method for integrated redundancy switchover systems
US6651082B1 (en) * 1998-08-03 2003-11-18 International Business Machines Corporation Method for dynamically changing load balance and computer
US6088328A (en) * 1998-12-29 2000-07-11 Nortel Networks Corporation System and method for restoring failed communication services
US6931567B2 (en) * 2000-01-31 2005-08-16 Hitachi, Ltd. Storage system
US7069558B1 (en) * 2000-03-09 2006-06-27 Sony Corporation System and method for interactively utilizing a user interface to manage device resources
US6865591B1 (en) * 2000-06-30 2005-03-08 Intel Corporation Apparatus and method for building distributed fault-tolerant/high-availability computed applications
US6907607B1 (en) * 2000-10-17 2005-06-14 International Business Machines Corporation System and method for analyzing capacity in a plurality of processing systems
US20030172329A1 (en) * 2002-03-08 2003-09-11 Davis James Andrew Allocation of sparing resources in a magnetoresistive solid-state storage device
US6973604B2 (en) * 2002-03-08 2005-12-06 Hewlett-Packard Development Company, L.P. Allocation of sparing resources in a magnetoresistive solid-state storage device
US20030204759A1 (en) * 2002-04-26 2003-10-30 Singh Jitendra K. Managing system power
US6996728B2 (en) * 2002-04-26 2006-02-07 Hewlett-Packard Development Company, L.P. Managing power consumption based on utilization statistics
US7100060B2 (en) * 2002-06-26 2006-08-29 Intel Corporation Techniques for utilization of asymmetric secondary processing resources
US20070033491A1 (en) * 2002-12-11 2007-02-08 Howlett Warren K Repair techniques for memory with multiple redundancy
US20070101203A1 (en) * 2005-10-31 2007-05-03 Pomaranski Ken G Method and apparatus for selecting a primary resource in a redundant subsystem
US20070271369A1 (en) * 2006-05-17 2007-11-22 Arkin Aydin Apparatus And Methods For Managing Communication System Resources

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090031066A1 (en) * 2007-07-24 2009-01-29 Jyoti Kumar Bansal Capacity planning by transaction type
US8631401B2 (en) 2007-07-24 2014-01-14 Ca, Inc. Capacity planning by transaction type
US20090199196A1 (en) * 2008-02-01 2009-08-06 Zahur Peracha Automatic baselining of resource consumption for transactions
US8261278B2 (en) * 2008-02-01 2012-09-04 Ca, Inc. Automatic baselining of resource consumption for transactions
US20090235268A1 (en) * 2008-03-17 2009-09-17 David Isaiah Seidman Capacity planning based on resource utilization as a function of workload
US8402468B2 (en) 2008-03-17 2013-03-19 Ca, Inc. Capacity planning based on resource utilization as a function of workload

Also Published As

Publication number Publication date
US7386851B1 (en) 2008-06-10

Similar Documents

Publication Publication Date Title
US7386851B1 (en) System and method for implementing dynamic lifetime reliability extension for microprocessor architectures
US8046566B2 (en) Method to reduce power consumption of a register file with multi SMT support
US8108629B2 (en) Method and computer for reducing power consumption of a memory
US10318365B2 (en) Selective error correcting code and memory access granularity switching
US7194641B2 (en) Method and apparatus for managing power and thermal alerts transparently to an operating system in a data processing system with increased granularity in reducing power usage and thermal generation
US9575889B2 (en) Memory server
US6725336B2 (en) Dynamically allocated cache memory for a multi-processor unit
US7865895B2 (en) Heuristic based affinity dispatching for shared processor partition dispatching
CN100338555C (en) Method and apparatus for reducing power consumption in a logically partitioned data processing system
US7870551B2 (en) Optimization of thread wake up for shared processor partitions
US7925840B2 (en) Data processing apparatus and method for managing snoop operations
US8464023B2 (en) Application run-time memory optimizer
US8949659B2 (en) Scheduling workloads based on detected hardware errors
US20060085794A1 (en) Information processing system, information processing method, and program
US20080189487A1 (en) Control of cache transactions
US20070079063A1 (en) Method of saving power consumed by a storage system
US9355035B2 (en) Dynamic write priority based on virtual write queue high water mark for set associative cache using cache cleaner when modified sets exceed threshold
US20130036270A1 (en) Data processing apparatus and method for powering down a cache
WO2006117950A1 (en) Power controller in information processor
US20200103956A1 (en) Hybrid low power architecture for cpu private caches
EP3588313B1 (en) Non-volatile memory aware caching policies
US9568986B2 (en) System-wide power conservation using memory cache
US20090177919A1 (en) Dynamic redundancy for microprocessor components and circuits placed in nonoperational modes
US11841798B2 (en) Selective allocation of memory storage elements for operation according to a selected one of multiple cache functions
KR100656353B1 (en) Method for reducing memory power consumption

Legal Events

Date Code Title Description
STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO PAY ISSUE FEE