WO1992008193A1 - A fault tolerant data storage system - Google Patents

A fault tolerant data storage system

Info

Publication number
WO1992008193A1
Authority
WO
WIPO (PCT)
Prior art keywords
chips
row
array
spare
chip
Prior art date
Application number
PCT/GB1991/001929
Other languages
French (fr)
Inventor
Neal Hugh Macdonald
Original Assignee
Mv Limited
Priority date
Filing date
Publication date
Application filed by Mv Limited
Priority to US08/050,155 (US5742613A)
Priority to EP91919093A (EP0555307B1)
Priority to JP3517288A (JPH06502263A)
Priority to DE69125724T (DE69125724T2)
Publication of WO1992008193A1

Classifications

    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C29/00 Checking stores for correct operation; Subsequent repair; Testing stores during standby or offline operation
    • G11C29/70 Masking faults in memories by using spares or by reconfiguring
    • G11C29/76 Masking faults in memories by using spares or by reconfiguring using address translation or modifications

Definitions

  • This invention relates to a random access data storage system which comprises a plurality of elements, typically integrated circuits or semiconductor chips, each such element comprising an array of memory locations some of which may be faulty.
  • the majority memory chip can take many forms, typically Dynamic Random Access Memory (DRAM), Static Random Access Memory (SRAM), and Programmable Read Only Memory (PROM). Despite some of their names these are all random access memories (RAMs). Such memory chips are arranged as X bits wide by Y address locations deep. A majority RAM contains some X bits that cannot be read from or written to at some Y addresses.
  • DRAM Dynamic Random Access Memory
  • SRAM Static Random Access Memory
  • PROM Programmable Read Only Memory
  • a fault tolerant random access data storage system which comprises a plurality of main elements, each element comprising an array of memory locations, a first spare element and a second spare element, each spare element comprising an array of memory locations, means for addressing the elements with the logical addresses of the rows within the arrays being skewed relative to their physical addresses but in a different manner for the different elements, and with the logical addresses of the columns within the arrays being skewed relative to their physical addresses but in a different manner for the different elements, and means for recording faulty memory locations so that if a selected row in a selected main element includes a fault, then a replacement row in the first spare element is selected instead, and if a selected column in a selected main element includes a fault, then a replacement column in the second spare element is selected instead.
  • the main and spare memory elements may comprise individual integrated circuits (or chips) , or some or all of the elements may be combined on a single chip.
  • each of the memory elements comprises a row of two chips, each chip being typically 4 or 8 bits wide and Y addresses deep.
  • the overhead in terms of spare chips
  • the present invention requires a fixed number of spare chips independent of the number of chips in the array. Even for the two embodiments of PCT/GB90/01051 there is a significant cost saving over arrays constructed from perfect chips since majority RAMs are available at a significant discount. However it is always desirable to keep the component count low to maximise packing density and reliability, and to minimise power dissipation.
  • the present invention will achieve higher packing density and reliability and lower power dissipation than the embodiments of PCT/GB90/01051 owing to the greatly reduced numbers of spare chips.
  • Systems of the present invention will also demonstrate shorter access times than the systems of PCT/GB90/01051.
  • a typical embodiment of the present invention uses an array, comprising many rows of chips, where each row is 2 chips wide (typically each chip is defined as 8 bits wide by Y addresses deep), and where Y is split into a chip row address (CRA) and a chip column address (CCA).
  • CRA chip row address
  • CCA chip column addresses
  • spare chips are required. Each of these spare chips can be a majority RAM.
  • Two chips, known as the spare column chips (SC) provide spares for chips containing faulty CCAs and two chips, known as the spare row chips (SR) , provide spares for chips containing faulty CRAs.
  • SC spare column chips
  • SR spare row chips
  • a spare column chip with a faulty CRA is provided with spares in the spare row chip whilst a spare row chip with a faulty CCA is supplied with spares in the spare column chip.
  • a non-volatile look-up table, or map (such as a Programmable Read Only Memory) defining the locations of defects identifies the chip containing the defect and data is read from, or written into, the spare column chips.
  • a faulty CRA is handled in the same way except that data is read from, or written to, the spare row chips.
  • Both SC and SR can contain both faulty CCAs and CRAs by virtue of the technique described in PCT/GB90/01051 which is used to avoid the situation when two or more chips from different array rows exhibit a fault at the same chip address (known as a coincidental fault).
  • the embodiment described herein uses two maps to determine if a particular CCA or CRA is faulty. These maps are programmed either in the factory prior to shipping the storage system or as a consequence of operational failure. In either case the location of faults has been detected by appropriate tests or diagnostics. These faults are classified as CCA or CRA locations. A computer program executes an algorithm to determine if there are any coincidental faults within the CCA or CRA data. In the event that coincidental faults appear then the map data is prepared so as to avoid these coincidences and information is created to skew the addressing to each chip. The skew information, or skew values, are used by the control logic of the embodiment described herein and is stored in registers within that control logic.
  • FIGURE 1 is a block diagram of a typical computer system containing a RAM sub-system
  • FIGURES 2 and 3 each show a row of majority memory chips in a memory array in order to explain the principle of skewing physical addresses to avoid coincidental faults between memory chips;
  • FIGURE 4 is a block diagram of an embodiment of fault tolerant data storage system in accordance with this invention
  • FIGURE 5 is a block diagram of a memory array controller (MAC) of the fault tolerant data storage system
  • FIGURE 6 shows a typical format for a dynamic column sparing map (DCSM) or dynamic row sparing map (DRSM) of the system;
  • DCSM dynamic column sparing map
  • DRSM dynamic row sparing map
  • FIGURE 7 is a block diagram of an address driver (AD) circuit of the memory array controller
  • FIGURE 8 is a flow diagram to illustrate a manufacturing process used to determine the contents of the dynamic sparing maps.
  • FIGURE 9 is a flow diagram to illustrate a process to respond to operational failure within any majority memory chip within the system.
  • FIGURE 1 illustrates a typical computer system with a microprocessor (MPU) 1 connected to a read only memory (ROM) 5 and a random access memory (RAM) 4, through a bidirectional system data (SD) bus 2 and a system address (SA) bus 3.
  • the SA bus 3 is split into three effectively separate address busses within the RAM 4. These are for the array row address (ARA) , the chip column address (CCA) and the chip row address (CRA) . Control signals and peripheral circuits have been omitted from Figure 1 in the interests of clarity.
  • ARA defines which one of a plurality of rows of chips in an array is to be accessed.
  • CCA defines the column location to be addressed in the chips selected by ARA.
  • CRA defines the row location to be addressed within the chips selected by ARA.
  • Figures 2 and 3 illustrate the principles of differentially skewing the physical and logical addresses of a group of chips.
  • Figure 2 shows a single row of four majority memory chips 11, plus a spare chip 12: in this case, each chip contains a fault 10 at the same physical address, which in this case is a chip column address, though the same would apply to a chip row address.
  • the chip columns are addressed in parallel but the chips are enabled one-at-a-time. If physical column 0 is addressed when the first chip is enabled, it is of no use to enable the spare chip to use physical column 0 in the spare chip as a replacement column, because this column in the spare chip is also faulty.
  • the physical addresses are differentially skewed so that a given logical address selects different physical columns in the different chips. The skewing is arranged so that for any given logical address, no more than one chip will have a fault in the columns selected.
  • the spare column chip 12 can provide a spare or replacement column for each of the faulty chips.
  • the skewing arrangements provide tolerance for coincidental faults, i.e. faults in the same physical columns of two or more chips. The same principles apply in respect of rows.
  • FIG. 4 is a block diagram of the embodiment of data storage system of the present invention.
  • the RAM array in this example comprises 32 rows, each 2 chips wide. All the chips in the array are majority rams (MR) 31. In the interest of clarity only the first and last rows of the array are shown.
  • the system address (SA) bus 20 provides all address information to the memory address controller (MAC) 25.
  • the MAC 25 drives a separate chip column address (CCA) bus 26 and chip row address (CRA) bus 27.
  • the CCA and CRA are logically skewed within MAC 25 to provide tolerance of coincidental faults.
  • Each array row is separately enabled by thirty two individual decode lines (DECL) 28.
  • DECL0 is connected to the chip enable terminals of all chips MR in array row 0, DECL1 to array row 1 and so on.
  • the array has two extra rows of chips MR, the spare row (SR) 32 and spare column (SC) 33.
  • the chip enable terminals of chips SR 32 are connected to Enable Spare Row Line (ENSRL) 29.
  • ENSRL Enable Spare Row Line
  • SDL System Data Lower
  • Individual byte pairs are enabled by selecting one array row from thirty two array rows by the assertions of one of the DECL lines. Asserting one of the two direction control lines, the Read (RDL) line 23 or the Write (WTL) line 24 will allow a selected word to be read or written respectively over the SDL and SDU data lines.
  • RDL Read
  • WTL Write
  • FIG. 5 illustrates the MAC 25.
  • the SA bus 20 is split into three buses, ARA 40, Logical Chip Column Address (PCCA) 41, and Logical Chip Row Address (PCRA) 42.
  • PCCA Logical Chip Column Address
  • PCRA Logical Chip Row Address
  • ARD array row decoder
  • CAD 46 produces the skewed chip column address for the memory array on bus CCA 52.
  • RAD 47 produces the skewed chip row address for the memory array on bus CRA 53.
  • Each address driver 46 or 47 receives a tag bit, Column Tag (CT) 48 or Row Tag (RT) 49, from its respective DCSM 44 or DRSM 45.
  • CT Column Tag
  • RT Row Tag
  • ARA selects a range of N locations which tag individual faulty addresses in MR.
  • MR majority RAM
  • CCA and CRA contain 10 lines each.
  • a map PROM consists of 32 x 1K locations. Each map location comprises two bits, the Tag bit 60 and a Spare Tag bit 61.
  • the tag bits from DCSM and DRSM combine to create the following truth table:
  • Figure 5 shows the additional tag bits to identify address faults within the SR and SC. These are known as SRT 50 and SCT 51.
  • One of three enable signals are asserted as a consequence of executing the truth table of Table 1 and are defined as follows; ENSCL 54 enables SC, ENSRL 55 enables SR and ENDECL 56 enables the ARD 43 if both ENSCL and ENSRL are negated (in which case the appropriate DECL line is asserted by the ARD 43) .
  • Figure 7 shows the internal circuit of an Address Driver.
  • the same circuit can be used as a CAD or RAD.
  • the skewing mechanism employs a full ADDER 80 to produce the sum of the logical or base address (BA) 81 and the contents of one of thirty two registers from a Register File 83.
  • a specific register for each chip row is selected by the ARA bus 82 via a decoder (D) 87.
  • the skewed address is the output of the adder, KA 84.
  • the registers are non-volatile registers (programmed at the same time as DCSM and DRSM) or they are programmed every time the system is powered up.
  • the write path for the Register File 83 is omitted in the interests of clarity however many examples of Register File circuits are known to those skilled in the art.
  • a skew value table is contained in the DCSM and DRSM.
  • Typical map PROMs are 8 bits wide where two bits are used for tagging, leaving typically five bits for each half of thirty three 10 bit skew values.
  • the skew values are typically packed in five bit entities (the upper and lower half of each ten bit value) into an appropriate area of a map. These values can be unpacked by reading the map PROMs.
  • each of the registers contain a skew value determined by an appropriate algorithm to avoid all coincidental faults over a range of 32 array rows.
  • Many algorithms can be developed for generating skew values. All routines start with a map of faults for each MR in the array.
  • Figure 7 shows an additional register (RS) 85 used to store the skew value for SR or SC depending on the designation of the Address Driver.
  • the SR or SC is selected by ENSCL or ENSRL respectively. Accordingly subject to the conditions defined by Table 1 then one of thirty three registers is selected to provide the A input to the ADDER, thus all coincidental CCA and CRA faults can be tolerated.
  • the truth table of Table 1 is executed by the function (F) block 86. Both CAD and RAD can be implemented from the same circuit and only one Address Driver has valid terms to the function block as shown in Figure 5.
  • the access time of the embodiment is composed of the access time of the MR in the array and the access time of a map PROM. This is so since the chip enable terminals of the MRs are asserted after the CAD resolves which one of thirty four chip enables to select (32 DECL lines plus ENSRL and ENSCL) . It would be beneficial to use the cheapest form of PROM for the maps and this implies the slowest form of PROM. However this will increase the access time of the storage system. However if two further ARDs are used in the system then individual array rows can be preselected. The original ARD asserts one of thirty two (DECL) lines which select the individual chip enable lines of each row of the array as before. This ARD is known as the Chip Enable ARD (CARD) .
  • CARD Chip Enable ARD
  • the second ARD known as the Output Enable ARD (OARD) asserts one of thirty two (ODECL) lines which select individual output enable lines of all MRs in an array row (instead of a common connection to RDL as above) .
  • the third ARD, known as the Write Enable ARD (WARD), asserts one of thirty two (WDECL) lines which select individual write enable lines of all MRs in an array row (instead of a common connection to WTL as above). All ARD outputs are selected by the ARA bus.
  • OARD the decoder is enabled by RDL
  • WARD the decoder is enabled by WTL.
  • for the spare rows SR and SC the output enable lines (ENOSRL and ENOSCL) and write enable lines (ENWSRL and ENWSCL) are gated with RDL and WTL respectively.
  • the additional output enable and write enable signals allow three array rows to be enabled simultaneously, that is one DECL signal, ENSRL and ENSCL are all asserted together. No output enable or write enable signal is asserted until the function unit in the MAC has resolved if there is to be any sparing and if so which of SR or SC is to be asserted. At this time only one of thirty two DECL lines (from CARD) or ENSRL or ENSCL is asserted. Then depending upon the type of operation being performed (read or write) one of thirty two ODECL or ENOSRL or ENOSCL, or one of thirty two WDECL or ENWSRL or ENWSCL is asserted substantially later than chip enable. Accordingly the access time of the map can be hidden in the delay between chip enable and output enable (or write enable) assertion.
  • Figures 8 and 9 illustrate two typical processes used to define the contents of the map PROMs, DCSM and DRSM.
  • Figure 8 shows a typical process to manage faults arising from MR manufacture in the factory.
  • Computer readable labels are attached to each MR. Each label would be written with a unique code typically in bar-code format or optical character recognition (OCR) format. Unique codes could simply comprise sequential numbers.
  • OCR optical character recognition
  • Such a label gives each MR a unique identity which is used to create an entry within a Fault Data File (FDF) .
  • the MR is tested using appropriate equipment and electrical and environmental conditions. If faults are detected within the MR as a consequence of this testing, then such faults are diagnosed as CCA and/or CRA faults and stored in the FDF within the space indexed by the MR identification number N.
  • MRs can be re-tested many times and CCA and/or CRA data appended to the entry for that chip within FDF.
  • MRs are then released to an assembly process and are attached at random to suitable substrates such as a printed circuit board (PCB) .
  • PCB printed circuit board
  • all MR identities on the PCB are read.
  • a list of identity numbers is created, cross-referencing numerous values of N with the position of MRs on the PCB.
  • This cross-referenced list is used to access the FDF to create a sub-set of the FDF for all MRs on a particular PCB.
  • the anti-coincidence computer program is then executed using the FDF subset as its input data.
  • the program generates the appropriate output data in a form similar to that shown in Figure 6. This output data is used to program DCSM, DRSM and to pack the skew value table into these maps.
  • Figure 9 shows the process for in-situ testing of MRs.

Abstract

A fault tolerant random access data storage system comprises a plurality of rows of memory chips (31) plus a first spare row of chips (32) and a second spare row of chips (33), each chip comprising an array of memory locations. A controller (25) addresses the chips with the logical addresses of the rows within the arrays being skewed relative to their physical addresses but in a different manner for the different rows of chips, and with the logical addresses of the columns within the arrays being skewed relative to their physical addresses but in a different manner for the different rows of chips. The locations of faults within the chips are recorded so that if a selected array row in a selected chip row (31) is faulty, then a replacement row in the first spare row of chips (32) is selected instead, and if a selected array column in a selected chip row (31) is faulty, then a replacement column in the second spare row of chips (33) is selected instead.

Description

A Fault Tolerant Data Storage System
This invention relates to a random access data storage system which comprises a plurality of elements, typically integrated circuits or semiconductor chips, each such element comprising an array of memory locations some of which may be faulty.
All memory chips suffer from defects or faults caused by their manufacturing process. Most of these faults are benign in that they do not impair the majority of the memory locations on the chip. Techniques have been developed that repair the defective locations by providing spare locations on the same chip, making the chip appear perfect. Such a chip is called a perfect chip, whereas a chip that contains a small number of faults, but otherwise operates with the same electrical or reliability characteristics as a perfect chip, is called a majority memory chip. Various techniques for tolerating faults within chips are discussed in the prior art introduction of our copending PCT patent application PCT/GB90/01051.
The majority memory chip can take many forms, typically Dynamic Random Access Memory (DRAM), Static Random Access Memory (SRAM), and Programmable Read Only Memory (PROM). Despite some of their names these are all random access memories (RAMs). Such memory chips are arranged as X bits wide by Y address locations deep. A majority RAM contains some X bits that cannot be read from or written to at some Y addresses.
Our copending PCT patent application PCT/GB90/01051 describes two typical embodiments of a fault tolerant data storage system that can retrieve data in either blocks of multiple bits or single bits. The two embodiments are applicable to any size or shape of array of memory chips. Furthermore any type of majority RAM can be used in the array. However the two embodiments are at their most optimum with a wide array of chips where each majority RAM is defined as a 1 bit by Y address memory. For example an array of 64 chips organised as 4 rows of 16 chips each would require 21 spare chips as envisaged in the second embodiment of PCT/GB90/01051. Using that architecture for an array of 32 rows of 2 chips each would require 35 spare chips.
In accordance with this invention there is provided a fault tolerant random access data storage system which comprises a plurality of main elements, each element comprising an array of memory locations, a first spare element and a second spare element, each spare element comprising an array of memory locations, means for addressing the elements with the logical addresses of the rows within the arrays being skewed relative to their physical addresses but in a different manner for the different elements, and with the logical addresses of the columns within the arrays being skewed relative to their physical addresses but in a different manner for the different elements, and means for recording faulty memory locations so that if a selected row in a selected main element includes a fault, then a replacement row in the first spare element is selected instead, and if a selected column in a selected main element includes a fault, then a replacement column in the second spare element is selected instead.
The main and spare memory elements may comprise individual integrated circuits (or chips) , or some or all of the elements may be combined on a single chip.
In an embodiment of the present invention to be described herein, each of the memory elements comprises a row of two chips, each chip being typically 4 or 8 bits wide and Y addresses deep. With each row consisting of two chips, the overhead (in terms of spare chips) comprises only four spare chips. Also, in contrast to the system of PCT/GB90/01051 which requires additional chips for each new row added to the array of chips, the present invention requires a fixed number of spare chips independent of the number of chips in the array. Even for the two embodiments of PCT/GB90/01051 there is a significant cost saving over arrays constructed from perfect chips since majority RAMs are available at a significant discount. However it is always desirable to keep the component count low to maximise packing density and reliability, and to minimise power dissipation. Accordingly the present invention will achieve higher packing density and reliability and lower power dissipation than the embodiments of PCT/GB90/01051 owing to the greatly reduced numbers of spare chips. Systems of the present invention will also demonstrate shorter access times than the systems of PCT/GB90/01051.
In this invention column faults and row faults can be tolerated by independent, though similar, means. A typical embodiment of the present invention uses an array, comprising many rows of chips, where each row is 2 chips wide (typically each chip is defined as 8 bits wide by Y addresses deep), and where Y is split into a chip row address (CRA) and a chip column address (CCA). Four additional, or spare, chips are required. Each of these spare chips can be a majority RAM. Two chips, known as the spare column chips (SC), provide spares for chips containing faulty CCAs and two chips, known as the spare row chips (SR), provide spares for chips containing faulty CRAs. A spare column chip with a faulty CRA is provided with spares in the spare row chip whilst a spare row chip with a faulty CCA is supplied with spares in the spare column chip.
If a faulty CCA is addressed, a non-volatile look-up table, or map (such as a Programmable Read Only Memory), defining the locations of defects identifies the chip containing the defect and data is read from, or written into, the spare column chips. A faulty CRA is handled in the same way except that data is read from, or written to, the spare row chips. Both SC and SR can contain both faulty CCAs and CRAs by virtue of the technique described in PCT/GB90/01051 which is used to avoid the situation when two or more chips from different array rows exhibit a fault at the same chip address (known as a coincidental fault).
The embodiment described herein uses two maps to determine if a particular CCA or CRA is faulty. These maps are programmed either in the factory prior to shipping the storage system or as a consequence of operational failure. In either case the location of faults has been detected by appropriate tests or diagnostics. These faults are classified as CCA or CRA locations. A computer program executes an algorithm to determine if there are any coincidental faults within the CCA or CRA data. In the event that coincidental faults appear then the map data is prepared so as to avoid these coincidences and information is created to skew the addressing to each chip. The skew information, or skew values, are used by the control logic of the embodiment described herein and is stored in registers within that control logic.
Said embodiment of the present invention will now be described by way of example only and with reference to the accompanying drawings, in which:
FIGURE 1 is a block diagram of a typical computer system containing a RAM sub-system;
FIGURES 2 and 3 each show a row of majority memory chips in a memory array in order to explain the principle of skewing physical addresses to avoid coincidental faults between memory chips;
FIGURE 4 is a block diagram of an embodiment of fault tolerant data storage system in accordance with this invention;
FIGURE 5 is a block diagram of a memory array controller (MAC) of the fault tolerant data storage system;
FIGURE 6 shows a typical format for a dynamic column sparing map (DCSM) or dynamic row sparing map (DRSM) of the system;
FIGURE 7 is a block diagram of an address driver (AD) circuit of the memory array controller;
FIGURE 8 is a flow diagram to illustrate a manufacturing process used to determine the contents of the dynamic sparing maps; and
FIGURE 9 is a flow diagram to illustrate a process to respond to operational failure within any majority memory chip within the system.
FIGURE 1 illustrates a typical computer system with a microprocessor (MPU) 1 connected to a read only memory (ROM) 5 and a random access memory (RAM) 4, through a bidirectional system data (SD) bus 2 and a system address (SA) bus 3. In the embodiment of the present invention to be described, the SA bus 3 is split into three effectively separate address busses within the RAM 4. These are for the array row address (ARA), the chip column address (CCA) and the chip row address (CRA). Control signals and peripheral circuits have been omitted from Figure 1 in the interests of clarity. ARA defines which one of a plurality of rows of chips in an array is to be accessed. CCA defines the column location to be addressed in the chips selected by ARA. CRA defines the row location to be addressed within the chips selected by ARA.
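The three-way split of the system address can be pictured as a simple bit-field decomposition. The sketch below is illustrative only: the field widths (5 bits of ARA for 32 array rows, 10 bits each for CRA and CCA) are taken from the example figures given later in this description, while the ordering of the fields within the address word and the function name `split_system_address` are assumptions, not something the specification defines.

```python
# Assumed field widths: 32 array rows -> 5 ARA bits; 10 bits each for CRA and CCA.
ARA_BITS, CRA_BITS, CCA_BITS = 5, 10, 10


def split_system_address(sa: int) -> tuple:
    """Split a word address into (ARA, CRA, CCA).

    The bit ordering (ARA in the high bits, CCA in the low bits) is an
    assumption made for illustration.
    """
    if not 0 <= sa < 1 << (ARA_BITS + CRA_BITS + CCA_BITS):
        raise ValueError("system address out of range")
    cca = sa & ((1 << CCA_BITS) - 1)                # low bits: chip column address
    cra = (sa >> CCA_BITS) & ((1 << CRA_BITS) - 1)  # middle bits: chip row address
    ara = sa >> (CCA_BITS + CRA_BITS)               # high bits: array row address
    return ara, cra, cca
```

For instance, under these assumptions an address built as `(3 << 20) | (5 << 10) | 7` decomposes into array row 3, chip row 5 and chip column 7.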
Figures 2 and 3 illustrate the principles of differentially skewing the physical and logical addresses of a group of chips. Figure 2 shows a single row of four majority memory chips 11, plus a spare chip 12: in this case, each chip contains a fault 10 at the same physical address, which in this case is a chip column address, though the same would apply to a chip row address. The chip columns are addressed in parallel but the chips are enabled one-at-a-time. If physical column 0 is addressed when the first chip is enabled, it is of no use to enable the spare chip to use physical column 0 in the spare chip as a replacement column, because this column in the spare chip is also faulty. Even if physical column 0 in the spare chip were good so that the faulty physical column 0 of the first chip could be replaced by physical column 0 of the spare chip, the faulty column 0 of the second chip (when this chip is addressed) could not be replaced by enabling the spare chip, because physical column 0 of the spare chip is already used as the spare for column 0 of the first chip. By contrast in Figure 3, the physical addresses are differentially skewed so that a given logical address selects different physical columns in the different chips. The skewing is arranged so that for any given logical address, no more than one chip will have a fault in the columns selected. Thus, when any chip is enabled, and when its faulty column (if any) is addressed, a good column (of corresponding logical address) is found in the spare chip as a replacement, which is not used as a replacement for the faulty columns of any of the other chips. Accordingly the spare column chip 12 can provide a spare or replacement column for each of the faulty chips. In other words, the skewing arrangements provide tolerance for coincidental faults, i.e. faults in the same physical columns of two or more chips. The same principles apply in respect of rows.
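One way to picture how a set of skews might be chosen is a greedy search, sketched below. This is not the algorithm of the specification, which leaves the choice of algorithm open; it is a minimal illustration that assumes an additive skew (physical column = logical column + skew, modulo the number of columns) and a hypothetical function name `find_skews`. The spare chip is simply included in the list so that its own faults are kept clear of the others.

```python
def find_skews(faults, n_cols):
    """Pick one skew per chip so that no logical column is faulty in two chips.

    faults: list of sets of faulty physical columns, one set per chip
            (include the spare chip as the last entry).
    n_cols: number of columns per chip.
    Assumes physical = (logical + skew) % n_cols, an illustrative choice.
    """
    used = set()   # logical columns already occupied by some chip's fault
    skews = []
    for chip_faults in faults:
        for skew in range(n_cols):
            # Logical columns at which this chip would appear faulty.
            logical = {(p - skew) % n_cols for p in chip_faults}
            if used.isdisjoint(logical):
                used |= logical
                skews.append(skew)
                break
        else:
            raise ValueError("no coincidence-free skew found for a chip")
    return skews
```

For the situation of Figure 2 (four chips and a spare, each faulty at physical column 0) with an assumed 8 columns per chip, `find_skews([{0}] * 5, 8)` yields the skews 0, 1, 2, 3, 4, so each fault lands on a different logical column.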
Figure 4 is a block diagram of the embodiment of data storage system of the present invention. The RAM array in this example comprises 32 rows, each 2 chips wide. All the chips in the array are majority RAMs (MR) 31. In the interest of clarity only the first and last rows of the array are shown. The system address (SA) bus 20 provides all address information to the memory address controller (MAC) 25. The MAC 25 drives a separate chip column address (CCA) bus 26 and chip row address (CRA) bus 27. The CCA and CRA are logically skewed within MAC 25 to provide tolerance of coincidental faults. Each array row is separately enabled by thirty two individual decode lines (DECL) 28. DECL0 is connected to the chip enable terminals of all chips MR in array row 0, DECL1 to array row 1 and so on. The array has two extra rows of chips MR, the spare row (SR) 32 and spare column (SC) 33. The chip enable terminals of chips SR 32 are connected to Enable Spare Row Line (ENSRL) 29. The chip enable terminals of chips SC 33 are connected to Enable Spare Column Line (ENSCL) 30. Each column of each chip MR is typically 8 bits wide creating a combined two byte parallel data bus comprising System Data Upper bus (SDU) 22 and System Data Lower (SDL) 21.
Individual byte pairs (known as a word) are enabled by selecting one array row from thirty two array rows by the assertions of one of the DECL lines. Asserting one of the two direction control lines, the Read (RDL) line 23 or the Write (WTL) line 24 will allow a selected word to be read or written respectively over the SDL and SDU data lines.
Figure 5 illustrates the MAC 25. The SA bus 20 is split into three buses, ARA 40, Logical Chip Column Address (PCCA) 41, and Logical Chip Row Address (PCRA) 42. The ARA bus controls the array row decoder (ARD) 43 producing thirty two unique DECL lines. The ARA bus is also connected to the DCSM 44, DRSM 45, column address driver (CAD) 46 and row address driver (RAD) 47. CAD 46 produces the skewed chip column address for the memory array on bus CCA 52. RAD 47 produces the skewed chip row address for the memory array on bus CRA 53.
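The three-way split of the SA bus can be sketched as follows. The field widths follow the example figures in the description (32 array rows, and 10 address lines each for chip column and chip row when each MR has 1M addresses); the ordering of the fields within the system address and the function name `split_system_address` are assumptions made for illustration only.

```python
ARA_BITS = 5    # 32 array rows
CCA_BITS = 10   # 1K chip column addresses
CRA_BITS = 10   # 1K chip row addresses


def split_system_address(sa):
    """Split a 25-bit system address into (ARA, logical column, logical row).

    Assumed layout, most significant field first: ARA | column | row.
    """
    row = sa & ((1 << CRA_BITS) - 1)
    col = (sa >> CRA_BITS) & ((1 << CCA_BITS) - 1)
    ara = (sa >> (CRA_BITS + CCA_BITS)) & ((1 << ARA_BITS) - 1)
    return ara, col, row


# Array row 3, logical column 5, logical row 7.
example = split_system_address((3 << 20) | (5 << 10) | 7)
```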
Each address driver 46 or 47 receives a tag bit, Column Tag (CT) 48 or Row Tag (RT) 49, from its respective DCSM 44 or DRSM 45. These tag bits indicate if a CCA or CRA is faulty.
A typical format for DCSM is shown in Fig. 6. ARA selects a range of N locations which tag individual faulty addresses in MR. For example, if each MR consists of 1M addresses then CCA and CRA contain 10 lines each. Accordingly a map PROM consists of 32 x 1K locations. Each map location comprises two bits, the Tag bit 60 and a Spare Tag bit 61. The tag bits from DCSM and DRSM combine to create the following truth table:
TABLE 1
CT  RT  SCT  SRT  Enable  Note
0   0   X    X    DECLn   One of 32 array rows
1   0   0    X    SC      CCA fault only, select SC
1   0   1    X    SR      CRA fault in SC, select SR
0   1   X    0    SR      CRA fault only, select SR
0   1   X    1    SC      CCA fault in SR, select SC
1   1   0    X    SC      CCA/CRA fault, select SC
1   1   1    X    SR      CRA fault in SC, select SR
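The rows of Table 1 can be expressed as a small selection function. This is a sketch of the table only, not of the circuit; the function name `select_enable` and the string return values are hypothetical.

```python
def select_enable(ct, rt, sct, srt):
    """Return which enable is asserted for the given tag bits, following Table 1.

    ct, rt   : column and row tag bits for the addressed main chip
    sct, srt : tag bits for the spare column (SC) and spare row (SR)
    """
    if not ct and not rt:
        return "DECL"                  # no fault: normal array row decode
    if ct:
        # Column fault (with or without a row fault): prefer SC,
        # fall back to SR when SC is itself tagged faulty.
        return "SR" if sct else "SC"
    # Row fault only: prefer SR, fall back to SC when SR is tagged faulty.
    return "SC" if srt else "SR"
```

Note that the table does not give a row for the case in which both spares are faulty at the selected address; the sketch makes no attempt to handle that case beyond what the table states.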
Figure 5 shows the additional tag bits to identify address faults within the SR and SC. These are known as SRT 50 and SCT 51. One of three enable signals is asserted as a consequence of executing the truth table of Table 1, and they are defined as follows: ENSCL 54 enables SC, ENSRL 55 enables SR and ENDECL 56 enables the ARD 43 if both ENSCL and ENSRL are negated (in which case the appropriate DECL line is asserted by the ARD 43).
Figure 7 shows the internal circuit of an Address Driver. The same circuit can be used as a CAD or RAD. The skewing mechanism employs a full ADDER 80 to produce the sum of the logical or base address (BA) 81 and the contents of one of thirty two registers from a Register File 83. A specific register for each chip row is selected by the ARA bus 82 via a decoder (D) 87. The skewed address is the output of the adder, KA 84. The registers are either non-volatile registers (programmed at the same time as DCSM and DRSM) or volatile registers programmed every time the system is powered up. The write path for the Register File 83 is omitted in the interests of clarity; however, many examples of Register File circuits are known to those skilled in the art. In the case of volatile registers a skew value table is contained in the DCSM and DRSM. Typical map PROMs are 8 bits wide where two bits are used for tagging, leaving typically five bits for each half of thirty three 10 bit skew values. The skew values are typically packed in five bit entities (the upper and lower half of each ten bit value) into an appropriate area of a map. These values can be unpacked by reading the map PROMs.
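A minimal software model of the Address Driver and of the five-bit packing is sketched below. It assumes 10-bit addresses and that any carry out of the adder is discarded, so the skewed address wraps around; the class name `AddressDriver` and the helpers `pack_skew` and `unpack_skew` are hypothetical.

```python
ADDRESS_BITS = 10  # 1K chip rows or columns, as in the 1M-address example


class AddressDriver:
    """Models the ADDER plus a register file of thirty two skew values."""

    def __init__(self, skews, address_bits=ADDRESS_BITS):
        if len(skews) != 32:
            raise ValueError("one skew value is needed per array row")
        self.mask = (1 << address_bits) - 1
        self.register_file = [s & self.mask for s in skews]

    def skewed_address(self, ara, base_address):
        """Add the skew selected by ARA to the base address (carry discarded)."""
        return (base_address + self.register_file[ara]) & self.mask


def pack_skew(value):
    """Split a 10-bit skew value into its upper and lower five-bit halves."""
    return (value >> 5) & 0x1F, value & 0x1F


def unpack_skew(upper, lower):
    """Rebuild a 10-bit skew value from its two five-bit halves."""
    return ((upper & 0x1F) << 5) | (lower & 0x1F)


driver = AddressDriver(list(range(32)))
wrapped = driver.skewed_address(3, 1022)   # 1022 + 3 wraps past 1023
halves = pack_skew(677)
```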
After programming, each of the registers contains a skew value determined by an appropriate algorithm to avoid all coincidental faults over a range of 32 array rows. Many algorithms can be developed for generating skew values. All routines start with a map of faults for each MR in the array. These maps have been generated by testing individual MRs with appropriate test hardware and stimulation. The simplest routines simply add a number to the first location of any fault and then re-examine the chip maps to see if the coincident fault has been avoided. If a coincidence still remains the same location is incremented again and the fault maps tested again, and so on until the incremented value exceeds the number of locations possible.
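One possible reading of the simplest routine is a greedy search that tries successive skew values for each array row until its faults no longer coincide with those of rows already placed. The sketch below is an interpretation under that assumption, not the routine actually used; it assumes physical address = logical address + skew (modulo the number of locations), and the name `find_skews` is hypothetical.

```python
def find_skews(fault_maps, n_locations):
    """Greedy skew search.

    fault_maps[i] is the set of faulty physical addresses of array row i.
    Returns one skew per row such that no logical address selects a fault
    in more than one row, or raises ValueError if the search is exhausted.
    """
    skews = []
    used_logical = set()  # logical addresses already occupied by some fault
    for faults in fault_maps:
        for skew in range(n_locations):
            # With physical = logical + skew, a fault at physical address p
            # appears at logical address p - skew.
            logical = {(p - skew) % n_locations for p in faults}
            if not (logical & used_logical):
                skews.append(skew)
                used_logical |= logical
                break
        else:
            raise ValueError("no skew avoids all coincident faults")
    return skews


# Three rows, each faulty at physical address 0, in a toy 8-location chip.
result = find_skews([{0}, {0}, {0}], 8)
```

In a full system the fault maps of the spare rows would be included in the same search, giving thirty three skew values per address driver.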
Figure 7 shows an additional register (RS) 85 used to store the skew value for SR or SC depending on the designation of the Address Driver. The SR or SC is selected by ENSRL or ENSCL respectively. Accordingly, subject to the conditions defined by Table 1, one of thirty three registers is selected to provide the A input to the ADDER, so that all coincidental CCA and CRA faults can be tolerated. The truth table of Table 1 is executed by the function (F) block 86. Both CAD and RAD can be implemented from the same circuit and only one Address Driver has valid terms to the function block as shown in Figure 5.
The access time of the embodiment is composed of the access time of the MR in the array and the access time of a map PROM. This is so since the chip enable terminals of the MRs are asserted after the CAD resolves which one of thirty four chip enables to select (32 DECL lines plus ENSRL and ENSCL). It would be beneficial to use the cheapest form of PROM for the maps, and this implies the slowest form of PROM, which will increase the access time of the storage system. However, if two further ARDs are used in the system then individual array rows can be preselected. The original ARD asserts one of thirty two (DECL) lines which select the individual chip enable lines of each row of the array as before. This ARD is known as the Chip Enable ARD (CARD). The second ARD, known as the Output Enable ARD (OARD), asserts one of thirty two (ODECL) lines which select individual output enable lines of all MRs in an array row (instead of a common connection to RDL as above). The third ARD, known as the Write Enable ARD (WARD), asserts one of thirty two (WDECL) lines which select individual write enable lines of all MRs in an array row (instead of a common connection to WTL as above). All ARD outputs are selected by the ARA bus. In the case of OARD the decoder is enabled by RDL; in the case of WARD the decoder is enabled by WTL. In the case of the spare rows SR and SC, the output enable lines (ENOSRL and ENOSCL) and write enable lines (ENWSRL and ENWSCL) are gated with RDL and WTL respectively.
The additional output enable and write enable signals allow three array rows to be enabled simultaneously, that is one DECL signal, ENSRL and ENSCL are all asserted together. No output enable or write enable signal is asserted until the function unit in the MAC has resolved if there is to be any sparing and if so which of SR or SC is to be asserted. At this time only one of thirty two DECL lines (from CARD) or ENSRL or ENSCL is asserted. Then depending upon the type of operation being performed (read or write) one of thirty two ODECL or ENOSRL or ENOSCL, or one of thirty two WDECL or ENWSRL or ENWSCL is asserted substantially later than chip enable. Accordingly the access time of the map can be hidden in the delay between chip enable and output enable (or write enable) assertion.
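The effect on access time can be illustrated with a simple timing model. The model assumes that read data is valid once both the chip-enable access time and the output-enable access time have elapsed from their respective assertions; the parameter names and the nanosecond figures are hypothetical and are not taken from the description.

```python
def access_time_simple(t_map, t_ce):
    """Chip enable waits for the map lookup, so the two delays add."""
    return t_map + t_ce


def access_time_preselected(t_map, t_ce, t_oe):
    """Chip enable starts at once; only output enable waits for the map lookup."""
    return max(t_ce, t_map + t_oe)


# Hypothetical figures in nanoseconds: map PROM 60, chip enable 100, output enable 40.
simple = access_time_simple(60, 100)
preselected = access_time_preselected(60, 100, 40)
```

With these figures the map lookup is completely hidden, since the map delay plus the output enable delay does not exceed the chip enable access time.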
Figures 8 and 9 illustrate two typical processes used to define the contents of the map PROMs, DCSM and DRSM. Figure 8 shows a typical process to manage faults arising from MR manufacture in the factory. Computer readable labels are attached to each MR. Each label would be written with a unique code, typically in bar-code format or optical character recognition (OCR) format. Unique codes could simply comprise sequential numbers. Such a label gives each MR a unique identity which is used to create an entry within a Fault Data File (FDF). The MR is tested using appropriate equipment and electrical and environmental conditions. If faults are detected within the MR as a consequence of this testing, then such faults are diagnosed as CCA and/or CRA faults and stored in the FDF within the space indexed by the MR identification number N. MRs can be re-tested many times and CCA and/or CRA data appended to the entry for that chip within FDF.
MRs are then released to an assembly process and are attached at random to suitable substrates such as a printed circuit board (PCB) . After assembly is complete, all MR identities on the PCB are read. A list of identity numbers is created, cross-referencing numerous values of N with the position of MRs on the PCB. This cross-referenced list is used to access the FDF to create a sub-set of the FDF for all MRs on a particular PCB. The anti-coincidence computer program is then executed using the FDF subset as its input data. The program generates the appropriate output data in a form similar to that shown in Figure 6. This output data is used to program DCSM, DRSM and to pack the skew value table into these maps.
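The cross-referencing step can be sketched with ordinary dictionaries. The data layout (an FDF keyed by MR identity number, and a board list keyed by position), the field names `cca` and `cra`, and the function name `board_fault_subset` are all assumptions made for illustration.

```python
def board_fault_subset(fdf, board_positions):
    """Build the FDF subset for one PCB.

    fdf             : {mr_id: {"cca": set of faulty columns, "cra": set of faulty rows}}
    board_positions : {position on PCB: mr_id read from the chip label}
    Returns {position: fault entry}, the input for the anti-coincidence program.
    """
    return {position: fdf[mr_id] for position, mr_id in board_positions.items()}


# Hypothetical FDF with three tested chips, two of which are fitted to this PCB.
fdf = {
    1001: {"cca": {0, 17}, "cra": set()},
    1002: {"cca": set(), "cra": {512}},
    1003: {"cca": {0}, "cra": {3}},
}
board = {(0, 0): 1003, (0, 1): 1001}   # (array row, chip within row) -> MR identity
subset = board_fault_subset(fdf, board)
```

A chip whose identity is missing from the FDF raises a `KeyError` here, which seems a reasonable choice since every labelled MR is expected to have an entry.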
Figure 9 shows the process for in-situ testing of MRs. This is similar to the process shown in Figure 8 except that DCSM and/or DRSM are reprogrammed with appropriate data as a consequence of an operational failure of a chip MR. The input data for the anti-coincidence program is read back from the DCSM and/or DRSM before they are erased prior to programming. This data is appended with data describing the operational failure and then input to the anti-coincidence program. As in Figure 8 the output of the program is used to program DCSM and DRSM.

Claims

(1) A fault tolerant random access data storage system which comprises a plurality of main elements, each element comprising an array of memory locations, a first spare element and a second spare element, each spare element comprising an array of memory locations, means for addressing the elements with the logical addresses of the rows within the arrays being skewed relative to their physical addresses but in a different manner for the different elements, and with the logical addresses of the columns within the arrays being skewed relative to their physical addresses but in a different manner for the different elements, and means for recording faulty memory locations so that if a selected row in a selected main element includes a fault, then a replacement row in the first spare element is selected instead, and if a selected column in a selected main element includes a fault, then a replacement column in the second spare element is selected instead.
(2) A fault tolerant random access data storage system as claimed in claim 1, arranged so that if a selected replacement row in the first spare element includes a column fault, then a replacement column in the second spare element is selected instead.
(3) A fault tolerant random access data storage system as claimed in claim 1 or 2, arranged so that if a selected replacement column in the second spare element includes a row fault, then a replacement row in the first spare element is selected instead.
(4) A fault tolerant random access data storage system as claimed in any preceding claim, comprising a first look-up table recording faulty column locations, and a second look-up table recording faulty row locations.
(5) A method of forming a fault tolerant random access data storage system as claimed in claim 1, comprising testing a plurality of memory elements to determine and record the locations of any faults in the respective elements, processing the fault location data together with data representing the positions of the memory elements in an array, to generate addressing skew value data, and programming the skew values into look-up tables of the data storage system.
(6) A method as claimed in claim 5, in which the memory elements are tested before assembly into an array.
(7) A method as claimed in claim 5, in which the memory elements are tested or retested after assembly into an array.
PCT/GB1991/001929 1990-11-02 1991-11-04 A fault tolerant data storage system WO1992008193A1 (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
US08/050,155 US5742613A (en) 1990-11-02 1991-11-04 Memory array of integrated circuits capable of replacing faulty cells with a spare
EP91919093A EP0555307B1 (en) 1990-11-02 1991-11-04 A fault tolerant data storage system
JP3517288A JPH06502263A (en) 1990-11-02 1991-11-04 Fault tolerant data storage system
DE69125724T DE69125724T2 (en) 1990-11-02 1991-11-04 A TROUBLESHOOTING DATA STORAGE SYSTEM

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB9023867.6 1990-11-02
GB909023867A GB9023867D0 (en) 1990-11-02 1990-11-02 Improvements relating to a fault tolerant storage system

Publications (1)

Publication Number Publication Date
WO1992008193A1 true WO1992008193A1 (en) 1992-05-14

Family

ID=10684766

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB1991/001929 WO1992008193A1 (en) 1990-11-02 1991-11-04 A fault tolerant data storage system

Country Status (7)

Country Link
US (1) US5742613A (en)
EP (1) EP0555307B1 (en)
JP (1) JPH06502263A (en)
AU (1) AU8843091A (en)
DE (1) DE69125724T2 (en)
GB (1) GB9023867D0 (en)
WO (1) WO1992008193A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3735368A (en) * 1971-06-25 1973-05-22 Ibm Full capacity monolithic memory utilizing defective storage cells
US4527251A (en) * 1982-12-17 1985-07-02 Honeywell Information Systems Inc. Remap method and apparatus for a memory system which uses partially good memory devices

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4556975A (en) * 1983-02-07 1985-12-03 Westinghouse Electric Corp. Programmable redundancy circuit
DE3311427A1 (en) * 1983-03-29 1984-10-04 Siemens AG, 1000 Berlin und 8000 München INTEGRATED DYNAMIC WRITE-READ MEMORY
US4584681A (en) * 1983-09-02 1986-04-22 International Business Machines Corporation Memory correction scheme using spare arrays
US4751656A (en) * 1986-03-10 1988-06-14 International Business Machines Corporation Method for choosing replacement lines in a two dimensionally redundant array
JPS62293598A (en) * 1986-06-12 1987-12-21 Toshiba Corp Semiconductor storage device
JP2577724B2 (en) * 1986-07-31 1997-02-05 三菱電機株式会社 Semiconductor storage device
US5022006A (en) * 1988-04-01 1991-06-04 International Business Machines Corporation Semiconductor memory having bit lines with isolation circuits connected between redundant and normal memory cells
US5617365A (en) * 1988-10-07 1997-04-01 Hitachi, Ltd. Semiconductor device having redundancy circuit
US5265055A (en) * 1988-10-07 1993-11-23 Hitachi, Ltd. Semiconductor memory having redundancy circuit
US5289417A (en) * 1989-05-09 1994-02-22 Mitsubishi Denki Kabushiki Kaisha Semiconductor memory device with redundancy circuit
AU5930390A (en) * 1989-07-06 1991-02-06 Mv Limited A fault tolerant data storage system
GB8926004D0 (en) * 1989-11-17 1990-01-10 Inmos Ltd Repairable memory circuit

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6425046B1 (en) 1991-11-05 2002-07-23 Monolithic System Technology, Inc. Method for using a latched sense amplifier in a memory module as a high-speed cache memory
US6483755B2 (en) 1991-11-05 2002-11-19 Monolithic System Technology, Inc. Memory modules with high speed latched sense amplifiers
US6041422A (en) * 1993-03-19 2000-03-21 Memory Corporation Technology Limited Fault tolerant memory system
WO1996030833A1 (en) * 1995-03-28 1996-10-03 Memory Corporation Electronic data storage devices and methods of manufacture and testing thereof

Also Published As

Publication number Publication date
AU8843091A (en) 1992-05-26
DE69125724D1 (en) 1997-05-22
DE69125724T2 (en) 1997-11-20
EP0555307B1 (en) 1997-04-16
GB9023867D0 (en) 1990-12-12
JPH06502263A (en) 1994-03-10
EP0555307A1 (en) 1993-08-18
US5742613A (en) 1998-04-21

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AU GB JP US

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH DE DK ES FR GB GR IT LU NL SE

WWE Wipo information: entry into national phase

Ref document number: 08050155

Country of ref document: US

WWE Wipo information: entry into national phase

Ref document number: 1991919093

Country of ref document: EP

WWP Wipo information: published in national office

Ref document number: 1991919093

Country of ref document: EP

WWG Wipo information: grant in national office

Ref document number: 1991919093

Country of ref document: EP