CA1270333A - Parity spreading to enhance storage access - Google Patents
Parity spreading to enhance storage access
- Publication number
- CA1270333A (application CA000535598A)
- Authority
- CA
- Canada
- Prior art keywords
- parity
- data
- record
- blocks
- records
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Lifetime
Links
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/08—Error detection or correction by redundancy in data representation, e.g. by using checking codes
- G06F11/10—Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
- G06F11/1076—Parity data used in redundant arrays of independent storages, e.g. in RAID systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B20/00—Signal processing not specific to the method of recording or reproducing; Circuits therefor
- G11B20/10—Digital recording or reproducing
- G11B20/18—Error detection or correction; Testing, e.g. of drop-outs
- G11B20/1833—Error detection or correction; Testing, e.g. of drop-outs by adding special lists or symbols to the coded information
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2211/00—Indexing scheme relating to details of data-processing equipment not covered by groups G06F3/00 - G06F13/00
- G06F2211/10—Indexing scheme relating to G06F11/10
- G06F2211/1002—Indexing scheme relating to G06F11/1076
- G06F2211/104—Metadata, i.e. metadata associated with RAID systems with parity
Abstract
ABSTRACT OF THE DISCLOSURE
A storage management mechanism distributes parity blocks corresponding to multiple data blocks substantially equally among a set of storage devices. N storage units in a set are divided into a multiple of equally sized address blocks, each containing a plurality of records. Blocks from each storage unit having the same address ranges form a stripe of blocks. Each stripe has a block on one storage device containing parity for the remaining blocks of the stripe. Further stripes also have parity blocks, which are distributed on different storage units. Parity updating activity associated with every change to a data record is therefore distributed over the different storage units, enhancing access characteristics of the set of storage devices. The parity updating activity also includes the use of an independent version number stored with each data record and corresponding version numbers stored with the parity record. Each time a data record is changed, its version number is incremented and the corresponding version number in the parity record is incremented with the parity record update.
Description
PARITY SPREADING TO ENHANCE STORAGE ACCESS
Background of the Invention

The present invention relates to maintaining parity information on multiple blocks of data and in particular to the storage of such parity information.
U.S. Patent No. 4,092,732 to Ouchi describes a check sum generator for generating a check sum segment from segments of a system record as the system record segments are being transferred between a storage subsystem and a central processing unit. The check sum segment is actually a series of parity bits generated from bits in the same location of the system record segments. In other words, each bit, such as the first bit of the check sum segment, is the parity of the group of first bits of the record segments. When a storage unit containing a record segment fails, the record segment is regenerated from the check sum segment and the remaining system segments. One storage unit is selected for containing all the check sum segments for a plurality of record storage units.
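For concreteness, here is a minimal sketch of such a check sum segment, assuming record segments are equal-length byte strings; the function names are illustrative, not taken from the Ouchi patent.

```python
def checksum_segment(segments):
    """XOR all record segments together; each bit of the result is the
    parity of the bits in that position across the segments."""
    result = bytearray(len(segments[0]))
    for seg in segments:
        for i, byte in enumerate(seg):
            result[i] ^= byte
    return bytes(result)

def regenerate_segment(checksum, surviving_segments):
    """Rebuild a lost segment from the check sum and the survivors."""
    return checksum_segment([checksum] + list(surviving_segments))

segments = [b"\x0f\xf0", b"\x33\x33", b"\x55\xaa"]
parity = checksum_segment(segments)
assert regenerate_segment(parity, segments[1:]) == segments[0]
```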
In the Ouchi patent, the check sum segment is always generated from reading all the record segments it covers. If one record segment is changed, all the record segments covered are read and the checksum segment is generated. An IBM Technical Disclosure Bulletin, Vol. 24, No. 2, July 1981, pages 986-987, Efficient Mass Storage Parity Recovery Mechanism, improves upon the generation of the checksum segment, or parity segment, by copying a record segment before it is changed. The copy of the record segment is then exclusive-ORed with the changed record segment to create a change mask. The parity segment is then read and exclusive-ORed with the change mask to generate the new parity segment, which is then written back out to the storage unit.
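The change-mask technique can be sketched as follows; this is an illustrative model assuming equal-length byte-string records, not the TDB's implementation. Only the old copy, the changed record, and the parity segment are touched, no matter how many other segments the parity covers.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def updated_parity(old_record: bytes, changed_record: bytes, parity: bytes) -> bytes:
    change_mask = xor_bytes(old_record, changed_record)  # 1-bits mark changed bits
    return xor_bytes(parity, change_mask)                # fold the change into parity

records = [b"\x0f\xf0", b"\x33\x33", b"\x55\xaa"]
parity = xor_bytes(xor_bytes(records[0], records[1]), records[2])
changed = b"\xff\x00"
parity = updated_parity(records[0], changed, parity)
records[0] = changed
# The patched parity still equals the XOR of all current records.
assert xor_bytes(xor_bytes(records[0], records[1]), records[2]) == parity
```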
While a number of reads on record segments that are not changed is avoided in the prior art, a single storage unit is used to store parity segments for multiple record segments on multiple storage devices. A read and a write on the single storage unit occurs each time a record is changed on any of the storage units covered by the parity record on the single storage unit. Thus, the single storage unit becomes a bottleneck to storage operations, since the number of changes to records which can be made per unit of time is a function of the access rate of the single storage unit as opposed to the faster access rate provided by parallel operation of the multiple storage units.
Recovery of a lost record depends on the synchronization of the parity record with each of the data records that it covers. Without special hardware, such as non-volatile storage, and/or additional write operations to storage units, it is difficult to guarantee that both the data records and parity record are updated to a consistent state if the system terminates abnormally. Since two I/O operations are required to update the data and its associated parity, it is difficult to determine which I/O operation has completed following the system termination.
Summary of the Invention

A storage management mechanism distributes parity information substantially equally among a set of storage units. N storage units in a set are divided into a plurality of equally sized address areas referred to as blocks. Each storage unit contains the same number of blocks. Blocks from each storage unit in a set having the same unit address ranges are referred to as stripes. Each stripe has N-1 blocks of data and a parity block on one storage device containing parity for the remainder of the stripe. Further stripes each have a parity block, the parity blocks being distributed on different storage units.
Parity updating activity associated with every modification of data in a set is therefore distributed over the different storage units. No single unit is burdened with all the parity update activity.
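A hypothetical sketch of this distribution: if stripes are numbered from zero and parity rotates round robin (the assignment the detailed description later adopts), the parity-holding unit for a stripe is a simple modulus.

```python
def parity_unit_for_stripe(stripe: int, n_units: int) -> int:
    """Unit index (0-based) holding the parity block of the given stripe."""
    return stripe % n_units

n_units = 5
for stripe in range(6):
    p = parity_unit_for_stripe(stripe, n_units)
    data = [u for u in range(n_units) if u != p]
    print(f"stripe {stripe}: parity on unit {p}, data on units {data}")
```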
In the preferred embodiment, each storage unit participating in a set is the same size. This permits a simplified definition of the stripes. Since each storage unit is the same size, no storage unit has areas left over which need to be handled separately.
The number of storage units in a set is preferably more than two. With just two units, the protection is similar to mirroring, which involves maintaining two exact copies of data. With more than two units, the percent of storage dedicated to protection decreases. With three units, the percentage of storage needed to obtain the desired protection is about 33 percent. With eight units, slightly more than 12.5 percent of storage capacity is used for protection.
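These percentages are just the 1/N overhead of one parity block per stripe, as this small check shows (the exact figures in the text may include header overhead this ratio ignores):

```python
for n in (2, 3, 8):
    print(f"{n} units per set: {100 / n:.1f}% of capacity holds parity")
# 2 units: 50.0%   3 units: 33.3%   8 units: 12.5%
```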
In a further preferred embodiment, N-1 data records (520 bytes) and 1 parity record in a set having the same address range are referred to as a slice. Each data record in the slice has a version indicator which indicates the version of the record. A header in each parity record comprises a plurality of version indicators, one corresponding to each of the data records in the slice. Each time a data record is updated, its version indicator is incremented and the parity record version indicator corresponding to that data record is also incremented. When both the record update and the parity update are complete, the version indicators are equal. During recovery of a lost record, the version numbers are checked to ensure synchronization of the records with the parity. Forcing recovery without valid synchronization would produce unpredictable data.
Because each data record has a version number that is independent of every other data record, no serialization is required for updates to different data records covered by the same parity record. Also avoided is the need to read the parity record from storage before scheduling the data record write operation and queueing a parity update request.
In a further preferred embodiment, an unprotected stripe is provided for data records which need not be covered by parity groups. The unprotected stripes need not be the same size as the protected stripes, and may include variable size areas on the storage units if the storage units are not identical in size. Such striping provides a convenient method of segregating the storage units into areas of protected and unprotected storage because the same address area of each storage device is subject to protection. Performance benefits are realized if it is unnecessary to protect all the records stored on the units, because there is no parity update required after a record in an unprotected stripe is changed.
Brief Description of the Drawings
Fig. 1 is a block diagram of a system incorporating the parity block protection distribution of the present invention;
Fig. 2 is a block diagram of the distribution of the parity blocks of Fig. 1 over a plurality of storage devices;
Fig. 3 is a block diagram representation of logical tables used to correlate parity groups and data records;
Fig. 4 is a block diagram of records in a stripe of storage utilizing version indications for synchronization;
Fig. 5 is a flow diagram of the initialization of storage devices for parity block protection; and
Fig. 6 is a flow diagram of steps involved in updating data records and their corresponding parity records.
Detailed Description of the Preferred Embodiment

A computer system implementing block parity spreading is indicated generally at 10 in Fig. 1. System 10 comprises a data processing unit 12 coupled to a control store 14 which provides fast access to microinstructions. Processor 12 communicates via a channel adapter 16 and through a high-speed channel 18 to a plurality of I/O units. Processor 12 and the I/O units have access to a main storage array 20. Access to main storage 20 is provided by a virtual address translator 22. Address translation tables in main storage 20, and a translation lookaside buffer, provide mapping from virtual to real main storage addresses.
Each I/O device, such as disk drives 30, 32, 34, 36, and 38, is coupled through a controller, such as a disk storage controller 40 for the above disk drive storage devices. I/O controller 42 controls tape devices 44 and 46. Further I/O controllers 48, 50 and 52 control I/O devices such as printers, workstations, keyboards, displays, and communications. There are usually multiple disk storage controllers, each controlling multiple disk drive storage devices.
Data in system 10 is handled in the form of records comprising 512-byte pages of data and 8-byte headers. In the preferred embodiment, an IBM System/38, records are moved into and out of main storage 20 from disk storage via the channel 18. A main storage controller 60 controls accessing and paging of main storage 20. A broken line 62 between channel 18 and main storage controller 60 indicates direct memory access to main storage 20 by the I/O devices coupled to the channel. Further detail on the general operation of system 10 is found in a book, IBM System/38 Technical Developments, International Business Machines Corporation, 1978.
Protection of data on the disk storage devices 30 through 38 is provided by exclusive ORing data records on each device, and storing the parity record resulting from the exclusive OR on one of the storage devices. In Fig. 2, each storage device 30 through 38 is divided into blocks of data and blocks of parity. The blocks represent physical space on the storage devices. Since the system 10 provides an extent (a contiguous piece of allocatable disk space) of up to 16 megabytes of data, each block is preferably 16 megabytes.
Blocks 70, 72, 74, 76 and 78, one on each storage device, preferably having the same physical address range, are referred to as a stripe. There are 9 stripes shown in Fig. 2. Each protected stripe has an associated parity block which contains the exclusive OR of the other blocks in the stripe. In the first stripe, block 70 contains the parity for the remaining blocks 72, 74, 76 and 78. A block 80 on storage device 32 contains the parity for the remaining blocks on the second stripe. Block 82, on storage device 34, contains the parity for the third stripe. Blocks 84 and 86 contain the parity for the fourth and fifth stripes respectively. The parity blocks, including blocks 88, 90, and 92 for stripes 6, 7 and 8, are spread out, or distributed, over the storage devices. The 9th stripe is an unprotected area which does not have a parity block associated with it. It is used to store data which does not need protection from loss.
Spreading of the parity information ensures that one particular storage device is not accessed much more than the other storage devices during writing of the parity records following updates to data records on the different stripes. A change to records stored in a block will result in a change also having to be made to the parity block for the stripe including the changed records. Since the parity blocks for the stripes are spread over more than one storage device, the parity updates will not be concentrated at one device. Thus, I/O activity is spread more evenly over all the storage devices.
In Fig. 3, a unit table 310 contains information for each storage unit which participates in the parity protection. A physical address comprising a unit number and a sector, or page number, is used to identify the location of the desired data. The unit table is then used to identify parity control blocks indicated at 314, 316, and 318. Units 1-8 are members of the first parity set associated with control block 314. Units 9-13 are members of the second parity set associated with control block 316, and units N-2 to N are members of the Ith parity set associated with control block 318.
Each entry in the unit table points to the control block associated with the set of storage devices of which the entry is a member. The control blocks identify which unit of the set contains the parity block for each stripe. In control block 314, the stripe comprising the first 16 megabytes of storage has its parity information stored in unit number 1, in unit number 1's first 16 megabytes. The second 16 megabytes of the stripe comprising units 1-8 is contained in the second 16 megabytes of unit number 2. The parity block allocation continues in a round robin manner with units 3-8 having parity for the next 6 stripes respectively. The ninth stripe in the first parity set has its parity stored in the ninth block of 16 megabytes on unit number 1. The last stripe, allocated to unit J, one of the eight units in the set, may not contain a full 16 megabytes depending on whether the addressable storage of the units is divisible by 16 megabytes. A header in each of the parity control blocks describes which units are in the set, and also identifies an address range common for each of the units which is not protected by a parity group. Having a common range for each unit which is not protected simplifies the parity protection scheme. Since the same physical addresses on each storage device are exclusive ORed to determine the parity information, no special tables are required to correlate the information to the parity other than those shown in Fig. 3. The common unprotected address range requires no special consideration, since the identification of the range is in the control block and is common for each unit.
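A toy model of this lookup path, assuming 16-megabyte blocks and the round-robin stripe array just described; the class and field names are invented for illustration.

```python
BLOCK_BYTES = 16 * 2**20  # one 16-megabyte block per unit per stripe

class ParityControlBlock:
    def __init__(self, member_units):
        self.member_units = member_units  # unit numbers, in control-block order

    def parity_unit(self, stripe: int) -> int:
        # round robin: stripe 0 -> first member, stripe 1 -> second member, ...
        return self.member_units[stripe % len(self.member_units)]

set_one = ParityControlBlock(list(range(1, 9)))        # units 1-8
unit_table = {u: set_one for u in set_one.member_units}

def parity_location(unit: int, byte_address: int):
    """Map a data record's physical address to the unit holding its parity."""
    control = unit_table[unit]
    stripe = byte_address // BLOCK_BYTES
    return control.parity_unit(stripe), byte_address   # same address range

# The third stripe (addresses 32-48 MB) keeps its parity on unit 3.
print(parity_location(7, 40 * 2**20))  # -> (3, 41943040)
```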
Parity control block 316 corresponds to the set of storage units 9-13 in Fig. 3. These five units may be thought of as storage units 30-38 in Fig. 2. The allocation of parity groups to storage units is on a round robin basis. Each consecutive 16 megabytes of storage has its parity group stored on consecutive storage devices starting with device 30, or unit number 9 in Fig. 3. Unit number 9 also contains the parity group for the 80-96 megabyte range of the stripe for storage units 9-13 (30-38). The last stripe in the set has its parity stored on the Kth unit, where K is the unit where the allocation of parity blocks ends because there are no more protected stripes.
Control block 318 corresponds to the set of storage units N-2 through N in the Ith set of storage units. The last unit allocated a parity block is labeled L, and is one of the three units in the set depending on the number of stripes in the set. The Ith set contains the minimum number of storage units, three, considered desirable for implementation of the parity protection. The use of two units would be possible, but would be similar to mirroring, with the extra step of an exclusive OR. Eight units in a set has been selected as the maximum in the preferred embodiment due to a system specific constraint to be discussed below. More than eight units may be used in a set without loss of protection.
With a very large number of units in a set, reconstruction of the data lost when a single unit fails would take a longer time because each unit would have to be read. There is also an increased chance of loss of more than one unit at a time. If this occurs, it is not possible to reconstruct the data from either of the lost units using the simple parity discussed above. The invention is considered broad enough to cover a more complex data protection code, which may be stored similarly to the parity, and permit multiple bit correction in the event more than one storage device fails. A set could also be arranged multidimensionally as described in the IBM TDB, Vol. 24, No. 2, pages 986-987, to permit reconstruction of data from at least two failed units. Further embodiments spread the parity information based on frequency of updating data to spread the I/O activity evenly, as opposed to spreading the parity itself evenly.
Each data record contains a version number. Since updates to multiple data records covered by one parity record may occur, each record in a parity block also contains a corresponding version indication for each record in the slice it covers. A slice is a set of data records and their corresponding parity record. The version indications are not covered by the parity protection scheme. In Fig. 4, four data records 410, 412, 414 and 416 each contain a header with a record version number indicated at 418, 420, 422 and 424 respectively. A parity record 426 contains a header with four version numbers, 428, 430, 432 and 434, corresponding to the version numbers in the data records. The version numbers or indications may be any length compatible with the number of bits available in the record headers. A one bit length was chosen due to the unavailability of further bits. Each time a data record is changed, its version number is incremented. The corresponding version number in the parity record is also incremented so that they have the same value.
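An illustrative in-memory model of such a slice, assuming one-bit version numbers as in the preferred embodiment; the class layout is invented, and the parity page update itself is sketched later.

```python
from dataclasses import dataclass, field

@dataclass
class DataRecord:
    page: bytes
    version: int = 0                     # single version bit in the header

@dataclass
class ParityRecord:
    page: bytes
    versions: list[int] = field(default_factory=lambda: [0] * 4)
    # one version bit per data record of the slice, kept in the parity header

def bump_versions(record: DataRecord, parity: ParityRecord, idx: int) -> None:
    record.version ^= 1                  # increment modulo 2
    parity.versions[idx] ^= 1            # equal again once both updates complete
```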
The version numbers are used to check for synchronization of the parity record with each data record in a slice in the event of lost data. When changes to several data records occur, the data records may be written to disk storage before or after the parity records are updated with the change masks. The change masks are queued for incorporation into the parity records. Associating a version number with each parity and data record adds a constraint that parity record updates for a given data record to disk storage must be processed in the same order as the data record updates. Otherwise, the version numbers may not be accurate. A FIFO queue holds the change masks so that they are incorporated into the parity records on mass storage in the order that the change masks were generated.
Special consideration is given to the version numbers due to the limited availability of bits in the headers. The version number for each data record is stored in the first bit position of the 6th byte of the respective records. The corresponding version numbers in the parity record are contained in the first 4 bit positions of the 6th byte and the first 3 bit positions of the 9th byte of the header. The bit positions for the parity record header version numbers will be referred to as bits 1-7, corresponding to the order described above.
The units in a set are numbered 1-n, corresponding to their order in the parity control block for the set. The version numbers are stored in ascending order, based on the unit number, in the parity record headers, skipping the parity unit. If the third unit is the parity unit, the version numbers corresponding to the first two units are stored in bit positions 1 and 2. The version number corresponding to the fourth unit is stored in bit position 3. The nth unit version number is in position n-1 in the parity record header. Storing version numbers in this manner permits the largest possible set given the storage limitation. With extra storage, the size of the set may be optimized as a function of other system considerations. The positioning of the version numbers in the parity header could also have a straightforward unit number to bit position correspondence.
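The packing rule reduces to a one-line function; a sketch under the stated numbering (units 1-n, bit positions 1-7, parity unit skipped):

```python
def version_bit_position(unit: int, parity_unit: int) -> int:
    """1-based bit position of a unit's version number in the parity header."""
    if unit == parity_unit:
        raise ValueError("the parity unit carries no version bit of its own")
    return unit if unit < parity_unit else unit - 1

# With the third unit holding parity: units 1,2 -> bits 1,2; unit 4 -> bit 3;
# and in general the nth unit lands in position n-1.
assert [version_bit_position(u, 3) for u in (1, 2, 4, 8)] == [1, 2, 3, 7]
```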
Because each data record has a version number that is independent of every other data record, no serialization is required for updates to different data records covered by the same parity record. Transfer of records into and out of main storage may be based on other system throughput considerations to improve processing speed. The separate version numbers also eliminate the need to read the parity records from mass storage before scheduling a data record write operation and queueing a parity update request.
In order to limit the amount of storage occupied by a version number, it is allowed to wrap from the largest to the smallest value without error. This enables the version number to be reused. If a one bit version number is used, it wraps from 1 to 0. Thus, there are two values it assumes. A version number with a higher number of bits allows more values. Since data record and parity record updates are asynchronous, an update to a data record is held if the updated version number could be confused with an existing version number.
Such confusion could exist if the same version number as the version number associated with the update could exist on the data record or the parity record on mass storage. The time frame that must be considered is from the time a new parity update request is placed on the FIFO queue to the time the request is completed. Both the data record and the parity record must be updated before an update request is considered complete.
The version numbers that could exist on disk before a new update request is completed include all the values for prior update requests that are still on the queue, plus the value that precedes the first (oldest) request element in the queue. If an update request is not removed from the queue until it is completed, it is only necessary for a new update to wait if there are other requests on the queue for the same data record, and the incremented version number for the new data record matches the version number preceding the first request element still in the queue.
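A sketch of that wait test, assuming each queued request carries a record identifier and the version number it will establish; the queue layout is illustrative. With one-bit versions it degenerates to "wait whenever any request for the same record is pending", which is exactly the simplification described below.

```python
from collections import deque

def must_wait(queue: deque, record_id, new_version: int, version_bits: int = 1) -> bool:
    """True if scheduling the update now could confuse version numbers."""
    pending = [req for req in queue if req["record"] == record_id]
    if not pending:
        return False
    oldest = pending[0]["version"]
    preceding = (oldest - 1) % (2 ** version_bits)  # value before the oldest request
    return new_version == preceding

queue = deque([{"record": "r1", "version": 1}])
assert must_wait(queue, "r1", new_version=0)        # 1-bit versions always collide
assert not must_wait(queue, "r2", new_version=0)    # other records never wait
```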
The performance cost of searching the queue before scheduling an update is not severe as long as the number of update requests in the queue remains small. A reasonably fast access time to the storage used for the queue also reduces the performance cost. Mainstore provides a suitable storage area for the queue. Keeping the number of update requests in the queue small is also important to ensure that no request has an excessive wait time.
When, as in the preferred embodiment, the version numbers are implemented as single bit flags, it is only necessary to search the queue for any prior update request to the same data record in order to determine whether a new update must wait. There is no need to check version number values, because a new request must always wait if there is an incomplete request in the queue for the same data block.
In the preferred embodiment, the record headers contain seven unused bits available for implementation of the present invention. This places a limit on the number of units which may participate in a set. Only seven version numbers may be kept in a parity record, so only up to eight total units participate in a parity protection set. As mentioned before, many more units, or as few as three units, could efficiently make use of the present invention.
CONFIGURATION OF SYSTEM FOR PARITY PROTECTION
Configuration of the system for parity protection of data is initiated by a user at 510 in the flow diagram of Fig. 5. A set-up task on processor 12 builds the parity control blocks 314-318 in block 512 of the flow diagram. The task uses information in a configuration unit table 312 which identifies the storage units coupled to the system. A storage device, such as the IBM 3370, may have more than one independent unit in it, such as an independently accessible arm, but only one unit from a storage device is chosen for any particular set. Another criterion for unit selection involves using as many disk controllers as possible in a set. These criteria are used so that a failure mode will not affect two units in a set. The set-up task also maximizes usable capacity by maximizing the number of units in a set.
After the control blocks have been built, the stripe arrays in the control blocks are built at 514. The stripe arrays, as previously mentioned, assign parity blocks on a round robin basis to member units in each set. The stripe arrays are formed to indicate which units contain the parity blocks for successive stripes comprising 16 megabyte blocks from each unit. A user may also define a size of unprotected storage desired. A definition of the address range of the unprotected stripe is entered in the header of the control block. The setup task will not define a block from the unprotected stripe as a parity block, so the entire unprotected stripe will be available for data records.
Next, the set-up task writes the control blocks at 516 to more than one member unit of each set to which the control blocks correspond. The control blocks are used during recovery to identify the members of the sets. Thus, they must be available without parity recovery. They are written to more than one member unit of each set in case one of the member units containing it fails. In the event that the protection scheme protects against failure of more than one unit, the control blocks are written to at least one more unit than the number of units which may be recovered. In the preferred embodiment, they are written to each member of the set so that the units need not be searched to determine which unit contains the control block. Now that the sets are identified and the control blocks built, block 518 of the set-up task validates the parity blocks, including the version numbers. In the preferred embodiment, this is done by zeroing all the data on all the units. Since even parity is used for the parity protection, the result is a valid parity for all the data. The version numbers are also zero to start with. The system is then initialized in a standard manner at 520 by causing an initial program load.
It is also possible to add a member to a set which does not yet have the maximum number of units. The new member is preferably zeroed, and added to the unit table. The parity blocks of the set are then redistributed to include the new unit. The control block for the set is then revised, and the units which contained parity blocks that were transferred to the new unit have their address ranges which contained the parity blocks zeroed, validating the parity group for that stripe.
In the preferred embodiment, the control block is temporarily revised to indicate that a change in the set will occur due to the addition of the new unit. The temporary change is done in case a failure occurs during redistribution of the parity blocks. The parity blocks of an existing set are then redistributed to include the new unit. The units which contained parity blocks that were transferred to the new unit have their address ranges which contained the parity blocks zeroed. When redistribution is complete, the changes to the control block are made permanent.
It is also possible to add a new unit without transferring parity blocks. Not transferring the parity blocks would fail to make use of the increase in access rate of the set made possible by adding the new unit. The new unit would, however, be protected by the existing parity blocks.
UPDATING RECORD AND PARITY IN A SLICE
In Fig. 6, changing a data record and its corresponding parity record is shown. Such a change may be called for by a person changing data in a database, or by a machine process requesting a change for a number of reasons. At 610, a user task operating on processor 12 reads the data record to be changed. The user task first makes an extra copy of the data record at 612 and then makes the changes to the data record in a conventional manner.
The user task then creates a change mask at 614 by exclusive ORing the changed data record into the extra copy of the data record. The first bit of the changed data is exclusive ORed with the first bit of the extra copy to form the first bit of the change mask. Each consecutive bit of the changed data is similarly exclusive ORed with corresponding bits of the extra copy to form further bits of the change mask. An already existing machine instruction performs the exclusive OR of the headers of the changed record and the copied record. Two further exclusive OR machine instructions perform exclusive ORs of the pages in 256 byte blocks. The positions in the header corresponding to the version numbers in the parity record are then zeroed in the change mask so that they will not affect the version numbers of the parity record.
The version number in the data record header is then incremented by the user task at block 616. The user task at 617 then determines which unit contains the appropriate parity record by searching the unit table 312 based on the unit number from the physical address of the new data. The unit table indicates which control block 314-318 to use to identify the unit containing the parity record for the particular data record. Again, the address of the new data record is used to identify the stripe of interest. Once the unit containing the parity record for the stripe is identified, task 617 places an update request which includes the record address and change mask on a queue 624 for the appropriate unit. The update request also indicates that the data record has not yet been written. Prior to task 617 placing an update request on queue 624, it searches queue 624 to ensure there can be no confusion with version numbers, as discussed previously. If confusion is possible, the user task waits until confusion is not possible before proceeding.
The address of the parity record is known to be the same as the address of the new data except that it is on a different unit. A write request for the new data is issued at 618 to a queue 620. When the data record is written to storage 622, an indication of that fact is sent at 625 to the update request on queue 624. Flow then returns to wait for the next data record change.
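Pulling the user-task side of Fig. 6 together (blocks 610-618), a condensed sketch with in-memory stand-ins for the record store and queues; the helper names are invented and the version handling is reduced to the one-bit case.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def user_task_update(records, addr, new_page, update_queue):
    old_page = records[addr]["page"]                 # 610: read the record
    extra_copy = bytes(old_page)                     # 612: copy before changing
    records[addr]["page"] = new_page
    change_mask = xor_bytes(extra_copy, new_page)    # 614: old XOR new
    # (version bit positions in the mask are assumed already zeroed here)
    records[addr]["version"] ^= 1                    # 616: bump the version bit
    update_queue.append({                            # 617: queue parity update
        "addr": addr, "mask": change_mask,
        "data_written": False, "parity_written": False,
    })
    # 618: a data write request would be issued here; its completion callback
    # sets data_written so the queue entry can eventually be retired.
```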
A parity update task starts at 627 by getting the next parity update request from queue 624. Once the parity record identified in the update request has been read at 626, the change mask is exclusive ORed at 628 into the parity record. The version numbers are not changed by the exclusive OR because the change mask contains zeros in the bit positions corresponding to the version numbers. The exclusive OR is performed using the same exclusive OR machine instruction as is used by block 614 of the user task. The parity record version number corresponding to the data record is then incremented at 630, and a write request for the parity record is issued at 632. The parity update task then gets the next parity update request at 627.
A queue 634 stores parity record write requests for the storage units. In this case, a storage unit 636 is indicated for the particular write request. Storage unit 636 is different from unit 622 because the data and the parity records are not to be written to the same unit. When the write of the parity record completes, the update request on queue 624 is notified at 637. When both the data write and the parity write are completed, the entry is removed from queue 624, indicating that there is no longer the possibility of confusion with the version number.
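The matching parity-task side (blocks 626-637), under the same in-memory assumptions as the user-task sketch above; `update_queue` is a `collections.deque` of the request dictionaries built there, and a request is retired only when both writes have completed, mirroring the text.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def parity_update_step(parity_store, update_queue, slice_index):
    if not update_queue:
        return
    req = update_queue[0]                            # 627: oldest request first
    parity = parity_store[req["addr"]]               # 626: read the parity record
    parity["page"] = xor_bytes(parity["page"], req["mask"])  # 628: fold mask in
    parity["versions"][slice_index(req["addr"])] ^= 1        # 630: bump version
    req["parity_written"] = True                     # 632/637: parity write done
    if req["data_written"]:                          # retire only when both writes
        update_queue.popleft()                       # completed: no confusion left
```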
RECORD RECOVERY
Recovery is performed either when a single record read error is encountered during normal operation of the system or when an entire unit fails.
When a unit fails, the data lost on the unit is reconstructed from the remaining units of the set, as indicated in the control blocks stored on more than one member of the set. The failed unit is replaced or repaired. Data for the new unit is then reconstructed from the remaining members of the set, record by record. A parity record is reconstructed simply by reading the data records in the set and regenerating the parity. The version numbers in the parity record are set equal to the corresponding version numbers in the data records.
When regenerating a data record, a check is first made on each of the data records to determine if their version numbers match the version numbers in the parity record for that slice. If any of the version numbers in the slice do not match, a lost data indication is written in the header of the lost record. If the version numbers match, the records in the slice are then exclusive ORed one by one into the new record. The appropriate version number is then copied from the parity record into the new record header.
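A sketch of that regeneration check, assuming the surviving records and the parity record of a slice are available in memory; returning None stands in for writing the lost-data indication.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def rebuild_record(surviving, parity, lost_index):
    """surviving maps slice position -> (page, version bit)."""
    for idx, (_, version) in surviving.items():
        if version != parity["versions"][idx]:
            return None                    # out of sync: mark data lost, don't force
    page = parity["page"]
    for data_page, _ in surviving.values():
        page = xor_bytes(page, data_page)  # XOR the survivors into the parity
    version = parity["versions"][lost_index]   # header version copied from parity
    return page, version
```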
If a read error is encountered on either a data record or a parity record, the contents of the failing record are reconstructed with the same mechanism as described above for recovery of an entire unit. In this case, it is necessary to hold all change activity on the slice containing the failed record while recovery is in progress.
Following reconstruction, normal operation of the system continues, with the only data lost being that for which an update of data or parity was made and a unit failed before the corresponding parity or data record could be written. Thus, the vast majority of data on the failed unit is recovered without the unit redundancy overhead of mirroring. By distributing the parity information over the members of the set, as opposed to tying up one device with the parity information, parallel operation of storage devices is utilized to provide maximum access rates.
While the invention has been described with respect to one or more preferred embodiments, and with respect to one particular system, it will be recognized by those skilled in the art that the invention can take many forms and shapes. The record sizes are in no way limited to those discussed, nor are the storage units limited to disk drive devices. The fact that only identical devices are used in sets is merely a matter of design choice for simplification. Numerous combinations of storage units in sets, and the distribution of the data protection or parity blocks, are within the scope of the invention as described and as claimed below.
R0~-86-014 5 Detai ed D scri~ n_of the Pr ferred Embodimenk A computer system implementiny block parity spreading is indicated generally at 10 in Fig. 1. System 10 comprises a data processing unit 12 coupled to a control store 1~ which provides fast access to microinstructions. Processor 12 communicates via a channel adapter 16 and through a high-speed channel 18 to a plurallty of I/O
units. Processor 12 and the I/O units have access to a main storaye array 20. Access to main storage 20 is provided by a virtual address translator 22. Address translation tables in main storage 20, and a translation lookaside buffer provide mapping from virtual to real main storage addresses.
Each I/O device, such as disk drives 30, 32, 34, 36, and 38 is coupled through a controller, such as a disk s-torage controller ~0 for the above disk clrive storage devices. I/O controller 42 controls tape devices 44 and 46. Further I/O controllers 48, 50 and 52 control I/O
devices such as prin-ters, workstations, keyboards, displays, and communications. There are usually multiple disk storage con-trollers, each controlling multiple disk drive storage devices.
Data in system 10 is handled in the form of records comprising 512 byte pages of data and 8-byte headers. In the preferred embodiment, an IBM System/38, records are moved lnto and out of main storage 20 from disk storaye via the channel 18. A main storage controller 60 controls accessing and paying of main storage 20. A
broken line 62 between channel 18 and maln s-torage controller 60 indicates direct memory access to main storaye 20 by the I/O devices coupled to the channel. Further detail on the general operation of system 10 ;s found in a book, IBM System/38 Technical Developments, International 8usiness Machines Corporation, 1978.
Protection of data on the disk storage devices 30 through 38 is provided hy exclusive ORing data records on each device, and ~76~333 R09-~36-Ol~ 6 storing the parity record resulting from the exclusive OR on one of the storage devices. In Fig. 2, each storage device 30 through 38 is divided in-to blocks of data and blocks of parity. The blocks represent physical space on the storage devices. Since the sys-teM 10 provides an exten-t (a contiguous piece of allocatable disk space) of up to 16 megabytes of data, each block is preferably 16 megabytes.
Bloc~s 70, 72, 74, 76 and 78, one on each storage device, preferably having the same physical address range, are re-ferred to as a stripe. There are 9 s-tripes shown in Fig. 2. Each protected stripe has an assoc;ated parity block which contains the Exclusive OR of the other blocks in the stripe. In the first stripe, block 70 contains the parity for the remainlng blocks 72, 74, 76 and 78. A block 80 on storage device 32 contains the parity for the remaining blocks on the second stripe. Block 82, on storage device 3~ contains the parity for the thlrd stripe. ~locks 8~ and 86 contain the parity for the fourth ancl fifth stripes respectively. The parity blocks, including blocks 88, 90, and 92 for stripes 6, 7 and 8 are spread out, or distributed over the storage devices. The 9th stripe is an unprotected area which does not have a parity block associated with it. It is used to store data which does not need protection from loss.
Spreading of the parity information ensures that one particular storage device is not accessed much more -than the other storage devices during writing of -the parity records following updates to data records on the different stripes. A change to records stored in a block w-ill result in a change also having to be made to the parity block f~r the stripe including the changed records. Since the parity blocks for the stripes are spreacl over more than one storage device, the parity updates will not be concentrated at one devlce. Thus, I/O
activity is spread more evenly over al the storage devices.
~' ~' In Fig. 3, a unit table 310 contains information for each storage uni-t which participates in the parity protection. A physlcal address comprising a unit number and a sector, or page number is used to ident;-Fy the location oF the clesired da-ta. The unit table is then used to identify parity contrnl blocks indicated at 31~, 316, and 318.
Units 1-8 are members of the first parity set associated with control block 314. Units ~-13 are members of the second parity set associated with control block 316, and units N-2 to N are members of the Ith parity set associated with con-trol block 318.
Each entry in the unit table points to the control block associated with the set of storage devices of which the entry is a member. The control blocks identify which unit of the set contains the parity block for each stripe. In control block 31~, the stripe comprising the first 16 megabytes of storage has its parity inForma-tion stored in urlit number 1 in unit number 1's First 16 megabytes. The seconcl 16 megabytes of -the stripe comprlsing unlts 1-8 is contained ln the second 16 megabytes oF unit number 2. The parity block allocation continues in a round robin manner with units 3-8 having parity for the next 6 stripes respectively. The ninth stripe in the first pari-ty set has its parity stored in the ninth block of 16 megabytes on unit number 1. The last stripe, allocated to unit J, one of the eight units in the set, may not contain a full 16 megabytes depending on whether the addressable storage of the units is divisible by 16 megabytes A header in each of the parity control blocks describes which units arP in the set, and also identifies an address range common for each of the units which is not pro-tected by a parity group. Having a common range for each unit which is not protected, simplifies the parity pro-tection scheme. Since the same physical addresses on each storage device are exclusive ORed to determlne the parity information, no special tables are required to correlate the information to the parity okher -than those shown in Fig. 3. The common unpro-tected address range requires no special ~ .
33~
R09-86-Ol4 consideration, since the identifica-tion of the range is in the control block and is common for each unit.
Parity control block 31~ corresponds to the set of storage units 9-13 in Fig. 3. These five units may be thought of as s-torage units 30 - 38 in Fig. 2. The allocation of parity groups to storage un-its is on a round robin basis. Each consecutive 16 megabytes of storage has i-ts parity yroup stored on consecutive storage devices starting with device 30, or unit number 9 in Fig. 3. Unit number 9 also con-tains the parity group for the 80-96 megabyte range of the stripe for storage units 9-13 (30-38). The last stripe in -the set has its parity stored on the Kth unit, where K is the unit where the allocation of parity blocks ends because there are no more protected stripes.
Control block 318 corresponds to the set oF storage units N-2 through N in the Ith set oF storage units. The last unit allocatecl a parity block ls labeled L, and is one of the three units In the set depending on the number of stripes in the set. The Ith set contains the minimum number of storage units, three, considered desirable for implementation of the parity protection. The use of two units would be posslble, but would be similar to mirroring, with the extra step of an exclusive OR. Eight units in a set has been selected as the maximum in the preferred embodiment due to a system specific constraint to be discussed below. More than eight units may be used irl a set without loss of protection.
With a very large number oP units in a set, reconstruction of the data lost when a single unit fails would take a longer time because each unit would have to be read. There is also an increased chance of loss of more than one unit at a time. If this occurs, it ;s not possible to reconstruct the data from either of the lost unlts using the simple parity discusse-l above. The invention is consldered broad enough to cover a more complex data protection code, which may be stored similarly to the parity, and permit multiple bit correction in the event more than one storage ' .
. .
-,: .
7~33~
R09-86-Ol4 9 device fails. A set could also be arranged multidimensionally as described in the IBM TDB, Vol 24, No. 2 Pages 986-987, to permit reconstruction of clata from at leas-t two failed units. Further embodiments spread the parity information based on frequency of updat;ng data to spread the I/O activi-ty evenly, as opposed to spreading the parity itself evenly.
Each data record contains a version number. Since updates to multiple data records, covered by one parity record may occur, each record in a parity block also contains a corresponding version indication for each record in the slice it covers. A slice is a set of data records and their corresponding parity record. The version indications are not coverecl by the parity protection scheme. In Fig.
~, four data records, 410, ~12, 41~ and ~16 each contain a header wi-th a record version number indicated at ~18, ~20, 422 and 424 t respectively. A parity record 426 contains a header with four version numbers, 42~, ~30, 432 and 434 corresponding to the version numbers in the data records. The version numbers or indications may be any length compatible with the nurnber of blts available in the record headers. A one bit length was chosen due to the unavallabllity of further bits. Each time a data record is changed, its version number is incremented. The correspondlng version number in the parity record is also incrernented so that they have the same value.
The version numbers are used to check for synchronization of the parity record with each data record in a slice in the event of lost data. When changes to several data records occur, the data records may be written -to disk storage before or after the parity records are updated with the change masks. The change masks are queued for incorporation into the parity records. Assoclating a version number with each parity and data record adds a constraint that parity record updates for a given data record -to disk storage must be processed in the same order as the da-ta record updates. Otherwise, the vers~on numbers may not be accurate. A FIFO queue holds the change masks so that they are incorporated Into the Ro9-86-014 10 parity records on mass storage in the order that the change masks were generated.
Spec-ial consideration is given to the version numbers due to the limited availability of bits -in the héaders. The version number for each data record is stored in -the first bit position of the 6th byte of the respective records. The corresponding version numbers in the parity record are contained in the first 4 bit positions oF the 6th byte and tha first 3 bit positions of the gth byte o~ the header. The bit positions for the parity record header version numbers will be referred to as bits 1-7 corresponding to the order described above.
The units in a set are numbered 1-n corresponding to their order in the parity control block for the set. The version numbers are stored in ascending order, based on the unit number, in the parity record : headers, skipping the parity unit. If the third unit is the parity unit, the version numbers corresponding to the first two un;its are stored in bit positions 1 and 2. The version number corresponding to the fourth unit is stored in bit position 3. The nth unit version number is in position n-1 in the parity record header. Storing version numbers ;n this manner permits the largest possible set given the storage limitation. With extra storage, the size of the set may be optimized as a function of other system considerations. The positioning of the version numbers in the parity header could also have a straight forward unit number to bit position correspondence.
Because each data record has a version number that is independent of every other data record, no serialization is required for updates to different data records covered by the same parity record. Transfer of records into and out of main storage may be based on other system throughput considerations to improve processing speed. The separate version numbers also eliminate the need to read the parity records from mass storage before scheduling a data record write operation and queueing a parity update request.
In order to limit the amount of storage occupied by a version number, it is allowed to wrap from the largest to the smallest value without error. This enables the version number to be reused. If a one-bit version number is used, it wraps from 1 to 0. Thus, there are two values it assumes. A version number with a higher number of bits allows more values. Since data record and parity record updates are asynchronous, an update to a data record is held if the updated version number could be confused with an existing version number.
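The wraparound behaves like a modular increment; a minimal sketch, where `bits` is the assumed width of the version counter (1 in the preferred embodiment).

```c
/* A k-bit version counter increments modulo 2^k, wrapping from its
 * largest value back to zero without error; with k = 1 it simply
 * alternates between 0 and 1. */
static unsigned next_version(unsigned v, unsigned bits)
{
    return (v + 1u) & ((1u << bits) - 1u);
}
```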
Such confusion could exist if the same version number as the version number associated with the update could exist on the data record or the parity record on mass storage. The time frame that must be considered is from the time a new parity update request is placed on the FIFO queue to the time the request is completed. Both the data record and the parity record must be updated before an update request is considered complete.
The version numbers that could exist on disk before a new update request is completed include all the values for prior update requests that are still on the queue, plus the value that precedes the first (oldest) request element in the queue. If an update request is not removed from the queue until it is completed, it is only necessary for a new update to wait if there are other requests on the queue for the same data record, and the incremented version number for the new data record matches the version number preceding the first request element still in the queue.
The performance cost of searching the queue before scheduling an update is not severe as long as the number of update requests in the queue remains small. A reasonably fast access time to the storage used for the queue also reduces the performance cost. Main storage provides a suitable storage area for the queue. Keeping the number of update requests in the queue small is also important to ensure that no request has an excessive wait time.
When, as in the preferred embodiment, the version numbers are implemented as single-bit flags, it is only necessary to search the queue for any prior update request to the same data record in order to determine whether a new update must wait. There is no need to check version number values, because a new request must always wait if there is an incomplete request in the queue for the same data block.
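A sketch of that simplified check, assuming a hypothetical singly linked FIFO whose requests are removed only on completion; names and structure layout are illustrative.

```c
#include <stdbool.h>
#include <stddef.h>

/* One queued parity update request; the change mask and completion flags
 * are elided. Requests stay on the queue until both the data write and
 * the parity write complete. */
struct update_request {
    unsigned long record_addr;
    struct update_request *next;
};

/* With one-bit versions, a new update must wait whenever any incomplete
 * request for the same data record is still queued. */
static bool must_wait(const struct update_request *head, unsigned long addr)
{
    for (const struct update_request *r = head; r != NULL; r = r->next)
        if (r->record_addr == addr)
            return true;
    return false;
}
```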
In the preferred embodiment, the record headers contain seven unused bits available for implementation of the present invention. This places a limit on the number of units which may participate in a set. Only seven version numbers may be kept in a parity record, so at most eight units in total may participate in a parity protection set. As mentioned before, many more units, or as few as three units, could efficiently make use of the present invention.
CONFIGURATION OF SYSTEM FOR PARITY PROTECTION
Configuration of the system for parity protection of data is initiated by a user at 510 in the flow diagram of Fig. 5. A set-up task on processor 12 builds the parity control blocks 314-318 in block 512 of the flow diagram. The task uses information in a configuration unit table 312 which identifies the storage units coupled to the system. A storage device, such as the IBM 3370, may have more than one independent unit in it, such as an independently accessible arm, but only one unit from a storage device is chosen for any particular set. Another criterion for unit selection involves using as many disk controllers as possible in a set. These criteria are used so that a failure mode will not affect two units in a set. The set-up task also maximizes usable capacity by maximizing the number of units in a set.
After the control blocks have been built, the stripe arrays in the control blocks are built at 514. The stripe arrays, as previously mentioned, assign parity blocks on a round-robin basis to member units in each set. The stripe arrays are formed to indicate which units contain the parity blocks for successive stripes comprising 16-megabyte blocks from each unit. A user may also define a size of unprotected storage desired. A definition of the address range of the unprotected stripe is entered in the header of the control block. The set-up task will not define a block from the unprotected stripe as a parity block, so the entire unprotected stripe will be available for data records.
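The round-robin placement can be illustrated with a short sketch; note the patent records these assignments in the stripe arrays rather than computing them on demand, and the unit numbering and modulus below are assumptions.

```c
#define STRIPE_BYTES (16UL * 1024 * 1024)   /* 16-megabyte blocks per unit */

/* Illustrative round-robin assignment: the parity block of successive
 * stripes rotates through the member units (numbered 1..n here). An
 * unprotected stripe would simply have no parity entry. */
static int parity_unit_for_stripe(unsigned long stripe, int units_in_set)
{
    return (int)(stripe % (unsigned long)units_in_set) + 1;
}

/* Locate the stripe covering a given byte address on a unit. */
static unsigned long stripe_of_address(unsigned long byte_addr)
{
    return byte_addr / STRIPE_BYTES;
}
```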
Next, the set-up task writes the control blocks at 516 to more than one member unit of each set to which the control blocks correspond. The control blocks are used during recovery to identify the members of the sets. Thus, they must be available without parity recovery. They are written to more than one member unit of each set in case one of the member units containing them fails. In the event that the protection scheme protects against failure of more than one unit, the control blocks are written to at least one more unit than the number of units which may be recovered. In the preferred embodiment, they are written to each member of the set so that the units need not be searched to determine which unit contains the control block. Now that the sets are identified and the control blocks built, block 518 of the set-up task validates the parity blocks, including the version numbers. In the preferred embodiment, this is done by zeroing all the data on all the units. Since even parity is used for the parity protection, the result is a valid parity for all the data. The version numbers also start at zero.
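Zeroing the units yields valid parity because, with even parity, the parity block equals the XOR of its data blocks, and all-zero blocks satisfy this trivially. A hypothetical consistency check illustrating the invariant:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Even parity holds when the XOR of the data blocks in a stripe equals
 * the parity block; zero-filled units pass immediately (0 ^ 0 ... == 0).
 * Names are illustrative, not from the patent. */
static bool stripe_parity_valid(const uint8_t *const data[], int ndata,
                                const uint8_t *parity, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        uint8_t x = 0;
        for (int d = 0; d < ndata; d++)
            x ^= data[d][i];
        if (x != parity[i])
            return false;
    }
    return true;
}
```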
The system is then initialized in a standard manner at 520 by causing an initial program load.
It is also possible to add a member to a set which does not yet have the maximum number of units. The new member is preferably zeroed and added to the unit table. The parity blocks of the set are then redistributed to include the new unit. The control block for the set is then revised, and the units which contained parity blocks that were transferred to the new unit have the address ranges that contained those parity blocks zeroed, validating the parity group for that stripe.
In the preferred embodiment, the control block is temporarily revised to indicate that a change in the set will occur due to the addition of the new unit. The temporary change is made in case a failure occurs during redistribution of the parity blocks. The parity blocks of the existing set are then redistributed to include the new unit.
The units which contained parity blocks that were transferred to the new unit have the address ranges that contained those parity blocks zeroed. When redistribution is complete, the changes to the control block are made permanent.
It is also possible to add a new unit without transferring parity blocks. Not transferring the parity blocks, however, would forgo the increase in the access rate of the set made possible by adding the new unit. The new unit would nevertheless be protected by the existing parity blocks.
UPDATING RECORD AND PARITY IN A SLICE
In Fig. 6, changing a data record and its corresponding parity record is shown. Such a change may be called for by a person changing data in a data base, or by a machine process requesting a change for any of a number of reasons. At 610, a user task operating on processor 12 reads the data record to be changed. The user task first makes an extra copy of the data record at 612 and then makes the changes to the data record in a conventional manner.
The user task then creates a change mask at 614 by exclusive ORing the changed data record into the extra copy of the data record.
The first bit of the changed data is exclusive ORed with the first bit of the extra copy to form the first bit of the change mask. Each consecutive bit of the changed data is similarly exclusive ORed with corresponding bits of the extra copy to form further bits of the change mask. An already existing machine instruction performs the exclusive OR of the headers of the changed record and the copied record. Two further exclusive OR machine instructions perform exclusive ORs of the pages in 256-byte blocks.
The positions in the header corresponding to the version numbers in the parity record are then zeroed in the change mask so that they will not affect the version numbers of the parity record.
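A sketch of the change-mask construction and of zeroing the version-bit positions; a plain byte loop stands in for the wide exclusive-OR machine instructions, and the bit masks assume 1-based bit numbering with the "first bit" taken as the most significant bit of a byte.

```c
#include <stddef.h>
#include <stdint.h>

/* Build a change mask by XORing the changed record into the saved copy.
 * The patent uses wide exclusive-OR machine instructions over the header
 * and 256-byte page blocks; a byte loop has the same effect. */
static void make_change_mask(const uint8_t *old_copy, const uint8_t *changed,
                             uint8_t *mask, size_t len)
{
    for (size_t i = 0; i < len; i++)
        mask[i] = old_copy[i] ^ changed[i];
}

/* Zero the mask bits that overlay the parity record's version numbers so
 * the later XOR cannot disturb them: bits 1-4 of the 6th byte and bits
 * 1-3 of the 9th byte, under the MSB-first assumption stated above. */
static void clear_version_bits(uint8_t *mask)
{
    mask[5] &= (uint8_t)~0xF0u;  /* bits 1-4 of byte 6 */
    mask[8] &= (uint8_t)~0xE0u;  /* bits 1-3 of byte 9 */
}
```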
The version number in the data record header is then incremented by the user task at block 616. A user task at 617 then determines which unit contains the appropriate parity record by searching the unit table 312 based on the unit number from the physical address of the new data. The unit table indicates which control block 314-318 to use to identify the unit containing the parity record for the particular data record. Again, the address of the new data record is used to identify the stripe of interest. Once the unit containing the parity record for the stripe is identified, task 617 places an update request, which includes the record address and change mask, on a queue 624 for the appropriate unit. The update request also indicates that the data record has not yet been written. Prior to placing an update request on queue 624, task 617 searches queue 624 to ensure there can be no confusion with version numbers, as discussed previously. If confusion is possible, the user task waits until confusion is not possible before proceeding.
The address of the parity record is known to be the same as the address of the new data, except that it is on a different unit. A write request for the new data is issued at 618 to a queue 620. When the data record is written to storage 622, an indication of that fact is sent at 625 to the update request on queue 624. Flow then returns to wait for the next data record change.
A parity update task starts at 627 by getting the next parity update request from queue 624. Once the parity record identified in the update request has been read at 626, the change mask is exclusive ORed at 628 into the parity record. The version numbers are not changed by the exclusive OR because the change mask contains zeros in the bit positions corresponding to the version numbers. The exclusive OR is performed using the same exclusive OR machine instruction as is used by block 614 of the user task. The parity record version number corresponding to the data record is then incremented at 630, and a write request for the parity record is issued at 632. The parity update task then gets the next parity update request at 627.
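The core of the parity update task might look like the following sketch, which folds the mask into the parity record and toggles the matching one-bit version counter; the byte and bit offsets rest on the same assumptions as the earlier sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a queued change mask into the parity record; the version bits in
 * the mask are already zero, so they pass through unchanged. */
static void apply_change_mask(uint8_t *parity_rec, const uint8_t *mask,
                              size_t len)
{
    for (size_t i = 0; i < len; i++)
        parity_rec[i] ^= mask[i];
}

/* Toggle the one-bit parity version counter for bit positions 1-7:
 * bits 1-4 live in the 6th header byte, bits 5-7 in the 9th
 * (same MSB-first assumption as before). */
static void toggle_parity_version(uint8_t *parity_rec, int bit)
{
    if (bit <= 4)
        parity_rec[5] ^= (uint8_t)(0x80u >> (bit - 1));
    else
        parity_rec[8] ^= (uint8_t)(0x80u >> (bit - 5));
}
```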
A queue 634 stores parity record write requests for the storage units. In this case, a storage unit 636 is indicated for the particular write request. Storage unit 636 is different from unit 622 because the data and parity records are not to be written to the same unit. When the write of the parity record completes, the update request on queue 624 is notified at 637. When both the data write and the parity write are completed, the entry is removed from queue 624, indicating that there is no longer the possibility of confusion with the version number.
RECORD RECOVERY
Recovery is performed either when a single record read error is encountered during normal operation of the system or when an entire unit fails.
When a unit fails, the data lost on the unit is reconstructed from the remaining units of the set, as indicated in the control blocks stored on more than one member of the set. The failed unit is replaced or repaired. Data for the new unit is then reconstructed from the remaining members of the set, record by record. A parity record is reconstructed simply by reading the data records in the set and regenerating the parity. The version numbers in the parity record are set equal to the corresponding version numbers in the data records.
When regenerating a data record, a check is first made on each of the data records to determine if their version numbers match the version numbers in the parity record for that slice. If any of the version numbers in the slice do not match, a lost data indication is written in the header of the lost record. If the version numbers match, the records in the slice are then exclusive ORed one by one into the new record. The appropriate version number is then copied from the parity record into the new record header.
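A sketch of the record regeneration just described, with illustrative names; the version check gates the XOR reconstruction, and header fix-up and I/O are elided.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Rebuild a lost record by XORing the surviving records of its slice
 * (the remaining data records plus the parity record), after checking
 * that each surviving data record's version matches the parity record. */
static bool rebuild_record(uint8_t *out, size_t len,
                           const uint8_t *const survivors[], int nsurvivors,
                           const uint8_t data_versions[],
                           const uint8_t parity_versions[], int ndata)
{
    for (int r = 0; r < ndata; r++)
        if (data_versions[r] != parity_versions[r])
            return false;          /* slice out of sync: mark data lost */

    for (size_t i = 0; i < len; i++)
        out[i] = 0;
    for (int r = 0; r < nsurvivors; r++)
        for (size_t i = 0; i < len; i++)
            out[i] ^= survivors[r][i];
    return true;
}
```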
If a read error is encountered on either a data record or a parity record, the contents of the failing record are reconstructed with the same mechanism as described above for recovery of an entire unit. In this case, it is necessary to hold all change activity on the slice containing the failed record while recovery is in progress.
Following reconstruction, normal operation of the system continues, with the only data lost being that for which an update of data or parity was made and a unit failed before the corresponding parity or data record could be written. Thus, the vast majority of data on the failed unit is recovered without the unit redundancy overhead of mirroring. By distributing the parity information over the members of the set, as opposed to tying up one device with the parity information, parallel operation of the storage devices is utilized to provide maximum access rates.
While the invention has been described with respect to one or more preferred embodiments, and with respect to one particular system, it will be recognized by those skilled in the art that the invention can take many forms and shapes. The record sizes are in no way limited to those discussed, nor are the storage units limited to disk drive devices. The fact that only identical devices are used in sets is merely a matter of design choice for simplification. Numerous combinations of storage units in sets, and distributions of the data protection or parity blocks, are within the scope of the invention as described and as claimed below.
Claims (22)
1. A data protection mechanism for a computer system having multiple independently accessible storage devices which store blocks of data, the data protection mechanism comprising:
generator means for generating parity blocks as a function of sets of data blocks, said data blocks in a set corresponding to one parity block being stored on different storage devices;
storage management means for managing the storage of data blocks and parity blocks onto the storage devices; and spreading means coupled to the storage management means for identifying a storage device to the storage management means on which each parity block is to be stored such that no one storage device contains the parity blocks for all of the groups of data blocks.
2. The data protection mechanism of claim 1 wherein the spreading means substantially uniformly distributes the parity blocks to storage devices.
3. The data protection mechanism of claim 2 wherein the spreading means distributes the parity blocks in a round robin manner.
4. The data protection mechanism of claim 1 wherein each data block in a set, and its corresponding parity block form a stripe of the same address ranges of each of the storage devices.
5. The data protection mechanism of claim 4 wherein at least one stripe of same addresses comprises data blocks without a parity block.
6. The data protection mechanism of claim 4 wherein there are at least as many stripes as there are blocks of data in a group plus one for the parity block.
7. The data protection mechanism of claim 4 wherein the size of the address range is at least as large as the largest system allocatable contiguous piece of storage.
8. The data protection mechanism of claim 1 wherein each storage device has the same address size.
9. The data protection mechanism of claim 1 wherein a set comprises at least three storage devices.
10. The data protection mechanism of claim 1 wherein the storage devices in a set are selected to minimize the number of storage devices affected by a failure of a single component of the computer system.
11. The data protection mechanism of claim 1 wherein a block comprises a plurality of predetermined sized records, the mechanism further comprising version generator means for providing independent version numbers to data records having the same address in a set and corresponding version numbers to the parity record covering such set of data records.
12. The data protection mechanism of claim 11 wherein the version numbers comprise counters, and the version generator means increments the counter in a revised data record and increments the corresponding counter in the parity record.
13. The data protection mechanism of claim 12 and further comprising data recovery means for recovering records lost on a failed storage device by combining the remaining records in the set of storage devices.
14. The data protection mechanism of claim 13 wherein the recovery of a lost record from a device is contingent on each remaining record in the one set of records having version numbers matching the version numbers in the parity record.
15. The data protection mechanism of claim 1 and further comprising change mask means for producing a change mask for a parity record when a data record in the set is to be changed, said change mask being generated as a function of the original data record and the changed data record.
16. A method of protecting data stored on a plurality of memory devices comprising:
dividing addressable memory on each of said memory devices into blocks of memory such that each memory has the same number of blocks, the blocks on each memory having the same address range comprising a stripe;
storing parity information for each stripe of memory blocks in a distributed manner across the memory devices to enhance the overall access rate of the memory devices; and changing the parity information for each stripe as a function of a change to a block in its corresponding stripe without reading all the blocks in the stripe.
17. The method of claim 16 wherein the step of changing the parity information comprises the steps of:
reading the data to be changed;
making a copy of the data to be changed;
making the changes to the data;
generating a change mask as a function of the copy of the data to be changed and the changed data;
writing the changed data to memory;
reading the corresponding parity data;
applying the change mask to the parity data to update it;
and writing the updated parity data to memory.
18. The method of claim 16 wherein the step of changing the parity information further comprises the step of generating a version number which is stored with both the block of data and the parity block.
19. A data protection mechanism for a computer system having multiple independently accessible storage devices which store records of data, the data protection mechanism comprising:
generator means for generating parity records as a function of sets of data records, said data records in a set corresponding to one parity record, each of said data and parity records being stored on different storage devices;
storage management means for managing the storage of data records and parity records onto the storage devices; and version generator means for providing independent version numbers to data records in a set and corresponding version numbers to the parity record covering such set of data records.
20. The data protection mechanism of claim 19 wherein the version numbers comprise a 1 bit counter in the data records and a number of one bit counters in the parity record equal to the number of data records in the set.
21. The data protection mechanism of claim 19 wherein reconstruction of a record in a set is conditional upon the version numbers in the remaining records being consistent with the corresponding version numbers in the parity record.
22. The data protection mechanism of claim 19 wherein the version numbers comprise a multibit counter in the data records and a number of multibit counters in the parity record equal to the number of data records in the set, wherein said counters count in a predetermined sequence of values.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US873,249 | 1986-06-12 | ||
US06873249 US4761785B1 (en) | 1986-06-12 | 1986-06-12 | Parity spreading to enhance storage access |
Publications (1)
Publication Number | Publication Date |
---|---|
CA1270333A true CA1270333A (en) | 1990-06-12 |
Family
ID=25361257
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CA000535598A Expired - Lifetime CA1270333A (en) | 1986-06-12 | 1987-04-27 | Parity spreading to enhance storge access |
Country Status (5)
Country | Link |
---|---|
US (1) | US4761785B1 (en) |
EP (1) | EP0249091B1 (en) |
JP (1) | JPS62293355A (en) |
CA (1) | CA1270333A (en) |
DE (1) | DE3750790T2 (en) |
Families Citing this family (314)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4949326A (en) * | 1986-12-10 | 1990-08-14 | Matsushita Electric Industrial Co., Ltd. | Optical information recording and reproducing system using optical disks having an error correction function |
USRE34100E (en) * | 1987-01-12 | 1992-10-13 | Seagate Technology, Inc. | Data error correction system |
US5257367A (en) * | 1987-06-02 | 1993-10-26 | Cab-Tek, Inc. | Data storage system with asynchronous host operating system communication link |
US4942579A (en) * | 1987-06-02 | 1990-07-17 | Cab-Tek, Inc. | High-speed, high-capacity, fault-tolerant error-correcting storage system |
US4870643A (en) * | 1987-11-06 | 1989-09-26 | Micropolis Corporation | Parallel drive array storage system |
US4993030A (en) * | 1988-04-22 | 1991-02-12 | Amdahl Corporation | File system for a plurality of storage classes |
US4914656A (en) * | 1988-06-28 | 1990-04-03 | Storage Technology Corporation | Disk drive memory |
US4989206A (en) * | 1988-06-28 | 1991-01-29 | Storage Technology Corporation | Disk drive memory |
US5077736A (en) * | 1988-06-28 | 1991-12-31 | Storage Technology Corporation | Disk drive memory |
US5283791A (en) * | 1988-08-02 | 1994-02-01 | Cray Research Systems, Inc. | Error recovery method and apparatus for high performance disk drives |
US5218689A (en) * | 1988-08-16 | 1993-06-08 | Cray Research, Inc. | Single disk emulation interface for an array of asynchronously operating disk drives |
JP2718708B2 (en) * | 1988-08-26 | 1998-02-25 | 株式会社日立製作所 | Control method for storage control system, storage control system, and storage control device |
US5148432A (en) * | 1988-11-14 | 1992-09-15 | Array Technology Corporation | Arrayed disk drive system and method |
US5007053A (en) * | 1988-11-30 | 1991-04-09 | International Business Machines Corporation | Method and apparatus for checksum address generation in a fail-safe modular memory |
US5008886A (en) * | 1989-01-27 | 1991-04-16 | Digital Equipment Corporation | Read-modify-write operation |
US5185746A (en) * | 1989-04-14 | 1993-02-09 | Mitsubishi Denki Kabushiki Kaisha | Optical recording system with error correction and data recording distributed across multiple disk drives |
US5146574A (en) * | 1989-06-27 | 1992-09-08 | Sf2 Corporation | Method and circuit for programmable selecting a variable sequence of element using write-back |
US5072378A (en) * | 1989-12-18 | 1991-12-10 | Storage Technology Corporation | Direct access storage device with independently stored parity |
US5402428A (en) * | 1989-12-25 | 1995-03-28 | Hitachi, Ltd. | Array disk subsystem |
JPH03216751A (en) * | 1990-01-05 | 1991-09-24 | Internatl Business Mach Corp <Ibm> | Method of transferring file |
US5315708A (en) * | 1990-02-28 | 1994-05-24 | Micro Technology, Inc. | Method and apparatus for transferring data through a staging memory |
US5140592A (en) * | 1990-03-02 | 1992-08-18 | Sf2 Corporation | Disk array system |
US5195100A (en) * | 1990-03-02 | 1993-03-16 | Micro Technology, Inc. | Non-volatile memory storage of write operation identifier in data sotrage device |
US5134619A (en) * | 1990-04-06 | 1992-07-28 | Sf2 Corporation | Failure-tolerant mass storage system |
US5212785A (en) * | 1990-04-06 | 1993-05-18 | Micro Technology, Inc. | Apparatus and method for controlling data flow between a computer and memory devices |
US5233618A (en) * | 1990-03-02 | 1993-08-03 | Micro Technology, Inc. | Data correcting applicable to redundant arrays of independent disks |
US5388243A (en) * | 1990-03-09 | 1995-02-07 | Mti Technology Corporation | Multi-sort mass storage device announcing its active paths without deactivating its ports in a network architecture |
US5129082A (en) * | 1990-03-27 | 1992-07-07 | Sun Microsystems, Inc. | Method and apparatus for searching database component files to retrieve information from modified files |
US5325497A (en) * | 1990-03-29 | 1994-06-28 | Micro Technology, Inc. | Method and apparatus for assigning signatures to identify members of a set of mass of storage devices |
US5202856A (en) * | 1990-04-05 | 1993-04-13 | Micro Technology, Inc. | Method and apparatus for simultaneous, interleaved access of multiple memories by multiple ports |
US5233692A (en) * | 1990-04-06 | 1993-08-03 | Micro Technology, Inc. | Enhanced interface permitting multiple-byte parallel transfers of control information and data on a small computer system interface (SCSI) communication bus and a mass storage system incorporating the enhanced interface |
US5956524A (en) * | 1990-04-06 | 1999-09-21 | Micro Technology Inc. | System and method for dynamic alignment of associated portions of a code word from a plurality of asynchronous sources |
US5414818A (en) * | 1990-04-06 | 1995-05-09 | Mti Technology Corporation | Method and apparatus for controlling reselection of a bus by overriding a prioritization protocol |
US5214778A (en) * | 1990-04-06 | 1993-05-25 | Micro Technology, Inc. | Resource management in a multiple resource system |
US5130992A (en) * | 1990-04-16 | 1992-07-14 | International Business Machines Corporaiton | File-based redundant parity protection in a parallel computing system |
US5263145A (en) * | 1990-05-24 | 1993-11-16 | International Business Machines Corporation | Method and means for accessing DASD arrays with tuned data transfer rate and concurrency |
JPH0731582B2 (en) * | 1990-06-21 | 1995-04-10 | インターナショナル・ビジネス・マシーンズ・コーポレイション | Method and apparatus for recovering parity protected data |
US5220569A (en) * | 1990-07-09 | 1993-06-15 | Seagate Technology, Inc. | Disk array with error type indication and selection of error correction method |
US5265098A (en) * | 1990-08-03 | 1993-11-23 | International Business Machines Corporation | Method and means for managing DASD array accesses when operating in degraded mode |
US5375128A (en) * | 1990-10-18 | 1994-12-20 | Ibm Corporation (International Business Machines Corporation) | Fast updating of DASD arrays using selective shadow writing of parity and data blocks, tracks, or cylinders |
EP0481735A3 (en) * | 1990-10-19 | 1993-01-13 | Array Technology Corporation | Address protection circuit |
US5208813A (en) * | 1990-10-23 | 1993-05-04 | Array Technology Corporation | On-line reconstruction of a failed redundant array system |
AU8683991A (en) * | 1990-11-09 | 1992-05-14 | Array Technology Corporation | Logical partitioning of a redundant array storage system |
US5235601A (en) * | 1990-12-21 | 1993-08-10 | Array Technology Corporation | On-line restoration of redundancy information in a redundant array system |
US5274799A (en) * | 1991-01-04 | 1993-12-28 | Array Technology Corporation | Storage device array architecture with copyback cache |
US6874101B2 (en) * | 1991-01-31 | 2005-03-29 | Hitachi, Ltd. | Storage unit subsystem |
JP3409859B2 (en) * | 1991-01-31 | 2003-05-26 | 株式会社日立製作所 | Control method of control device |
US5239640A (en) * | 1991-02-01 | 1993-08-24 | International Business Machines Corporation | Data storage system and method including data and checksum write staging storage |
US5303244A (en) * | 1991-03-01 | 1994-04-12 | Teradata | Fault tolerant disk drive matrix |
US5257362A (en) * | 1991-03-08 | 1993-10-26 | International Business Machines Corporation | Method and means for ensuring single pass small read/write access to variable length records stored on selected DASDs in a DASD array |
US5345565A (en) * | 1991-03-13 | 1994-09-06 | Ncr Corporation | Multiple configuration data path architecture for a disk array controller |
US5506979A (en) * | 1991-04-02 | 1996-04-09 | International Business Machines Corporation | Method and means for execution of commands accessing variable length records stored on fixed block formatted DASDS of an N+2 DASD synchronous array |
JP2743606B2 (en) * | 1991-04-11 | 1998-04-22 | 三菱電機株式会社 | Array type recording device |
JP3187525B2 (en) * | 1991-05-17 | 2001-07-11 | ヒュンダイ エレクトロニクス アメリカ | Bus connection device |
US5278838A (en) * | 1991-06-18 | 1994-01-11 | Ibm Corp. | Recovery from errors in a redundant array of disk drives |
US5333143A (en) * | 1991-08-29 | 1994-07-26 | International Business Machines Corporation | Method and means for b-adjacent coding and rebuilding data from up to two unavailable DASDS in a DASD array |
US5636358A (en) * | 1991-09-27 | 1997-06-03 | Emc Corporation | Method and apparatus for transferring data in a storage device including a dual-port buffer |
US5499337A (en) | 1991-09-27 | 1996-03-12 | Emc Corporation | Storage device array architecture with solid-state redundancy unit |
US5237658A (en) * | 1991-10-01 | 1993-08-17 | Tandem Computers Incorporated | Linear and orthogonal expansion of array storage in multiprocessor computing systems |
US5379417A (en) * | 1991-11-25 | 1995-01-03 | Tandem Computers Incorporated | System and method for ensuring write data integrity in a redundant array data storage system |
CA2126754A1 (en) * | 1991-12-27 | 1993-07-08 | E. David Neufeld | Method for performing disk array operations using a nonuniform stripe size mapping scheme |
US5333305A (en) * | 1991-12-27 | 1994-07-26 | Compaq Computer Corporation | Method for improving partial stripe write performance in disk array subsystems |
EP0551009B1 (en) * | 1992-01-08 | 2001-06-13 | Emc Corporation | Method for synchronizing reserved areas in a redundant storage array |
US5341381A (en) * | 1992-01-21 | 1994-08-23 | Tandem Computers, Incorporated | Redundant array parity caching system |
US5371743A (en) * | 1992-03-06 | 1994-12-06 | Data General Corporation | On-line module replacement in a multiple module data processing system |
DE69320388T2 (en) * | 1992-03-06 | 1999-05-12 | Data General Corp | Data handling in a system with a processor to control access to a plurality of data storage disks |
US5305326A (en) * | 1992-03-06 | 1994-04-19 | Data General Corporation | High availability disk arrays |
AU653670B2 (en) * | 1992-03-10 | 1994-10-06 | Data General Corporation | Improvements for high availability disk arrays |
US5469566A (en) * | 1992-03-12 | 1995-11-21 | Emc Corporation | Flexible parity generation circuit for intermittently generating a parity for a plurality of data channels in a redundant array of storage units |
WO1993018456A1 (en) * | 1992-03-13 | 1993-09-16 | Emc Corporation | Multiple controller sharing in a redundant storage array |
US5740465A (en) * | 1992-04-08 | 1998-04-14 | Hitachi, Ltd. | Array disk controller for grouping host commands into a single virtual host command |
US5418921A (en) * | 1992-05-05 | 1995-05-23 | International Business Machines Corporation | Method and means for fast writing data to LRU cached based DASD arrays under diverse fault tolerant modes |
US5708668A (en) * | 1992-05-06 | 1998-01-13 | International Business Machines Corporation | Method and apparatus for operating an array of storage devices |
JP2888401B2 (en) * | 1992-08-03 | 1999-05-10 | インターナショナル・ビジネス・マシーンズ・コーポレイション | Synchronization method for redundant disk drive arrays |
US6640235B1 (en) | 1992-08-20 | 2003-10-28 | Intel Corporation | Expandable mass disk drive storage system |
US5913926A (en) * | 1992-08-20 | 1999-06-22 | Farrington Investments Ltd. | Expandable modular data storage system having parity storage capability |
JP3183719B2 (en) * | 1992-08-26 | 2001-07-09 | 三菱電機株式会社 | Array type recording device |
JP3181398B2 (en) * | 1992-10-06 | 2001-07-03 | 三菱電機株式会社 | Array type recording device |
EP0600137A1 (en) * | 1992-11-30 | 1994-06-08 | International Business Machines Corporation | Method and apparatus for correcting errors in a memory |
US5579474A (en) * | 1992-12-28 | 1996-11-26 | Hitachi, Ltd. | Disk array system and its control method |
JP3176157B2 (en) * | 1992-12-28 | 2001-06-11 | 株式会社日立製作所 | Disk array device and data updating method thereof |
JP2743756B2 (en) * | 1993-02-03 | 1998-04-22 | 日本電気株式会社 | Semiconductor disk device |
JP3258117B2 (en) * | 1993-03-08 | 2002-02-18 | 株式会社日立製作所 | Storage subsystem |
US5649162A (en) * | 1993-05-24 | 1997-07-15 | Micron Electronics, Inc. | Local bus interface |
US5867640A (en) * | 1993-06-01 | 1999-02-02 | Mti Technology Corp. | Apparatus and method for improving write-throughput in a redundant array of mass storage devices |
US6138126A (en) * | 1995-05-31 | 2000-10-24 | Network Appliance, Inc. | Method for allocating files in a file system integrated with a raid disk sub-system |
ATE409907T1 (en) * | 1993-06-03 | 2008-10-15 | Network Appliance Inc | METHOD AND DEVICE FOR DESCRIBING ANY AREAS OF A FILE SYSTEM |
US7174352B2 (en) | 1993-06-03 | 2007-02-06 | Network Appliance, Inc. | File system image transfer |
US6604118B2 (en) | 1998-07-31 | 2003-08-05 | Network Appliance, Inc. | File system image transfer |
US5963962A (en) * | 1995-05-31 | 1999-10-05 | Network Appliance, Inc. | Write anywhere file-system layout |
DE69431186T2 (en) * | 1993-06-03 | 2003-05-08 | Network Appliance Inc | Method and file system for assigning file blocks to storage space in a RAID disk system |
JPH08511368A (en) | 1993-06-04 | 1996-11-26 | ネットワーク・アプリアンス・コーポレーション | Method for forming parity in RAID subsystem using non-volatile memory |
US5555389A (en) * | 1993-07-07 | 1996-09-10 | Hitachi, Ltd. | Storage controller for performing dump processing |
US5987622A (en) * | 1993-12-10 | 1999-11-16 | Tm Patents, Lp | Parallel computer system including parallel storage subsystem including facility for correction of data in the event of failure of a storage device in parallel storage subsystem |
US5396620A (en) * | 1993-12-21 | 1995-03-07 | Storage Technology Corporation | Method for writing specific values last into data storage groups containing redundancy |
US20030088611A1 (en) * | 1994-01-19 | 2003-05-08 | Mti Technology Corporation | Systems and methods for dynamic alignment of associated portions of a code word from a plurality of asynchronous sources |
US5911150A (en) * | 1994-01-25 | 1999-06-08 | Data General Corporation | Data storage tape back-up for data processing systems using a single driver interface unit |
US5446855A (en) * | 1994-02-07 | 1995-08-29 | Buslogic, Inc. | System and method for disk array data transfer |
US5537567A (en) * | 1994-03-14 | 1996-07-16 | International Business Machines Corporation | Parity block configuration in an array of storage devices |
JP2981711B2 (en) * | 1994-06-16 | 1999-11-22 | 日本アイ・ビー・エム株式会社 | Disk storage device |
US5467361A (en) * | 1994-06-20 | 1995-11-14 | International Business Machines Corporation | Method and system for separate data and media maintenance within direct access storage devices |
US5657439A (en) * | 1994-08-23 | 1997-08-12 | International Business Machines Corporation | Distributed subsystem sparing |
US5412668A (en) * | 1994-09-22 | 1995-05-02 | International Business Machines Corporation | Parity striping feature for optical disks |
US5623595A (en) * | 1994-09-26 | 1997-04-22 | Oracle Corporation | Method and apparatus for transparent, real time reconstruction of corrupted data in a redundant array data storage system |
GB2293912A (en) * | 1994-10-05 | 1996-04-10 | Ibm | Disk storage device for disk array |
US5497457A (en) * | 1994-10-17 | 1996-03-05 | International Business Machines Corporation | Redundant arrays of independent libraries of dismountable media with parity logging |
US5488701A (en) * | 1994-11-17 | 1996-01-30 | International Business Machines Corporation | In log sparing for log structured arrays |
US5574882A (en) * | 1995-03-03 | 1996-11-12 | International Business Machines Corporation | System and method for identifying inconsistent parity in an array of storage |
US5848230A (en) | 1995-05-25 | 1998-12-08 | Tandem Computers Incorporated | Continuously available computer memory systems |
KR100300836B1 (en) * | 1995-06-08 | 2001-09-03 | 포만 제프리 엘 | Data reconstruction method and data storage system |
US5875456A (en) * | 1995-08-17 | 1999-02-23 | Nstor Corporation | Storage device array and methods for striping and unstriping data and for adding and removing disks online to/from a raid storage array |
US5657468A (en) * | 1995-08-17 | 1997-08-12 | Ambex Technologies, Inc. | Method and apparatus for improving performance in a reduntant array of independent disks |
WO1997011426A1 (en) | 1995-09-18 | 1997-03-27 | Cyberstorage Systems, Inc. | Universal storage management system |
US5799200A (en) * | 1995-09-28 | 1998-08-25 | Emc Corporation | Power failure responsive apparatus and method having a shadow dram, a flash ROM, an auxiliary battery, and a controller |
US5941994A (en) * | 1995-12-22 | 1999-08-24 | Lsi Logic Corporation | Technique for sharing hot spare drives among multiple subsystems |
US5838892A (en) * | 1995-12-29 | 1998-11-17 | Emc Corporation | Method and apparatus for calculating an error detecting code block in a disk drive controller |
US6055577A (en) * | 1996-05-06 | 2000-04-25 | Oracle Corporation | System for granting bandwidth for real time processes and assigning bandwidth for non-real time processes while being forced to periodically re-arbitrate for new assigned bandwidth |
US5790774A (en) * | 1996-05-21 | 1998-08-04 | Storage Computer Corporation | Data storage system with dedicated allocation of parity storage and parity reads and writes only on operations requiring parity information |
US5856989A (en) * | 1996-08-13 | 1999-01-05 | Hewlett-Packard Company | Method and apparatus for parity block generation |
US6041423A (en) * | 1996-11-08 | 2000-03-21 | Oracle Corporation | Method and apparatus for using undo/redo logging to perform asynchronous updates of parity and data pages in a redundant array data storage environment |
US6161165A (en) * | 1996-11-14 | 2000-12-12 | Emc Corporation | High performance data path with XOR on the fly |
KR100223186B1 (en) * | 1997-01-29 | 1999-10-15 | 윤종용 | Data recording method in dvd-ram |
JPH10254642A (en) * | 1997-03-14 | 1998-09-25 | Hitachi Ltd | Storage device system |
JP4499193B2 (en) * | 1997-04-07 | 2010-07-07 | ソニー株式会社 | Recording / reproducing apparatus and recording / reproducing method |
US5974503A (en) * | 1997-04-25 | 1999-10-26 | Emc Corporation | Storage and access of continuous media files indexed as lists of raid stripe sets associated with file names |
US5968182A (en) * | 1997-05-12 | 1999-10-19 | International Business Machines Corporation | Method and means for utilizing device long busy response for resolving detected anomalies at the lowest level in a hierarchical, demand/response storage management subsystem |
US5991894A (en) * | 1997-06-06 | 1999-11-23 | The Chinese University Of Hong Kong | Progressive redundancy transmission |
US6016552A (en) * | 1997-06-06 | 2000-01-18 | The Chinese University Of Hong Kong | Object striping focusing on data object |
US6112277A (en) * | 1997-09-25 | 2000-08-29 | International Business Machines Corporation | Method and means for reducing device contention by random accessing and partial track staging of records according to a first DASD format but device mapped according to a second DASD format |
CN1281560A (en) | 1997-10-08 | 2001-01-24 | 西加特技术有限责任公司 | Hybrid data storage and reconstruction system and method for data storage device |
US6112255A (en) * | 1997-11-13 | 2000-08-29 | International Business Machines Corporation | Method and means for managing disk drive level logic and buffer modified access paths for enhanced raid array data rebuild and write update operations |
US6101624A (en) * | 1998-01-21 | 2000-08-08 | International Business Machines Corporation | Method and apparatus for detecting and correcting anomalies in field-programmable gate arrays using CRCs for anomaly detection and parity for anomaly correction |
US6457130B2 (en) | 1998-03-03 | 2002-09-24 | Network Appliance, Inc. | File access control in a multi-protocol file server |
US6317844B1 (en) | 1998-03-10 | 2001-11-13 | Network Appliance, Inc. | File server storage arrangement |
DE19811035A1 (en) * | 1998-03-13 | 1999-09-16 | Grau Software Gmbh | Data storage method for data sequences |
US6219751B1 (en) | 1998-04-28 | 2001-04-17 | International Business Machines Corporation | Device level coordination of access operations among multiple raid control units |
US6704837B2 (en) | 1998-06-29 | 2004-03-09 | International Business Machines Corporation | Method and apparatus for increasing RAID write performance by maintaining a full track write counter |
US6427212B1 (en) | 1998-11-13 | 2002-07-30 | Tricord Systems, Inc. | Data fault tolerance software apparatus and method |
US6343984B1 (en) | 1998-11-30 | 2002-02-05 | Network Appliance, Inc. | Laminar flow duct cooling system |
US6725392B1 (en) | 1999-03-03 | 2004-04-20 | Adaptec, Inc. | Controller fault recovery system for a distributed file system |
US6449731B1 (en) | 1999-03-03 | 2002-09-10 | Tricord Systems, Inc. | Self-healing computer system storage |
US6530036B1 (en) | 1999-08-17 | 2003-03-04 | Tricord Systems, Inc. | Self-healing computer system storage |
US6970450B1 (en) * | 1999-10-29 | 2005-11-29 | Array Telecom Corporation | System, method and computer program product for point-to-point bandwidth conservation in an IP network |
JP2001166887A (en) * | 1999-12-08 | 2001-06-22 | Sony Corp | Data recording and reproducing device and data recording and reproducing method |
US7509420B2 (en) | 2000-02-18 | 2009-03-24 | Emc Corporation | System and method for intelligent, globally distributed network storage |
US7194504B2 (en) * | 2000-02-18 | 2007-03-20 | Avamar Technologies, Inc. | System and method for representing and maintaining redundant data sets utilizing DNA transmission and transcription techniques |
US7062648B2 (en) * | 2000-02-18 | 2006-06-13 | Avamar Technologies, Inc. | System and method for redundant array network storage |
US6826711B2 (en) | 2000-02-18 | 2004-11-30 | Avamar Technologies, Inc. | System and method for data protection with multidimensional parity |
US6704730B2 (en) | 2000-02-18 | 2004-03-09 | Avamar Technologies, Inc. | Hash file system and method for use in a commonality factoring system |
US6820088B1 (en) * | 2000-04-10 | 2004-11-16 | Research In Motion Limited | System and method for synchronizing data records between multiple databases |
US6728922B1 (en) | 2000-08-18 | 2004-04-27 | Network Appliance, Inc. | Dynamic data space |
US7072916B1 (en) | 2000-08-18 | 2006-07-04 | Network Appliance, Inc. | Instant snapshot |
US6636879B1 (en) * | 2000-08-18 | 2003-10-21 | Network Appliance, Inc. | Space allocation in a write anywhere file system |
US6611852B1 (en) | 2000-09-29 | 2003-08-26 | Emc Corporation | System and method for cleaning a log structure |
US6507890B1 (en) | 2000-09-29 | 2003-01-14 | Emc Corporation | System and method for expanding a log structure in a disk array |
US6865650B1 (en) | 2000-09-29 | 2005-03-08 | Emc Corporation | System and method for hierarchical data storage |
US6654912B1 (en) * | 2000-10-04 | 2003-11-25 | Network Appliance, Inc. | Recovery of file system data in file servers mirrored file system volumes |
US6952797B1 (en) | 2000-10-25 | 2005-10-04 | Andy Kahn | Block-appended checksums |
US6810398B2 (en) | 2000-11-06 | 2004-10-26 | Avamar Technologies, Inc. | System and method for unorchestrated determination of data sequences using sticky byte factoring to determine breakpoints in digital sequences |
US6650601B1 (en) | 2001-04-26 | 2003-11-18 | International Business Machines Corporation | Hard disk drive picking device and method |
US6600703B1 (en) | 2001-04-26 | 2003-07-29 | International Business Machines Corporation | Magazine for a plurality of removable hard disk drives |
US6512962B2 (en) | 2001-04-26 | 2003-01-28 | International Business Machines Corporation | Cabling picker in a library of stationary memory devices |
US6941260B2 (en) * | 2001-04-26 | 2005-09-06 | International Business Machines Corporation | Method and apparatus for emulating a fiber channel port |
US6754768B2 (en) | 2001-04-26 | 2004-06-22 | International Business Machines Corporation | Library of hard disk drives with transparent emulating interface |
US6871263B2 (en) * | 2001-08-28 | 2005-03-22 | Sedna Patent Services, Llc | Method and apparatus for striping data onto a plurality of disk drives |
US6851082B1 (en) | 2001-11-13 | 2005-02-01 | Network Appliance, Inc. | Concentrated parity technique for handling double failures and enabling storage of more than one parity block per stripe on a storage device of a storage array |
US7346831B1 (en) | 2001-11-13 | 2008-03-18 | Network Appliance, Inc. | Parity assignment technique for parity declustering in a parity array of a storage system |
US6978283B1 (en) * | 2001-12-21 | 2005-12-20 | Network Appliance, Inc. | File system defragmentation technique via write allocation |
US7073115B2 (en) * | 2001-12-28 | 2006-07-04 | Network Appliance, Inc. | Correcting multiple block data loss in a storage array using a combination of a single diagonal parity group and multiple row parity groups |
US7613984B2 (en) * | 2001-12-28 | 2009-11-03 | Netapp, Inc. | System and method for symmetric triple parity for failing storage devices |
US7640484B2 (en) | 2001-12-28 | 2009-12-29 | Netapp, Inc. | Triple parity technique for enabling efficient recovery from triple failures in a storage array |
US6993701B2 (en) | 2001-12-28 | 2006-01-31 | Network Appliance, Inc. | Row-diagonal parity technique for enabling efficient recovery from double failures in a storage array |
US8402346B2 (en) * | 2001-12-28 | 2013-03-19 | Netapp, Inc. | N-way parity technique for enabling recovery from up to N storage device failures |
US7007220B2 (en) * | 2002-03-01 | 2006-02-28 | Broadlogic Network Technologies, Inc. | Error correction coding across multiple channels in content distribution systems |
US7080278B1 (en) | 2002-03-08 | 2006-07-18 | Network Appliance, Inc. | Technique for correcting multiple storage device failures in a storage array |
US6993539B2 (en) | 2002-03-19 | 2006-01-31 | Network Appliance, Inc. | System and method for determining changes in two snapshots and for transmitting changes to destination snapshot |
US7200715B2 (en) | 2002-03-21 | 2007-04-03 | Network Appliance, Inc. | Method for writing contiguous arrays of stripes in a RAID storage system using mapped block writes |
US7254813B2 (en) * | 2002-03-21 | 2007-08-07 | Network Appliance, Inc. | Method and apparatus for resource allocation in a raid system |
US7539991B2 (en) | 2002-03-21 | 2009-05-26 | Netapp, Inc. | Method and apparatus for decomposing I/O tasks in a raid system |
US7437727B2 (en) * | 2002-03-21 | 2008-10-14 | Network Appliance, Inc. | Method and apparatus for runtime resource deadlock avoidance in a raid system |
US6976146B1 (en) | 2002-05-21 | 2005-12-13 | Network Appliance, Inc. | System and method for emulating block appended checksums on storage devices by sector stealing |
US7024586B2 (en) * | 2002-06-24 | 2006-04-04 | Network Appliance, Inc. | Using file system information in raid data reconstruction and migration |
US7873700B2 (en) * | 2002-08-09 | 2011-01-18 | Netapp, Inc. | Multi-protocol storage appliance that provides integrated support for file and block access protocols |
US7340486B1 (en) * | 2002-10-10 | 2008-03-04 | Network Appliance, Inc. | System and method for file system snapshot of a virtual logical disk |
US7085953B1 (en) | 2002-11-01 | 2006-08-01 | International Business Machines Corporation | Method and means for tolerating multiple dependent or arbitrary double disk failures in a disk array |
US7809693B2 (en) * | 2003-02-10 | 2010-10-05 | Netapp, Inc. | System and method for restoring data on demand for instant volume restoration |
US7185144B2 (en) * | 2003-11-24 | 2007-02-27 | Network Appliance, Inc. | Semi-static distribution technique |
US7111147B1 (en) | 2003-03-21 | 2006-09-19 | Network Appliance, Inc. | Location-independent RAID group virtual block management |
US7328364B1 (en) | 2003-03-21 | 2008-02-05 | Network Appliance, Inc. | Technique for coherent suspension of I/O operations in a RAID subsystem |
US7143235B1 (en) | 2003-03-21 | 2006-11-28 | Network Appliance, Inc. | Proposed configuration management behaviors in a raid subsystem |
US7664913B2 (en) * | 2003-03-21 | 2010-02-16 | Netapp, Inc. | Query-based spares management technique |
US7424637B1 (en) | 2003-03-21 | 2008-09-09 | Networks Appliance, Inc. | Technique for managing addition of disks to a volume of a storage system |
US7171606B2 (en) * | 2003-03-25 | 2007-01-30 | Wegener Communications, Inc. | Software download control system, apparatus and method |
US7275179B1 (en) | 2003-04-24 | 2007-09-25 | Network Appliance, Inc. | System and method for reducing unrecoverable media errors in a disk subsystem |
US7437523B1 (en) | 2003-04-25 | 2008-10-14 | Network Appliance, Inc. | System and method for on-the-fly file folding in a replicated storage system |
US7174476B2 (en) * | 2003-04-28 | 2007-02-06 | Lsi Logic Corporation | Methods and structure for improved fault tolerance during initialization of a RAID logical unit |
US20040250028A1 (en) * | 2003-06-09 | 2004-12-09 | Daniels Rodger D. | Method and apparatus for data version checking |
US7206411B2 (en) | 2003-06-25 | 2007-04-17 | Wegener Communications, Inc. | Rapid decryption of data by key synchronization and indexing |
US7146461B1 (en) | 2003-07-01 | 2006-12-05 | Veritas Operating Corporation | Automated recovery from data corruption of data volumes in parity RAID storage systems |
US7047379B2 (en) * | 2003-07-11 | 2006-05-16 | International Business Machines Corporation | Autonomic link optimization through elimination of unnecessary transfers |
US7328305B2 (en) | 2003-11-03 | 2008-02-05 | Network Appliance, Inc. | Dynamic parity distribution technique |
US7783611B1 (en) | 2003-11-10 | 2010-08-24 | Netapp, Inc. | System and method for managing file metadata during consistency points |
US7401093B1 (en) | 2003-11-10 | 2008-07-15 | Network Appliance, Inc. | System and method for managing file data during consistency points |
US7721062B1 (en) | 2003-11-10 | 2010-05-18 | Netapp, Inc. | Method for detecting leaked buffer writes across file system consistency points |
US7428691B2 (en) * | 2003-11-12 | 2008-09-23 | Norman Ken Ouchi | Data recovery from multiple failed data blocks and storage units |
US7647451B1 (en) | 2003-11-24 | 2010-01-12 | Netapp, Inc. | Data placement technique for striping data containers across volumes of a storage system cluster |
US7263629B2 (en) * | 2003-11-24 | 2007-08-28 | Network Appliance, Inc. | Uniform and symmetric double failure correcting technique for protecting against two disk failures in a disk array |
US7366837B2 (en) * | 2003-11-24 | 2008-04-29 | Network Appliance, Inc. | Data placement technique for striping data containers across volumes of a storage system cluster |
JP5166735B2 (en) * | 2003-12-19 | 2013-03-21 | ネットアップ,インコーポレイテッド | System and method capable of synchronous data replication in a very short update interval |
US7478101B1 (en) | 2003-12-23 | 2009-01-13 | Networks Appliance, Inc. | System-independent data format in a mirrored storage system environment and method for using the same |
CA2552019A1 (en) * | 2003-12-29 | 2005-07-21 | Sherwood Information Partners, Inc. | System and method for reduced vibration interaction in a multiple-hard-disk-drive enclosure |
US8041888B2 (en) * | 2004-02-05 | 2011-10-18 | Netapp, Inc. | System and method for LUN cloning |
US7409494B2 (en) * | 2004-04-30 | 2008-08-05 | Network Appliance, Inc. | Extension of write anywhere file system layout |
US7409511B2 (en) * | 2004-04-30 | 2008-08-05 | Network Appliance, Inc. | Cloning technique for efficiently creating a copy of a volume in a storage system |
US7334094B2 (en) * | 2004-04-30 | 2008-02-19 | Network Appliance, Inc. | Online clone volume splitting technique |
US7334095B1 (en) | 2004-04-30 | 2008-02-19 | Network Appliance, Inc. | Writable clone of read-only volume |
US7430571B2 (en) * | 2004-04-30 | 2008-09-30 | Network Appliance, Inc. | Extension of write anywhere file layout write allocation |
US7519628B1 (en) | 2004-06-01 | 2009-04-14 | Network Appliance, Inc. | Technique for accelerating log replay with partial cache flush |
US7509329B1 (en) | 2004-06-01 | 2009-03-24 | Network Appliance, Inc. | Technique for accelerating file deletion by preloading indirect blocks |
US8726129B1 (en) * | 2004-07-23 | 2014-05-13 | Hewlett-Packard Development Company, L.P. | Methods of writing and recovering erasure coded data |
US20060075281A1 (en) * | 2004-09-27 | 2006-04-06 | Kimmel Jeffrey S | Use of application-level context information to detect corrupted data in a storage system |
US7243207B1 (en) | 2004-09-27 | 2007-07-10 | Network Appliance, Inc. | Technique for translating a pure virtual file system data stream into a hybrid virtual volume |
US7194595B1 (en) | 2004-09-27 | 2007-03-20 | Network Appliance, Inc. | Technique for translating a hybrid virtual volume file system into a pure virtual file system data stream |
US7260678B1 (en) | 2004-10-13 | 2007-08-21 | Network Appliance, Inc. | System and method for determining disk ownership model |
US7603532B2 (en) | 2004-10-15 | 2009-10-13 | Netapp, Inc. | System and method for reclaiming unused space from a thinly provisioned data container |
US7730277B1 (en) | 2004-10-25 | 2010-06-01 | Netapp, Inc. | System and method for using pvbn placeholders in a flexible volume of a storage system |
US7636744B1 (en) | 2004-11-17 | 2009-12-22 | Netapp, Inc. | System and method for flexible space reservations in a file system supporting persistent consistency point images |
US7523286B2 (en) * | 2004-11-19 | 2009-04-21 | Network Appliance, Inc. | System and method for real-time balancing of user workload across multiple storage systems with shared back end storage |
US7707165B1 (en) | 2004-12-09 | 2010-04-27 | Netapp, Inc. | System and method for managing data versions in a file system |
US7506111B1 (en) | 2004-12-20 | 2009-03-17 | Network Appliance, Inc. | System and method for determining a number of overwritten blocks between data containers
US8019842B1 (en) | 2005-01-27 | 2011-09-13 | Netapp, Inc. | System and method for distributing enclosure services data to coordinate shared storage |
US8180855B2 (en) * | 2005-01-27 | 2012-05-15 | Netapp, Inc. | Coordinated shared storage architecture |
US7424497B1 (en) | 2005-01-27 | 2008-09-09 | Network Appliance, Inc. | Technique for accelerating the creation of a point in time representation of a virtual file system
US7398460B1 (en) | 2005-01-31 | 2008-07-08 | Network Appliance, Inc. | Technique for efficiently organizing and distributing parity blocks among storage devices of a storage array |
US7574464B2 (en) * | 2005-02-14 | 2009-08-11 | Netapp, Inc. | System and method for enabling a storage system to support multiple volume formats simultaneously |
US7757056B1 (en) | 2005-03-16 | 2010-07-13 | Netapp, Inc. | System and method for efficiently calculating storage required to split a clone volume |
US8055702B2 (en) * | 2005-04-25 | 2011-11-08 | Netapp, Inc. | System and method for caching network file systems |
US7689609B2 (en) * | 2005-04-25 | 2010-03-30 | Netapp, Inc. | Architecture for supporting sparse volumes |
US7617370B2 (en) | 2005-04-29 | 2009-11-10 | Netapp, Inc. | Data allocation within a storage system architecture |
US7468117B2 (en) * | 2005-04-29 | 2008-12-23 | Kimberly-Clark Worldwide, Inc. | Method of transferring a wet tissue web to a three-dimensional fabric |
US7370261B2 (en) * | 2005-05-09 | 2008-05-06 | International Business Machines Corporation | Convolution-encoded raid with trellis-decode-rebuild |
US7401253B2 (en) * | 2005-05-09 | 2008-07-15 | International Business Machines Corporation | Convolution-encoded data storage on a redundant array of independent devices |
US7634760B1 (en) | 2005-05-23 | 2009-12-15 | Netapp, Inc. | System and method for remote execution of a debugging utility using a remote management module |
US7739318B2 (en) | 2005-06-20 | 2010-06-15 | Netapp, Inc. | System and method for maintaining mappings from data containers to their parent directories |
US7516285B1 (en) | 2005-07-22 | 2009-04-07 | Network Appliance, Inc. | Server side API for fencing cluster hosts via export access rights |
US7653682B2 (en) * | 2005-07-22 | 2010-01-26 | Netapp, Inc. | Client failure fencing mechanism for fencing network file system data in a host-cluster environment |
US7650366B1 (en) | 2005-09-09 | 2010-01-19 | Netapp, Inc. | System and method for generating a crash consistent persistent consistency point image set |
US20070088917A1 (en) * | 2005-10-14 | 2007-04-19 | Ranaweera Samantha L | System and method for creating and maintaining a logical serial attached SCSI communication channel among a plurality of storage systems |
US7467276B1 (en) | 2005-10-25 | 2008-12-16 | Network Appliance, Inc. | System and method for automatic root volume creation |
US7376796B2 (en) | 2005-11-01 | 2008-05-20 | Network Appliance, Inc. | Lightweight coherency control protocol for clustered storage system |
US7653829B2 (en) * | 2005-12-08 | 2010-01-26 | Electronics And Telecommunications Research Institute | Method of data placement and control in block-divided distributed parity disk array |
US7693864B1 (en) | 2006-01-03 | 2010-04-06 | Netapp, Inc. | System and method for quickly determining changed metadata using persistent consistency point image differencing |
US7734603B1 (en) | 2006-01-26 | 2010-06-08 | Netapp, Inc. | Content addressable storage array element |
US8560503B1 (en) | 2006-01-26 | 2013-10-15 | Netapp, Inc. | Content addressable storage system |
US8285817B1 (en) | 2006-03-20 | 2012-10-09 | Netapp, Inc. | Migration engine for use in a logical namespace of a storage system environment |
US7590660B1 (en) | 2006-03-21 | 2009-09-15 | Network Appliance, Inc. | Method and system for efficient database cloning |
US8260831B2 (en) * | 2006-03-31 | 2012-09-04 | Netapp, Inc. | System and method for implementing a flexible storage manager with threshold control |
US20070233868A1 (en) * | 2006-03-31 | 2007-10-04 | Tyrrell John C | System and method for intelligent provisioning of storage across a plurality of storage systems |
US7769723B2 (en) * | 2006-04-28 | 2010-08-03 | Netapp, Inc. | System and method for providing continuous data protection |
US8051043B2 (en) | 2006-05-05 | 2011-11-01 | Hybir Inc. | Group based complete and incremental computer file backup system, process and apparatus |
US7822921B2 (en) | 2006-10-31 | 2010-10-26 | Netapp, Inc. | System and method for optimizing write operations in storage systems |
US7613947B1 (en) | 2006-11-30 | 2009-11-03 | Netapp, Inc. | System and method for storage takeover |
WO2008070814A2 (en) * | 2006-12-06 | 2008-06-12 | Fusion Multisystems, Inc. (Dba Fusion-Io) | Apparatus, system, and method for a scalable, composite, reconfigurable backplane |
US7647526B1 (en) | 2006-12-06 | 2010-01-12 | Netapp, Inc. | Reducing reconstruct input/output operations in storage systems |
US8301673B2 (en) * | 2006-12-29 | 2012-10-30 | Netapp, Inc. | System and method for performing distributed consistency verification of a clustered file system |
US8219821B2 (en) | 2007-03-27 | 2012-07-10 | Netapp, Inc. | System and method for signature based data container recognition |
US8312214B1 (en) | 2007-03-28 | 2012-11-13 | Netapp, Inc. | System and method for pausing disk drives in an aggregate |
US8209587B1 (en) | 2007-04-12 | 2012-06-26 | Netapp, Inc. | System and method for eliminating zeroing of disk drives in RAID arrays |
US8898536B2 (en) * | 2007-04-27 | 2014-11-25 | Netapp, Inc. | Multi-core engine for detecting bit errors |
US8219749B2 (en) * | 2007-04-27 | 2012-07-10 | Netapp, Inc. | System and method for efficient updates of sequential block storage |
US7882304B2 (en) * | 2007-04-27 | 2011-02-01 | Netapp, Inc. | System and method for efficient updates of sequential block storage |
US7840837B2 (en) * | 2007-04-27 | 2010-11-23 | Netapp, Inc. | System and method for protecting memory during system initialization |
US7827350B1 (en) | 2007-04-27 | 2010-11-02 | Netapp, Inc. | Method and system for promoting a snapshot in a distributed file system |
US7752489B2 (en) | 2007-05-10 | 2010-07-06 | International Business Machines Corporation | Data integrity validation in storage systems |
US7836331B1 (en) | 2007-05-15 | 2010-11-16 | Netapp, Inc. | System and method for protecting the contents of memory during error conditions |
US7975102B1 (en) | 2007-08-06 | 2011-07-05 | Netapp, Inc. | Technique to avoid cascaded hot spotting |
US7873878B2 (en) * | 2007-09-24 | 2011-01-18 | International Business Machines Corporation | Data integrity validation in storage systems |
US7873803B2 (en) | 2007-09-25 | 2011-01-18 | Sandisk Corporation | Nonvolatile memory with self recovery |
US7996636B1 (en) | 2007-11-06 | 2011-08-09 | Netapp, Inc. | Uniquely identifying block context signatures in a storage volume hierarchy |
US7984259B1 (en) | 2007-12-17 | 2011-07-19 | Netapp, Inc. | Reducing load imbalance in a storage system |
US8380674B1 (en) | 2008-01-09 | 2013-02-19 | Netapp, Inc. | System and method for migrating lun data between data containers |
US8621154B1 (en) | 2008-04-18 | 2013-12-31 | Netapp, Inc. | Flow based reply cache |
US8725986B1 (en) | 2008-04-18 | 2014-05-13 | Netapp, Inc. | System and method for volume block number to disk block number mapping |
US8161236B1 (en) | 2008-04-23 | 2012-04-17 | Netapp, Inc. | Persistent reply cache integrated with file system |
US8006128B2 (en) * | 2008-07-31 | 2011-08-23 | Datadirect Networks, Inc. | Prioritized rebuilding of a storage device |
WO2010049928A1 (en) * | 2008-10-27 | 2010-05-06 | Kaminario Technologies Ltd. | System and methods for RAID writing and asynchronous parity computation
US9158579B1 (en) | 2008-11-10 | 2015-10-13 | Netapp, Inc. | System having operation queues corresponding to operation execution time |
WO2010057186A1 (en) * | 2008-11-17 | 2010-05-20 | Unisys Corporation | Data recovery using error strip identifiers
US8392682B2 (en) | 2008-12-17 | 2013-03-05 | Unisys Corporation | Storage security using cryptographic splitting
US8135980B2 (en) | 2008-12-23 | 2012-03-13 | Unisys Corporation | Storage availability using cryptographic splitting
US8386798B2 (en) | 2008-12-23 | 2013-02-26 | Unisys Corporation | Block-level data storage using an outstanding write list
US8495417B2 (en) * | 2009-01-09 | 2013-07-23 | Netapp, Inc. | System and method for redundancy-protected aggregates |
US8171227B1 (en) | 2009-03-11 | 2012-05-01 | Netapp, Inc. | System and method for managing a flow based reply cache |
US8433685B2 (en) * | 2010-08-18 | 2013-04-30 | Hewlett-Packard Development Company, L.P. | Method and system for parity-page distribution among nodes of a multi-node data-storage system |
US8849877B2 (en) | 2010-08-31 | 2014-09-30 | Datadirect Networks, Inc. | Object file system |
US8572441B2 (en) * | 2011-08-05 | 2013-10-29 | Oracle International Corporation | Maximizing encodings of version control bits for memory corruption detection |
US8756582B2 (en) * | 2011-08-22 | 2014-06-17 | International Business Machines Corporation | Tracking a program's calling context using a hybrid code signature
US20130198585A1 (en) * | 2012-02-01 | 2013-08-01 | Xyratex Technology Limited | Method of, and apparatus for, improved data integrity |
US8874956B2 (en) | 2012-09-18 | 2014-10-28 | Datadirect Networks, Inc. | Data re-protection in a distributed replicated data storage system |
US9043559B2 (en) | 2012-10-23 | 2015-05-26 | Oracle International Corporation | Block memory engine with memory corruption detection |
US9367394B2 (en) | 2012-12-07 | 2016-06-14 | Netapp, Inc. | Decoupled reliability groups |
US8843447B2 (en) | 2012-12-14 | 2014-09-23 | Datadirect Networks, Inc. | Resilient distributed replicated data storage system |
US9020893B2 (en) | 2013-03-01 | 2015-04-28 | Datadirect Networks, Inc. | Asynchronous namespace maintenance |
US10482009B1 (en) * | 2013-03-15 | 2019-11-19 | Google Llc | Use of a logical-to-logical translation map and a logical-to-physical translation map to access a data storage device |
US9619499B2 (en) | 2013-08-07 | 2017-04-11 | International Business Machines Corporation | Hardware implementation of a tournament tree sort algorithm |
US9830354B2 (en) | 2013-08-07 | 2017-11-28 | International Business Machines Corporation | Accelerating multiple query processing operations |
US9672298B2 (en) | 2014-05-01 | 2017-06-06 | Oracle International Corporation | Precise execution of versioned store instructions
US9563509B2 (en) | 2014-07-15 | 2017-02-07 | Nimble Storage, Inc. | Methods and systems for storing data in a redundant manner on a plurality of storage units of a storage system |
US9195593B1 (en) | 2014-09-27 | 2015-11-24 | Oracle International Corporation | Hardware assisted object memory migration |
US10310813B2 (en) | 2014-12-29 | 2019-06-04 | International Business Machines Corporation | Hardware implementation of a tournament tree sort algorithm using an external memory |
CN107748702B (en) | 2015-06-04 | 2021-05-04 | 华为技术有限公司 | Data recovery method and device |
US10528546B1 (en) | 2015-09-11 | 2020-01-07 | Cohesity, Inc. | File system consistency in a distributed system using version vectors |
US11016848B2 (en) | 2017-11-02 | 2021-05-25 | Seagate Technology Llc | Distributed data storage system with initialization-less parity |
US11593237B2 (en) | 2021-05-28 | 2023-02-28 | International Business Machines Corporation | Fast recovery with enhanced raid protection |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US3387261A (en) * | 1965-02-05 | 1968-06-04 | Honeywell Inc | Circuit arrangement for detection and correction of errors occurring in the transmission of digital data |
US4092732A (en) * | 1977-05-31 | 1978-05-30 | International Business Machines Corporation | System for recovering data stored in failed memory unit |
NL7804674A (en) * | 1978-05-02 | 1979-11-06 | Philips Nv | MEMORY WITH ERROR DETECTION AND CORRECTION. |
US4433388A (en) * | 1980-10-06 | 1984-02-21 | Ncr Corporation | Longitudinal parity |
DE3379192D1 (en) * | 1983-12-19 | 1989-03-16 | Itt Ind Gmbh Deutsche | Correction method for symbol errors in video/teletext signals |
US4842262A (en) * | 1984-02-22 | 1989-06-27 | Delphax Systems | Document inverter |
1986
- 1986-06-12 US US06873249 patent/US4761785B1/en not_active Expired - Lifetime

1987
- 1987-04-27 CA CA000535598A patent/CA1270333A/en not_active Expired - Lifetime
- 1987-05-08 JP JP62110902A patent/JPS62293355A/en active Granted
- 1987-05-26 DE DE3750790T patent/DE3750790T2/en not_active Expired - Lifetime
- 1987-05-26 EP EP87107666A patent/EP0249091B1/en not_active Expired - Lifetime
Also Published As
Publication number | Publication date |
---|---|
JPS62293355A (en) | 1987-12-19 |
US4761785B1 (en) | 1996-03-12 |
DE3750790D1 (en) | 1995-01-12 |
US4761785A (en) | 1988-08-02 |
EP0249091B1 (en) | 1994-11-30 |
JPH0547857B2 (en) | 1993-07-19 |
DE3750790T2 (en) | 1995-05-24 |
EP0249091A2 (en) | 1987-12-16 |
EP0249091A3 (en) | 1990-04-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CA1270333A (en) | | Parity spreading to enhance storge access |
US5881311A (en) | | Data storage subsystem with block based data management |
US10210045B1 (en) | | Reducing concurrency bottlenecks while rebuilding a failed drive in a data storage system |
US9696914B2 (en) | | System and method for transposed storage in RAID arrays |
US5809516A (en) | | Allocation method of physical regions of a disc array to a plurality of logically-sequential data, adapted for increased parallel access to data |
US5404361A (en) | | Method and apparatus for ensuring data integrity in a dynamically mapped data storage subsystem |
CA1321845C (en) | | File system for a plurality of storage classes |
US5574882A (en) | | System and method for identifying inconsistent parity in an array of storage |
US5708668A (en) | | Method and apparatus for operating an array of storage devices |
EP0485110B1 (en) | | Logical partitioning of a redundant array storage system |
US6138125A (en) | | Block coding method and system for failure recovery in disk arrays |
US7155569B2 (en) | | Method for RAID striped I/O request generation using a shared scatter gather list |
US4972316A (en) | | Method of handling disk sector errors in DASD cache |
US5632012A (en) | | Disk scrubbing system |
US5564116A (en) | | Array type storage unit system |
US6070254A (en) | | Advanced method for checking the integrity of node-based file systems |
US7111227B2 (en) | | Methods and systems of using result buffers in parity operations |
JP2000511318A (en) | | Transformable RAID for Hierarchical Storage Management System |
EP0572564A4 (en) | | |
JPH04230512A (en) | | Method and apparatus for updating records for a DASD array |
JPH04278641A (en) | | Data memory system and method |
US6427212B1 (en) | | Data fault tolerance software apparatus and method |
US6766480B2 (en) | | Using task description blocks to maintain information regarding operations |
US7346733B2 (en) | | Storage apparatus, system and method using a plurality of object-based storage devices |
Varma et al. | | Destage algorithms for disk arrays with non-volatile caches |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | MKEX | Expiry | 