WO2016095233A1 - Method and apparatus for realizing non-volatile cache - Google Patents


Info

Publication number: WO2016095233A1
Authority: WIPO (PCT)
Prior art keywords: cache, unit, data, small, write
Application number: PCT/CN2014/094448
Other languages: French (fr), Chinese (zh)
Inventors: 刘建伟, 丁杰, 刘乐乐, 周文
Original assignee: 北京麓柏科技有限公司
Application filed by 北京麓柏科技有限公司
Priority to PCT/CN2014/094448
Publication of WO2016095233A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00: Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02: Addressing or allocation; Relocation
    • G06F 12/08: Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems

Definitions

  • The present invention relates to the field of storage technologies, and in particular to a non-volatile cache implementation method and apparatus for improving the storage performance of the centralized control device, and of the entire storage system, in a centralized distributed storage architecture.
  • Non-volatile memory devices (such as flash memory) offer faster random access than mechanical disks, retain data after power-off, and provide greater storage density.
  • Flash memory can be deployed in various forms: as flash accelerator cards, as a flash-accelerated storage tier, or as a flash-accelerated cache.
  • When flash memory is used as an accelerating cache, information must be recorded for every cache line of the cache: for example, the address of the cached object, the state of the cache line (dirty, invalid, frozen, cleared, loaded, etc.), and the aging degree of the cache line.
  • The number of cache lines is determined by the size of the flash cache and the granularity of IO requests.
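To make the scale of the entry-management problem concrete, a back-of-envelope sketch (all sizes here are illustrative assumptions, not figures from the claims):

```python
# Illustrative arithmetic: tracking a hundred-TB flash cache at small-IO
# granularity yields an enormous cache state table.
cache_size = 100 * 2**40   # assumed 100 TB flash cache
line_size = 4 * 2**10      # assumed 4 KB cache line, matching small random IO
entry_size = 32            # assumed bytes of metadata per cache line

entries = cache_size // line_size
table_bytes = entries * entry_size
print(entries)             # 26843545600 cache lines
print(table_bytes / 2**30) # 800.0 GiB of DRAM for the state table alone
```

A single flat table at small-IO granularity is therefore infeasible, which motivates the split into large and small cache units below.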
  • When flash memory is used as an accelerating cache, data consistency must also be ensured between the flash cache and the back-end storage system (for example, distributed storage cluster 203); that is, the data in the flash cache must be consistent with the data in the storage system connected to the back end.
  • Among existing flash cache methods, some are used only as a read cache; others are used as a read-write cache, but their acceleration of write operations is limited, because redundant backup techniques that severely degrade write performance are used to ensure data reliability. Moreover, existing flash cache implementations have not reached flash capacities at the hundred-TB level.
  • The object of the present invention is to provide a non-volatile cache implementation method that solves the technical problems of the prior-art flash caches described above: an excessively large cache state table, the data consistency problem, and the poor read/write performance of the control device.
  • The present invention provides a non-volatile cache implementation method: a physical flash storage resource is first virtualized into a flash storage pool, and three logical storage units are then created on the storage pool: a large cache unit, a small cache unit, and a write mirror unit. The large cache unit provides a conventional cache service; the small cache unit provides an acceleration service for random write operations and a temporary data storage service for read operations; the write mirror unit provides redundant backup protection for dirty data in the large cache unit and the small cache unit.
  • When data is read, if the read operation hits a cache line of the small cache unit, the data in the small cache unit is returned; if the small cache unit is missed but a cache line of the large cache unit is hit, the data in the large cache unit is returned; if both the large cache unit and the small cache unit are missed and the acceleration identifier is valid, data of the large cache unit's cache-line size is read from the back-end storage cluster, loaded into a cache line of the large cache unit, and then returned to the front-end data requesting unit.
  • If both units are missed, the acceleration identifier is invalid, but the data temporary-storage identifier is valid, cache-line-sized data for the small cache unit is read from the back-end storage cluster, loaded into a cache line of the small cache unit, and then returned to the front-end data requesting unit; otherwise, the data read from the back-end storage cluster is sent directly to the front-end data requesting unit without passing through the flash storage resource.
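The read path above can be sketched as follows; the function and the dict-based caches are illustrative stand-ins, not the patent's implementation:

```python
def read(addr, acceleration_valid, staging_valid, small, big, backend):
    """Sketch of the described read path. `small` and `big` are dict-like
    cache units keyed by address; `backend` is the storage cluster."""
    if addr in small:              # small-cache-line hit
        return small[addr]
    if addr in big:                # large-cache-line hit
        return big[addr]
    data = backend[addr]           # both missed: read from the backend
    if acceleration_valid:
        big[addr] = data           # load a large cache line, then return
    elif staging_valid:
        small[addr] = data         # stage a small cache line, then return
    # otherwise bypass the flash storage resource entirely
    return data
```

For example, a miss with a valid acceleration identifier populates the large cache unit, while a double miss with both identifiers invalid touches neither cache.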
  • The method of the present invention may further have the following technical features:
  • The write mirror unit is composed of at least one logical write mirror subunit, and the large cache unit and the small cache unit are composed of at least one logical large cache subunit and at least one logical small cache subunit, respectively.
  • The physical flash storage resource includes two or more physical trays, and the large cache unit, the small cache unit, and the write mirror unit each span the two or more physical trays.
  • The physical tray written by the large cache unit is different from the physical tray written by the write mirror unit, and the physical tray written by the small cache unit is likewise different from the physical tray written by the write mirror unit.
  • A single cache line of the small cache unit or of the write mirror unit may be located in one physical tray or span two or more physical trays, and a single cache line of the large cache unit may likewise be located in one physical tray or span two or more physical trays.
  • The physical tray to which a data write or data read operation falls is chosen according to the following principle: when a physical tray is damaged, only the operations originally mapped to that tray are transferred to other physical trays; read and write operations originally mapped to the other trays keep their mapping unchanged.
  • The cache lines of the large cache unit have at least a dirty state, a clean state, and an invalid state. The dirty state indicates that the data in the cache line is inconsistent with the data in the back-end storage system; the clean state indicates that the data in the cache line is consistent with the data in the back-end storage system; the invalid state indicates that there is no valid data in the cache line.
  • When a cache line is in the invalid state, it jumps to the dirty state on receiving a data write request and to the clean state on receiving a clean-data load request; when it is in the dirty state, it jumps to the clean state only when a cache-line clear request is received; when it is in the clean state, it jumps to the dirty state on receiving a data write request and to the invalid state on receiving an invalidation request.
  • The cache lines of the small cache unit have at least a dirty state, a clean state, an invalid state, and a frozen state. The dirty state indicates that the data in the cache line is inconsistent with the data in the back-end storage system; the clean state indicates that the data in the cache line is consistent with the data in the back-end storage system; the invalid state indicates that there is no valid data in the cache line; the frozen state indicates that the cache line can only be read and cannot be written.
  • When a cache line is in the invalid state, it jumps to the dirty state on receiving a data write request and to the clean state on receiving a clean-data load request; when it is in the dirty state, it jumps to the invalid state on receiving a cache-line clear request and to the frozen state on receiving a move request; when it is in the clean state, it jumps to the dirty state on receiving a data write request and to the invalid state on receiving a read request; when the move of a frozen cache line is completed, the cache line jumps to the invalid state.
  • A guard unit may further be configured to clear dirty data in the write mirror unit to the back-end storage cluster in the background, so as to keep the dirty data in the flash storage resource that requires redundant backup within a predetermined range.
  • The redundant backup adopts a write-mirroring mode.
  • The physical flash storage resource may be, for example, flash memory or phase-change memory.
  • The present invention also provides a non-volatile cache implementation apparatus, including: a flash storage resource virtualization unit for virtualizing a physical flash storage resource into a flash storage pool;
  • a logical storage unit creating unit configured to create three logical storage units on the storage pool: a large cache unit, a small cache unit, and a write mirror unit, wherein the large cache unit provides a conventional cache service, the small cache unit provides an acceleration service for random write operations and a temporary data storage service for read operations, and the write mirror unit provides redundant backup protection for dirty data in the large cache and the small cache;
  • When the data writing unit performs a data write: if the write operation hits a cache line of the small cache unit, the data is written into the small cache unit; if it misses the small cache unit but hits a cache line of the large cache unit, the data is written into the large cache unit; if both the large cache unit and the small cache unit are missed and the acceleration identifier is valid, the data is written into the small cache unit; otherwise the data is not written into the flash storage resource but is written directly to the back-end storage cluster.
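The write path above can be sketched in the same style; the mirroring of dirty data follows the write mirror unit's backup role described earlier, and all names are illustrative:

```python
def write(addr, data, acceleration_valid, small, big, mirror, backend):
    """Sketch of the described write path. Dirty data landing in either
    cache unit is also written into the write mirror unit."""
    if addr in small:              # small-cache hit
        small[addr] = data
    elif addr in big:              # large-cache hit
        big[addr] = data
    elif acceleration_valid:       # double miss with acceleration enabled
        small[addr] = data
    else:                          # bypass flash, write straight through
        backend[addr] = data
        return
    mirror[addr] = data            # redundant backup of the dirty data
```

Note that the write-through case skips the mirror: only data that became dirty inside the flash cache needs the redundant backup.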
  • When the data reading unit performs a data read: if the read operation hits a cache line of the small cache unit, the data in the small cache unit is returned; if it misses the small cache unit but hits a cache line of the large cache unit, the data in the large cache unit is returned; if both units are missed and the acceleration identifier is valid, data of the large cache unit's cache-line size is read from the back-end storage cluster, loaded into a cache line of the large cache unit, and returned to the front-end data request unit; if both units are missed, the acceleration identifier is invalid, but the data temporary-storage identifier is valid, cache-line-sized data for the small cache unit is read from the back-end storage cluster, loaded into a cache line of the small cache unit, and returned to the front-end data request unit; otherwise the data read from the back-end storage cluster is sent directly to the front-end data request unit.
  • The device of the present invention may also have the following technical features:
  • The write mirror unit may be composed of a plurality of logical write mirror subunits.
  • The physical flash storage resource includes two or more physical trays, and the large cache unit, the small cache unit, and the write mirror unit may each span the two or more physical trays.
  • When the data writing unit writes data to the large cache unit, the small cache unit, and the write mirror unit, the physical tray written by the large cache unit is different from the physical tray written by the write mirror unit, and the physical tray written by the small cache unit is likewise different from the physical tray written by the write mirror unit.
  • A single cache line of the small cache unit or of the write mirror unit may be located in one physical tray or span two or more physical trays, and a single cache line of the large cache unit may likewise be located in one physical tray or span two or more physical trays.
  • The physical tray to which an operation of the data writing unit or the data reading unit falls is chosen according to the following principle: when a physical tray is damaged, only the operations originally mapped to that tray are transferred to other physical trays, while read and write operations mapped to the other trays keep their mapping unchanged.
  • The cache lines of the large cache unit have at least a dirty state, a clean state, and an invalid state. The dirty state indicates that the data in the cache line is inconsistent with the data in the back-end storage system; the clean state indicates that the data in the cache line is consistent with the data in the back-end storage system; the invalid state indicates that there is no valid data in the cache line.
  • When a cache line is in the invalid state, it jumps to the dirty state on receiving a data write request and to the clean state on receiving a clean-data load request; when it is in the dirty state, it jumps to the clean state only when a cache-line clear request is received; when it is in the clean state, it jumps to the dirty state on receiving a data write request and to the invalid state on receiving an invalidation request.
  • The cache lines of the small cache unit have at least a dirty state, a clean state, an invalid state, and a frozen state. The dirty state indicates that the data in the cache line is inconsistent with the data in the back-end storage system; the clean state indicates that the data in the cache line is consistent with the data in the back-end storage system; the invalid state indicates that there is no valid data in the cache line; the frozen state indicates that the cache line can only be read and cannot be written.
  • When a cache line is in the invalid state, it jumps to the dirty state on receiving a data write request and to the clean state on receiving a clean-data load request; when it is in the dirty state, it jumps to the invalid state on receiving a cache-line clear request and to the frozen state on receiving a move request; when it is in the clean state, it jumps to the dirty state on receiving a data write request and to the invalid state on receiving a read request; when the move of a frozen cache line is completed, the cache line jumps to the invalid state.
  • A guard unit may further be configured to clear dirty data in the write mirror unit to the back-end storage cluster in the background, so as to keep the dirty data in the flash storage resource that requires redundant backup within a predetermined range.
  • The redundant backup adopts a write-mirroring mode.
  • Compared with the prior art, the present invention, by virtualizing a physical flash storage resource into a flash storage pool, creating three logical storage units on the storage pool, and adopting the data writing and reading methods described above, avoids generating a huge cache state table and avoids the redundant backup methods that seriously degrade write performance. The non-volatile cache implementation method of the present invention thereby achieves ultra-large capacity and ultra-high performance, significantly improves the read and write performance of centralized control devices, and provides uninterrupted storage service.
  • FIG. 1 is a schematic diagram of the overall logical structure of the flash cache of Embodiment 1;
  • FIG. 2 is a schematic diagram of the centralized distributed storage architecture of Embodiment 1;
  • FIG. 3 is a schematic diagram of the overall physical structure of the flash cache in Embodiment 1;
  • FIG. 4 is the simplified cache-line state transition table of the large cache unit in Embodiment 1;
  • FIG. 5 is the simplified cache-line state transition table of the small cache unit in Embodiment 1;
  • FIG. 6 is a flowchart of a flash cache write operation in Embodiment 1;
  • FIG. 7 is a flowchart of a flash cache read operation in Embodiment 1;
  • FIG. 9 is a diagram showing an example of the correspondence between flash cache logical modules and physical modules in Embodiment 1.
  • The non-volatile memory devices (i.e., flash storage resources) in the cache implementation method disclosed by the present invention include, but are not limited to, flash memory, phase-change memory, and the like.
  • The storage system connected to the back end in the present invention includes, but is not limited to, the centralized distributed storage system (cluster) shown at 203 in FIG. 2; the following description merely takes a centralized distributed storage architecture as an example.
  • Because the centralized control device connects to a distributed storage cluster, the flash cache in the centralized control device must combine large capacity with ultra-high performance (high IOPS and low latency).
  • The storage capacity of the storage cluster is at the PB level, and the corresponding cache capacity is at the level of hundreds of terabytes.
  • Such a large-capacity flash cache faces two challenges: cache-line entry management and data consistency.
  • When flash memory is used as a cache, consistency must also be maintained between the data in the cache and the data in the back-end distributed storage cluster 203; whenever the two are inconsistent, the data in the cache needs to be backed up.
  • The most widely used protection method is RAID5/6, but RAID5/6 comes at the cost of a huge write-performance penalty.
  • Another approach is to use the flash only as a read cache: every write operation is written directly to the back-end distributed storage cluster 203 and the related data in the flash cache is invalidated, which keeps the cached data consistent with the back-end cluster and avoids backup protection of the cache, but such an implementation can only accelerate some read operations and cannot accelerate writes. This is the data consistency problem and its adverse effects.
  • Embodiment 1:
  • A non-volatile cache implementation method first virtualizes a physical flash storage resource into a flash storage pool, and then creates three logical storage units on the storage pool: a large cache unit 101, a small cache unit 102, and a write mirror unit 103, as shown in FIG. 1.
  • The large cache unit 101 provides a conventional cache service; the small cache unit 102 provides an acceleration service for random write operations and a temporary data storage service for read operations; the write mirror unit 103 provides redundant backup protection for the dirty data in the large cache unit 101 and the small cache unit 102.
  • When data is written: if the write operation hits a cache line of the small cache unit 102, the data is written into the small cache unit 102; if it misses the small cache unit 102 but hits a cache line of the large cache unit 101, the data is written into the large cache unit 101; if both units are missed and the acceleration identifier is valid, the data is written into the small cache unit 102; otherwise the data bypasses the flash storage resource and is written directly into the back-end storage cluster 203.
  • When data is read: if the read operation hits a cache line of the small cache unit 102, the data in the small cache unit 102 is returned; if it misses the small cache unit 102 but hits a cache line of the large cache unit 101, the data in the large cache unit 101 is returned; if both units are missed and the acceleration identifier is valid, data of the large cache unit's cache-line size is read from the back-end storage cluster, loaded into a cache line of the large cache unit 101, and then returned to the virtual machine 201; if both units are missed, the acceleration identifier is invalid, but the data temporary-storage identifier is valid, cache-line-sized data for the small cache unit 102 is read from the back-end storage cluster, loaded into a cache line of the small cache unit 102, and then returned to the virtual machine 201; otherwise, data read from the back-end storage cluster is sent directly to the front-end virtual machine 201 without passing through the flash cache 100.
  • Here the virtual machine 201 serves as one example of a front-end data application unit; the front-end data application unit in the present invention is not limited thereto.
  • Each tray provides physical flash storage resources, with internal techniques ensuring reliability and stability inside the tray.
  • Dividing the physical flash storage resource into a large cache unit 101 and a small cache unit 102 effectively solves the problem of an oversized state table for an ultra-large-capacity flash cache.
  • The states of the cache lines of the large cache unit include, but are not limited to, the states listed in FIG. 4.
  • The simplified cache lines of the large cache unit have three basic states: the dirty state, in which the data in the cache line is inconsistent with the data in the back-end storage system 203; the clean state, in which the data in the cache line is consistent with the data in the back-end storage system 203; and the invalid state, in which there is no valid data in the cache line.
  • The state transition process is: when a cache line is in the invalid state, receiving a cache-line-sized data write request (for example, a write request from the virtual machine 201) makes it jump to the dirty state, and receiving a clean-data load request (for example, a data load from the storage system 203) makes it jump to the clean state; when a cache line is in the dirty state, it jumps to the clean state only when a cache-line clear request is received; when a cache line is in the clean state, a data write request makes it jump to the dirty state and an invalidation request makes it jump to the invalid state.
  • The states of the small cache unit's cache lines include, but are not limited to, the states listed in FIG. 5.
  • The simplified basic states of a small cache unit cache line are: the dirty state, in which the data in the cache line is inconsistent with the data in the back-end storage system 203; the clean state, in which the data in the cache line is consistent with the data in the back-end storage system 203; the invalid state, in which the cache line holds no valid data; and the frozen state, in which the cache line can only be read and cannot be written.
  • The state transition process is: when a cache line is in the invalid state, receiving a data write request (for example, a write request from the virtual machine 201) makes it jump to the dirty state, and receiving a clean-data load request (for example, a data load from the storage system 203) makes it jump to the clean state; when a cache line is in the dirty state, a cache-line clear request makes it jump to the invalid state and a move request makes it jump to the frozen state; when a cache line is in the clean state, a data write request makes it jump to the dirty state and a read request makes it jump to the invalid state; when the move of a frozen cache line is completed, the cache line jumps to the invalid state.
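The two simplified state machines can be captured as transition tables; the event names paraphrase the requests in the text, and treating unlisted (state, event) pairs as leaving the state unchanged is an assumption of this sketch:

```python
# Transition tables for the simplified cache-line state machines described
# above. Keys are (current state, event); values are the next state.
LARGE = {
    ("invalid", "write"): "dirty",
    ("invalid", "clean_load"): "clean",
    ("dirty", "clear"): "clean",
    ("clean", "write"): "dirty",
    ("clean", "invalidate"): "invalid",
}
SMALL = {
    ("invalid", "write"): "dirty",
    ("invalid", "clean_load"): "clean",
    ("dirty", "clear"): "invalid",
    ("dirty", "move"): "frozen",
    ("clean", "write"): "dirty",
    ("clean", "read"): "invalid",     # staged data is invalidated once read
    ("frozen", "move_done"): "invalid",
}

def step(table, state, event):
    # Unlisted events leave the cache line in its current state (assumption).
    return table.get((state, event), state)
```

The key difference between the two units is visible in the tables: a dirty large-cache line is cleared to clean (it remains cached), while a dirty small-cache line is cleared to invalid or frozen for migration, reflecting its staging role.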
  • The different states and transitions of the large and small cache units are what enable accelerated read and write operations; the states and transitions of the two units differ because their service purposes differ.
  • Whether the large cache unit or the small cache unit is used depends on the policy prompt information and on the status information of the large and small cache units.
  • Policy prompt information includes, but is not limited to, service level, hit probability prediction, and the like.
  • the policy prompt information may come directly from the centralized control device 202 or from the virtual machine 201.
  • Status information includes, but is not limited to, whether it is a hit.
  • The large cache unit provides a conventional cache service, and different aging policies can be applied to different cache lines according to service level; the small cache unit provides cache acceleration for write operations that miss the large cache unit the first time, and provides temporary data storage for read operations that miss the large cache unit.
  • The cache lines of the small cache unit 102 are small, for example 4 KByte; the cache lines of the large cache unit 101 are large, for example 4 MByte; the cache-line size of the write mirror unit 103 can be kept consistent with that of the small cache unit 102.
  • The cache-line sizes can be adjusted to the actual situation: for example, the cache-line size of the small cache unit 102 is determined by the storage request pattern of the virtual machine 201, and the cache-line size of the large cache unit 101 is determined by the implementation of the back-end distributed storage cluster 203.
  • In the following, Small_Size is the size of the small cache unit 102; Mirror_Size is the size of the write mirror unit 103; Little_granularity is the cache-line size of the small cache unit 102, kept consistent with the block size of the data accesses of the virtual machine 201; Big_Size is the size of the large cache unit 101; Big_granularity is the cache-line size of the large cache unit 101; Available_DRAM_Size is the size of the DRAM available for storing the cache state table; and Entry_Size is the size of each entry.
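A sketch of the sizing constraint these variables imply, namely that the state-table entries of both cache units must fit in the available DRAM; the constraint itself and the concrete numbers below are assumptions for illustration:

```python
def state_table_fits(Small_Size, Little_granularity, Big_Size,
                     Big_granularity, Available_DRAM_Size, Entry_Size):
    """Check whether the combined cache state table fits in DRAM (sketch)."""
    small_entries = Small_Size // Little_granularity
    big_entries = Big_Size // Big_granularity
    return (small_entries + big_entries) * Entry_Size <= Available_DRAM_Size

# 4 KB small lines and 4 MB large lines, as in the example sizes above:
print(state_table_fits(
    Small_Size=1 * 2**40,            # assumed 1 TB small cache unit
    Little_granularity=4 * 2**10,    # 4 KByte
    Big_Size=100 * 2**40,            # assumed 100 TB large cache unit
    Big_granularity=4 * 2**20,       # 4 MByte
    Available_DRAM_Size=64 * 2**30,  # assumed 64 GB of DRAM
    Entry_Size=32,                   # assumed 32 bytes per entry
))                                   # prints True
```

With the coarse 4 MB granularity on the bulk of the capacity, the table needs under 10 GiB here; tracking the full 100 TB at 4 KB granularity would not fit, which is the point of the large/small split.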
  • The write mirror unit 103 provides redundant backup protection for the dirty data in the large cache unit 101 and the small cache unit 102: data from the virtual machine 201 is written into the write mirror unit 103 at the same time as it is written to the large cache unit 101 or the small cache unit 102.
  • A preferred implementation further includes a guard unit responsible for clearing dirty data in the write mirror unit 103 to the back-end storage cluster 203 in the background. Since the write mirror unit 103 backs up only the dirty data in the large cache unit 101 and the small cache unit 102, and the guard unit continuously clears dirty data into the back-end storage cluster 203 according to predetermined rules, the dirty data in the flash cache 100 is bounded, and there is no need to redundantly back up all the data in the entire flash cache 100. At the same time, the backup strategy adopts write mirroring, which reduces the performance cost of the redundant backup on the one hand and accelerates all write operations on the other.
  • The guard unit extracts a piece of dirty data and its related information (such as address information) from the write mirror, and queries the flash cache state table with that information to obtain the cache status.
  • If the status indicates that a cache line of the small cache unit 102 is hit but no cache line of the large cache unit 101 is hit, the data in the small cache unit's cache line is cleared directly into the back-end storage cluster 203.
  • If the status indicates that a cache line of the small cache unit 102 is hit and a cache line of the large cache unit 101 is also hit, the data in the small cache unit's cache line is first moved into the large cache unit's cache line, and then the data in the large cache unit's cache line is cleared into the back-end storage cluster 203.
  • If the status indicates that no cache line of the small cache unit 102 is hit, a cache line of the large cache unit 101 is hit, and that cache line contains dirty data, the data in the large cache unit's cache line is cleared into the back-end storage cluster 203.
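The guard unit's three-way flush decision can be sketched as follows; the dict-based units and the `big_dirty` set are illustrative stand-ins for the flash cache state table:

```python
def flush_one(addr, small, big, big_dirty, backend):
    """Sketch of the guard unit's per-entry flush described above.

    small/big are dict-like cache units; big_dirty is the set of large-cache
    addresses whose lines hold dirty data (all names are illustrative)."""
    if addr in small and addr not in big:
        backend[addr] = small[addr]    # clear the small line straight to the backend
    elif addr in small and addr in big:
        big[addr] = small[addr]        # first move the data into the large line,
        backend[addr] = big[addr]      # then clear the large line to the backend
    elif addr in big and addr in big_dirty:
        backend[addr] = big[addr]      # clear the dirty large cache line
```

The middle branch is the interesting one: small-cache data that also has a large-cache line is promoted into the large line before being cleared, so the large cache keeps the freshest copy.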
  • The write mirror unit 103 can be composed of a plurality of logical write mirror subunits, each with its own daemon.
  • Each logical unit, that is, the large cache unit 101, the small cache unit 102, and the write mirror unit 103, can span all physical trays; the advantage is increased concurrency across the physical trays and improved performance.
  • The write mirror logical unit can be divided into multiple small write mirror logical subunits, for example one logical write mirror subunit per tray; the advantage of splitting is that multiple write mirror daemons can run concurrently, speeding up the clearing of dirty data to the back-end storage cluster.
  • The physical tray written by the write mirror unit is not the same tray as the one written by the large cache unit or the small cache unit.
  • For example, the tray number written by the write mirror unit 103 may follow a simple rule such as the tray number written by the large cache unit 101 or the small cache unit 102 plus one (though it is not limited thereto). This ensures that the redundant backup and the original data sit on different physical trays, so the flash cache 100 still has usable data when a single physical tray is damaged.
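The "plus one" mirror-placement rule given as an example can be sketched as follows; the tray count and the address-to-tray mapping are assumptions:

```python
NUM_TRAYS = 4  # assumed number of physical trays

def primary_tray(block_addr):
    return block_addr % NUM_TRAYS            # assumed static mapping

def mirror_tray(block_addr):
    # Wrapping around keeps the mirror on a different physical tray than
    # the original data, so a single tray failure cannot lose both copies.
    return (primary_tray(block_addr) + 1) % NUM_TRAYS
```

With more than one tray, `mirror_tray` can never equal `primary_tray`, which is exactly the property the text relies on for surviving a single-tray failure.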
  • The cache-line size of the large cache shown in FIG. 9 is 4 MByte, but in actual use it can be adjusted to the situation.
  • A single cache line of the small cache unit 102 or of the write mirror unit 103 can be located in one physical tray or span two or more physical trays, and a single cache line of the large cache unit can likewise span multiple physical trays or be located in one tray. In this example, a single cache line of the large cache unit is located in a single physical tray, which makes it easier to provide uninterrupted service when a single physical tray is damaged.
  • Uninterrupted service provision works as in the following example.
  • Step 1: first mark tray 0 and tray 1 as not providing free cache lines.
  • Step 2: traverse the trays to flush dirty data, with the following threads:
  • Thread 1: traverse the cache line status table of tray 0; if a line is in the clean state, invalidate it; if it is in the dirty state, flush its data to the backend storage cluster and then invalidate it.
  • Thread 2: traverse the cache line status table of tray 1; if a line is in the clean state, invalidate it; otherwise wait until its status becomes clean and then invalidate it.
  • Thread 3: raise the running priority of the write mirror daemon on tray 2 to the highest level.
  • Threads 1, 2, and 3 execute concurrently.
  • Step 3: wait for the traversals of trays 0 and 1 to finish, then set tray 0 back to the state in which it can provide free cache lines. Under the new arrangement, tray 0 is double-backed by the write mirror unit on tray 2.
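The concurrent traversal step can be sketched as follows; this is a hypothetical Python illustration in which dictionaries stand in for the per-tray cache line status tables. The actual flush to the backend storage cluster and the priority change performed by thread 3 are platform-specific and only noted in comments.

```python
import threading

def flush_tray0(status):
    """Thread 1: invalidate clean lines; flush dirty lines first, then invalidate."""
    for line, state in status.items():
        if state == "dirty":
            pass  # flush the line's data to the backend storage cluster (omitted)
        status[line] = "invalid"

def drain_tray1(status, wait_until_clean=lambda line: None):
    """Thread 2: invalidate clean lines; otherwise wait for the line to become clean."""
    for line, state in status.items():
        if state != "clean":
            wait_until_clean(line)  # wait for the write mirror daemon to clean it
        status[line] = "invalid"

tray0 = {"l0": "clean", "l1": "dirty"}
tray1 = {"l0": "clean", "l1": "dirty"}

t1 = threading.Thread(target=flush_tray0, args=(tray0,))
t2 = threading.Thread(target=drain_tray1, args=(tray1,))
for t in (t1, t2):
    t.start()
for t in (t1, t2):
    t.join()
# Thread 3 (raising the tray-2 write mirror daemon's priority) is OS-specific
# and omitted here; step 3 would then re-enable free cache lines on tray 0.
```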
  • The algorithm that selects which physical tray a read or write from the virtual machine falls on is determined according to the following principle: when a physical tray is damaged, only the operations originally mapped to that tray are transferred to other physical trays, while the mapping of read and write operations already mapped to other physical trays remains unchanged.
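One way to satisfy this principle is a fixed home mapping plus a deterministic fallback that is consulted only when the home tray has failed; the sketch below is an assumption for illustration, not the patent's own algorithm:

```python
def pick_tray(op_key: int, num_trays: int, failed: set) -> int:
    """Map an operation to a tray; only operations whose home tray has
    failed are redirected, all other mappings stay unchanged."""
    home = op_key % num_trays
    if home not in failed:
        return home  # original mapping preserved for healthy trays
    # deterministic fallback: the next healthy tray after the failed home
    for step in range(1, num_trays):
        candidate = (home + step) % num_trays
        if candidate not in failed:
            return candidate
    raise RuntimeError("no healthy tray available")
```

Because the fallback is only reached when `home` is in `failed`, losing one tray moves exactly that tray's traffic and leaves every other mapping intact.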
  • The inventors have also found that the granularity of read and write operations from the virtual machine 201 equals the cache line size of the small cache unit 102, while the cache line of the large cache unit 101 is much larger, so a single operation may hit both a large and a small cache unit at the same time. This can be handled by the method described below:
  • Query the acceleration flag; if it is valid, read data of the large cache line size from the backend storage cluster 203, load it into a cache line of the large cache unit 101, and then return the data to the virtual machine 201. If it is invalid, query the data staging flag.
  • If the data staging flag is valid, read cache line data of the small cache unit 102's size from the backend storage cluster 203, load it into a cache line of the small cache unit 102, and then return the data to the virtual machine 201; otherwise the data read from the backend storage cluster is sent to the front-end virtual machine 201 directly through the centralized control device 202 without passing through the flash cache 100.
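The full read decision chain (small cache hit, large cache hit, accelerated load, staged load, bypass) can be sketched as follows; the `Backend` class and all names are hypothetical stand-ins for the real components:

```python
class Backend:
    """Stand-in for the backend storage cluster 203."""
    def __init__(self, store):
        self.store = store
    def read_big(self, addr):    # read a large-cache-line-sized chunk
        return self.store[addr]
    def read_small(self, addr):  # read a small-cache-line-sized chunk
        return self.store[addr]

def read(addr, small, big, accel_valid, staging_valid, backend):
    """Read path: small hit -> big hit -> accelerated load -> staged load -> bypass."""
    if addr in small:                           # small cache hit
        return small[addr]
    if addr in big:                             # large cache hit
        return big[addr]
    if accel_valid:                             # double miss, acceleration flag valid
        big[addr] = backend.read_big(addr)      # load into the large cache unit
        return big[addr]
    if staging_valid:                           # staging flag valid
        small[addr] = backend.read_small(addr)  # stage into the small cache unit
        return small[addr]
    return backend.read_small(addr)             # bypass the flash cache entirely
```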
  • The non-volatile cache implementation method of this embodiment keeps the size of the state table recording cache status within a certain range, and accelerates all write operations in addition to accelerating read operations. Moreover, only part of the data is backed up, so the amount of backup data is limited and the backup operation has little impact on performance. Furthermore, uninterrupted service is provided without any hot spare disk.
  • Embodiment 2:
  • The apparatus of this embodiment corresponds to the non-volatile cache implementation method of the foregoing embodiment.
  • a non-volatile cache implementing apparatus includes a flash storage resource virtualization unit, a logical storage unit creation unit, a data writing unit, and a data reading unit.
  • the flash storage resource virtualization unit is configured to virtualize physical flash storage resources into a flash storage pool.
  • The logical storage unit creating unit is configured to create three kinds of logical storage units on the storage pool: a large cache unit, a small cache unit, and a write mirror unit. The large cache unit is configured to provide a conventional cache service, the small cache unit is configured to provide an acceleration service for random write operations and a data staging service for read operations, and the write mirror unit is configured to provide a redundant backup protection function for the dirty data in the large cache and the small cache.
  • The physical flash storage resource preferably includes two or more physical trays, with the large cache unit, the small cache unit, and the write mirror unit all spanning the two or more physical trays. Preferably, a single cache line of the small cache unit and of the write mirror unit is located in the same physical tray, and a single cache line of the large cache unit is located in the same physical tray or spans two or more physical trays.
  • When the data writing unit performs data writing, if the write operation hits a cache line of the small cache unit, the data is written into the small cache unit; if it misses the small cache unit but hits a cache line of the large cache unit, the data is written into the large cache unit; if both the large cache unit and the small cache unit are missed and the acceleration flag is valid, the data is written into the small cache unit; otherwise the data is not written into the flash storage resource and is written directly to the backend storage cluster.
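The write decision chain can be sketched in the same style; dictionaries stand in for the cache units and the backend storage cluster, and all names are illustrative:

```python
def write(addr, data, small, big, accel_valid, backend):
    """Write path: small hit -> big hit -> accelerated small write -> bypass."""
    if addr in small:        # small cache hit
        small[addr] = data
    elif addr in big:        # large cache hit
        big[addr] = data
    elif accel_valid:        # double miss but acceleration flag valid
        small[addr] = data
    else:                    # bypass flash, write straight to the backend
        backend[addr] = data
```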
  • The physical write location of the large cache unit is preferably on a different physical tray from the physical write location of the write mirror unit.
  • The physical write location of the small cache unit is preferably also on a different physical tray from the physical write location of the write mirror unit.
  • When the data reading unit performs data reading, if the read operation hits a cache line of the small cache unit, the data in the small cache unit is returned; if it misses the small cache unit but hits a cache line of the large cache unit, the data in the large cache unit is returned; if both the large cache unit and the small cache unit are missed and the acceleration flag is valid, data of the large cache unit's cache line size is read from the backend storage cluster, loaded into a cache line of the large cache unit, and returned to the virtual machine; if both are missed, the acceleration flag is invalid, but the data staging flag is valid, cache line data of the small cache unit's size is read from the backend storage cluster, loaded into a cache line of the small cache unit, and returned to the virtual machine; otherwise the data read from the backend storage cluster is sent directly to the front-end virtual machine without passing through the flash storage resource.
  • The size division of the large cache unit, the small cache unit, and the write mirror unit may be performed in various manners, preferably satisfying the following formula:
  • (Little_Size + Mirror_size) / Little_granularity + Big_Size / Big_granularity <= available_DRAM_Size / entry_size
  • Big_Size is the size of the large cache unit.
  • Little_Size is the size of the small cache unit.
  • Mirror_size is the size of the write mirror unit.
  • Little_granularity is the cache line size of the small cache unit.
  • Big_granularity is the cache line size of the large cache unit.
  • available_DRAM_Size is the size of the DRAM available for storing the cache status table.
  • entry_size is the size of each cache status table entry.
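The constraint can be checked numerically. The values below are hypothetical (a 100 TB large cache with 4 MB lines, a 1 TB small cache and a 1 TB write mirror with 4 KB lines, 8-byte table entries) and show why large lines matter: even a hundred-TB-scale cache then needs only a few GB of DRAM for its status table.

```python
def dram_needed(big_size, little_size, mirror_size,
                big_gran, little_gran, entry_size):
    """DRAM bytes needed for the cache status table, per the formula above."""
    entries = (little_size + mirror_size) // little_gran + big_size // big_gran
    return entries * entry_size

TB, MB, KB = 2**40, 2**20, 2**10
needed = dram_needed(big_size=100 * TB, little_size=1 * TB, mirror_size=1 * TB,
                     big_gran=4 * MB, little_gran=4 * KB, entry_size=8)
# 26,214,400 large-cache entries + 536,870,912 small/mirror entries,
# about 4.2 GiB of table: well under an assumed 8 GiB DRAM budget.
assert needed <= 8 * 2**30
```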
  • the write mirror unit may be composed of a plurality of logical write mirror subunits.
  • Which physical tray the operations of the data writing unit and the data reading unit fall on is preferably based on the principle that, when a physical tray is damaged, only the operations originally mapped to that tray are transferred to other physical trays, while read and write operations already mapped to other physical trays keep their mapping unchanged.
  • The cache line of the large cache unit has at least a dirty state, a clean state, and an invalid state. The dirty state indicates that the data in the cache line is inconsistent with the data in the backend storage system; the clean state indicates that the data in the cache line is consistent with the data in the backend storage system.
  • The invalid state indicates that the cache line holds no valid data. When a cache line is in the invalid state, it transitions to the dirty state on a data write request and to the clean state on a clean data load request; when it is in the dirty state, it transitions to the clean state only on a cache line flush request; when it is in the clean state, it transitions to the dirty state on a data write request and to the invalid state on an invalidation request.
  • The cache line of the small cache unit has at least a dirty state, a clean state, an invalid state, and a frozen state. The dirty state indicates that the data in the cache line is inconsistent with the data in the backend storage system; the clean state indicates that the data in the cache line is consistent with the data in the backend storage system.
  • The invalid state indicates that the cache line holds no valid data.
  • The frozen state indicates that the cache line can only be read and cannot be written. When a cache line is in the invalid state, it transitions to the dirty state on a data write request and to the clean state on a clean data load request; when it is in the dirty state, it transitions to the invalid state on a cache line flush request and to the frozen state on a move request; when it is in the clean state, it transitions to the dirty state on a data write request and to the invalid state on a read request; when it is in the frozen state, the cache line transitions to the invalid state only when the move-completed return is received.
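The transition rules for the small cache unit can be encoded as a table; the sketch below paraphrases the event names from the text (the large cache unit's table is the same minus the frozen state and move events, and with dirty transitioning to clean rather than invalid on a flush):

```python
# (state, event) -> next state, for cache lines of the small cache unit
SMALL_CACHE_FSM = {
    ("invalid", "write"):         "dirty",
    ("invalid", "clean_load"):    "clean",
    ("dirty",   "flush"):         "invalid",
    ("dirty",   "move"):          "frozen",
    ("clean",   "write"):         "dirty",
    ("clean",   "read"):          "invalid",   # the staged line is consumed by the read
    ("frozen",  "move_complete"): "invalid",
}

def step(state: str, event: str) -> str:
    """Apply one event to a cache line; unknown pairs are rejected."""
    try:
        return SMALL_CACHE_FSM[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} not allowed in state {state!r}")
```

Encoding the table as data keeps the allowed transitions auditable against Figures 4 and 5, and any request arriving in the wrong state fails loudly instead of corrupting the line.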
  • the redundant backup preferably adopts a write mirroring manner.

Abstract

Disclosed are a method and apparatus for realizing a non-volatile cache. The method first virtualizes physical flash storage resources into a flash storage pool, and then creates three kinds of logical storage units on the storage pool: a large cache unit, a small cache unit, and a write mirror unit. The large cache unit provides a conventional caching service; the small cache unit provides an acceleration service for random write operations and a data staging service for read operations; the write mirror unit provides a redundant backup protection function for the dirty data in the large cache and the small cache. The method avoids creating a huge cache state table and avoids redundant backup schemes that seriously degrade write performance; it can achieve ultra-large capacity and ultra-high performance, thereby significantly improving the read and write performance of the centralized control device.

Description

Method and device for implementing a non-volatile cache

Technical field

The present invention relates to the field of storage technologies, and in particular to a non-volatile cache implementation method and apparatus, applied in a centralized control device of a centralized distributed storage architecture, for improving the storage performance of the centralized control device and of the entire storage system.

Background art

With the development of semiconductor technology, high-speed non-volatile memory devices (such as flash memory) have reached ever higher storage densities and are now widely used as data access acceleration devices in data centers. Compared with a mechanical disk, a non-volatile memory device such as flash has a faster random access speed; compared with DRAM, it retains data after the power is turned off and has a greater storage density.

The high storage density, non-volatility, and high access speed of flash memory have led to its wide use in storage systems; one such application is as an acceleration device for a storage system. As an acceleration device, flash can take several forms: flash accelerator cards, flash acceleration storage tiers, and flash acceleration caches.

When flash memory is used as an acceleration cache, information about every cache line must be recorded, for example the address of the cached object, the cache state (dirty, invalid, frozen, clearing, loading, etc.), and the aging degree of the cache line. The number of cache lines depends on the size of the flash cache and the granularity of IO requests.

When flash memory is used as an acceleration cache, data consistency must also be guaranteed between the flash cache and the backend storage system (for example, the distributed storage cluster 203); that is, the data in the flash cache must be kept consistent with the data in the backend storage system.

Among existing flash cache methods, some are used only as read caches. Others serve as read-write caches, but their acceleration of write operations is limited, because a redundant backup technique that greatly degrades write performance is used to guarantee data reliability. Moreover, in existing flash cache implementations, the flash capacity has not reached the hundred-TB level.

The above background disclosure is provided only to assist in understanding the inventive concept and technical solution of the present invention; it does not necessarily belong to the prior art of this patent application. In the absence of clear evidence that the above content was published before the filing date of this application, the background art above should not be used to evaluate the novelty and inventiveness of this application.
Summary of the invention

The object of the present invention is to provide a non-volatile cache implementation method that solves the technical problems of prior-art flash caches described above: the huge cache state table caused by cache line entry management and data consistency requirements, and the resulting poor read/write performance of the control device.

To this end, the present invention provides a non-volatile cache implementation method. First, physical flash storage resources are virtualized into a flash storage pool; then three kinds of logical storage units are created on the storage pool: a large cache unit, a small cache unit, and a write mirror unit. The large cache unit provides a conventional cache service; the small cache unit provides an acceleration service for random write operations and a data staging service for read operations; the write mirror unit provides a redundant backup protection function for the dirty data in the large cache unit and the small cache unit.

When data is written, if the write operation hits a cache line of the small cache unit, the data is written into the small cache unit; if it misses the small cache unit but hits a cache line of the large cache unit, the data is written into the large cache unit; if both the large cache unit and the small cache unit are missed and the acceleration flag is valid, the data is written into the small cache unit; otherwise the data is not written into the flash storage resource and is written directly to the backend storage cluster.

When data is read, if the read operation hits a cache line of the small cache unit, the data in the small cache unit is returned; if it misses the small cache unit but hits a cache line of the large cache unit, the data in the large cache unit is returned; if both the large cache unit and the small cache unit are missed and the acceleration flag is valid, data of the large cache unit's cache line size is read from the backend storage cluster, loaded into a cache line of the large cache unit, and then returned to the front-end data requesting unit; if both are missed, the acceleration flag is invalid, but the data staging flag is valid, cache line data of the small cache unit's size is read from the backend storage cluster, loaded into a cache line of the small cache unit, and then returned to the front-end data requesting unit; otherwise the data read from the backend storage cluster is sent directly to the front-end data requesting unit without passing through the flash storage resource.
Preferably, the method of the present invention may further have the following technical features:

The sizes of the large cache unit, the small cache unit, and the write mirror unit satisfy the formula (Little_Size + Mirror_size) / Little_granularity + Big_Size / Big_granularity <= available_DRAM_Size / entry_size, where Big_Size is the size of the large cache unit, Little_Size is the size of the small cache unit, Mirror_size is the size of the write mirror unit, Little_granularity is the cache line size of the small cache unit, Big_granularity is the cache line size of the large cache unit, available_DRAM_Size is the size of the DRAM available for storing the cache status table, and entry_size is the size of each cache status table entry.

The write mirror unit is composed of at least one logical write mirror subunit, and the large cache unit and the small cache unit are respectively composed of at least one logical large cache subunit and at least one logical small cache subunit.

The physical flash storage resource includes two or more physical trays, and the large cache unit, the small cache unit, and the write mirror unit all span the two or more physical trays.

When data is written into the large cache unit, the small cache unit, and the write mirror unit, the physical write location of the large cache unit is on a different physical tray from the physical write location of the write mirror unit, and the physical write location of the small cache unit is likewise on a different physical tray from that of the write mirror unit.

A single cache line of the small cache unit and of the write mirror unit is located in one physical tray or spans two or more physical trays, and a single cache line of the large cache unit is located in one physical tray or spans two or more physical trays.

Which physical tray a data write or read operation falls on follows this principle: when a physical tray is damaged, only the operations originally mapped to that tray are transferred to other physical trays, while read and write operations already mapped to other physical trays keep their mapping unchanged.

The cache line of the large cache unit has at least a dirty state, a clean state, and an invalid state. The dirty state indicates that the data in the cache line is inconsistent with the data in the backend storage system; the clean state indicates that the data in the cache line is consistent with the data in the backend storage system; the invalid state indicates that the cache line holds no valid data. When a cache line is in the invalid state, it transitions to the dirty state on a data write request and to the clean state on a clean data load request; when it is in the dirty state, it transitions to the clean state only on a cache line flush request; when it is in the clean state, it transitions to the dirty state on a data write request and to the invalid state on an invalidation request.

The cache line of the small cache unit has at least a dirty state, a clean state, an invalid state, and a frozen state. The dirty state indicates that the data in the cache line is inconsistent with the data in the backend storage system; the clean state indicates that the data in the cache line is consistent with the data in the backend storage system; the invalid state indicates that the cache line holds no valid data; the frozen state indicates that the cache line can only be read and cannot be written. When a cache line is in the invalid state, it transitions to the dirty state on a data write request and to the clean state on a clean data load request; when it is in the dirty state, it transitions to the invalid state on a cache line flush request and to the frozen state on a move request; when it is in the clean state, it transitions to the dirty state on a data write request and to the invalid state on a read request; when it is in the frozen state, the cache line transitions to the invalid state only when the move-completed return is received.

The method further includes a daemon unit that flushes dirty data from the write mirror unit to the backend storage cluster in the background, so as to keep the amount of dirty data in the flash storage resource that needs redundant backup within a predetermined range.

The redundant backup adopts a write mirroring manner.

The physical flash storage resource is a flash memory or a phase-change memory.
The present invention also provides a non-volatile cache implementing apparatus, including: a flash storage resource virtualization unit, configured to virtualize physical flash storage resources into a flash storage pool;

a logical storage unit creating unit, configured to create three kinds of logical storage units on the storage pool: a large cache unit, a small cache unit, and a write mirror unit, the large cache unit being configured to provide a conventional cache service, the small cache unit being configured to provide an acceleration service for random write operations and a data staging service for read operations, and the write mirror unit being configured to provide a redundant backup protection function for the dirty data in the large cache and the small cache;

a data writing unit and a data reading unit;

when the data writing unit performs data writing, if the write operation hits a cache line of the small cache unit, the data is written into the small cache unit; if it misses the small cache unit but hits a cache line of the large cache unit, the data is written into the large cache unit; if both the large cache unit and the small cache unit are missed and the acceleration flag is valid, the data is written into the small cache unit; otherwise the data is not written into the flash storage resource and is written directly to the backend storage cluster;

when the data reading unit performs data reading, if the read operation hits a cache line of the small cache unit, the data in the small cache unit is returned; if it misses the small cache unit but hits a cache line of the large cache unit, the data in the large cache unit is returned; if both the large cache unit and the small cache unit are missed and the acceleration flag is valid, data of the large cache unit's cache line size is read from the backend storage cluster, loaded into a cache line of the large cache unit, and then returned to the front-end data requesting unit; if both are missed, the acceleration flag is invalid, but the data staging flag is valid, cache line data of the small cache unit's size is read from the backend storage cluster, loaded into a cache line of the small cache unit, and then returned to the front-end data requesting unit; otherwise the data read from the backend storage cluster is sent directly to the front-end data requesting unit without passing through the flash storage resource.
优选地,本发明的装置还可以具有如下技术特征:Preferably, the device of the present invention may also have the following technical features:
所述大缓存单元、小缓存单元和写镜像单元的大小满足如下公式(Little_Size+Mirror_size)/Little_granularity+Big_Size/Big_granularity<=available_DRAM_Size/entry_size,其中,Big_Size为大缓存单元的大小,Little_Size为小缓存单元的大小,Mirror_size为写镜像单元的大小,Little_granularity为小缓存单元缓存行的大小,Big_granularity为大缓存单元缓存行的大小,available_DRAM_Size是可用的存储缓存状态表的DRAM的大小,entry_size是缓存每个表项的大小。The size of the large cache unit, the small cache unit, and the write mirror unit satisfy the following formula (Little_Size+Mirror_size)/Little_granularity+Big_Size/Big_granularity<=available_DRAM_Size/entry_size, where Big_Size is the size of a large cache unit, and Little_Size is a small cache unit. Size, Mirror_size is the size of the write mirror unit, Little_granularity is the size of the small cache unit cache line, Big_granularity is the size of the large cache unit cache line, available_DRAM_Size is the size of the DRAM of the available storage cache status table, entry_size is the cache of each table The size of the item.
所述写镜像单元可由多个逻辑写镜像子单元构成。The write mirror unit may be composed of a plurality of logical write mirror subunits.
所述物理的闪存存储资源包括两个以上物理托盘,所述大缓存单元、小缓存单元和写镜像单元均可以横跨所述两个以上物理托盘。The physical flash storage resource includes more than two physical trays, and the large cache unit, the small cache unit, and the write mirror unit may span the two or more physical trays.
所述数据写入单元在将数据写入所述大缓存单元、小缓存单元和写镜像单元时,所述大缓存单元的写入物理位置与所述写镜像单元的写入物理位置处于不同的物理托盘上,所述小缓存单元的写入物理位置与所述写镜像单元的写入物理位置亦处于不同的物理托盘上。The data writing unit writes data to the large cache unit, the small cache unit, and the write mirror unit, the write physical location of the large cache unit is different from the write physical location of the write mirror unit On the physical tray, the physical location of the write of the small cache unit and the physical location of the write of the write mirror unit are also on different physical trays.
所述小缓存单元和写镜像单元的单个缓存行位于同一个物理托盘内或横跨两个以上物理托盘,所述大缓存单元的单个缓存行位于同一个物理托盘内或横跨 两个以上物理托盘。The single cache line of the small cache unit and the write mirror unit are located in the same physical tray or span more than two physical trays, and a single cache line of the large cache unit is located in the same physical tray or across More than two physical trays.
所述数据写入单元和数据读取单元的操作落到哪个物理托盘按以下原则:当某个物理托盘损坏时,仅仅将原来映射到该物理托盘上的操作转移到其他的物理托盘上,而原来就映射到其他物理托盘上的读写操作维持映射关系不变。Which physical tray the operation of the data writing unit and the data reading unit falls to is based on the following principle: when a physical tray is damaged, only the operation originally mapped to the physical tray is transferred to another physical tray, and The read and write operations mapped to other physical trays remain unchanged.
所述大缓存单元的缓存行至少包括脏状态、干净状态和无效状态,所述脏状态表示缓存行中的数据和后端存储系统中的数据不一致,所述干净状态表示缓存行中的数据和后端存储系统中的数据一致,所述无效状态表示缓存行中无有效数据;当缓存行处于无效状态时,收到数据写入请求时跳转到脏状态,收到干净数据装载请求时跳转到干净状态;当缓存处于脏状态时,只有收到缓存行清除请求时跳转为干净状态;当缓存行处于干净状态时,收到数据写入请求时跳转到脏状态,收到失效请求时跳转到无效状态。The cache line of the large cache unit includes at least a dirty state, a clean state, and an invalid state, the dirty state indicating that data in the cache line is inconsistent with data in the backend storage system, the clean state indicating data in the cache line and The data in the backend storage system is consistent. The invalid state indicates that there is no valid data in the cache line; when the cache line is in an invalid state, it jumps to the dirty state when receiving the data write request, and jumps when receiving the clean data load request. Go to the clean state; when the cache is dirty, it will jump to the clean state only when the cache line clear request is received; when the cache line is in the clean state, it will jump to the dirty state when receiving the data write request, and the invalidation is received. Jumps to an invalid state on request.
所述小缓存单元的缓存行至少包括脏状态、干净状态、无效状态和冻结状态,所述脏状态表示缓存行中的数据和后端存储系统中的数据不一致,所述干净状态表示缓存行中的数据和后端存储系统中的数据一致,所述无效状态表示缓存行中没有有效数据,所述冻结状态表示当前缓存行处于冻结状态,只能被读取,不能被写入;当缓存行处于无效状态时,收到数据写入请求时跳转到脏状态,收到干净数据装载请求时跳转到干净状态;当缓存处于脏状态时,收到缓存行清除请求时跳转为无效状态,收到移动请求时跳转为冻结状态;当缓存行处于干净状态时,收到数据写入请求时跳转到脏状态,收到读请求时跳转到无效状态;当缓存行处于冻结状态时,只有收到移动完成的返回时缓存行跳转到无效状态。The cache line of the small cache unit includes at least a dirty state, a clean state, an invalid state, and a frozen state, where the dirty state indicates that data in the cache line is inconsistent with data in the backend storage system, and the clean state indicates that the cache line is in the cache line. The data is consistent with the data in the backend storage system. The invalid state indicates that there is no valid data in the cache line. The frozen state indicates that the current cache line is in a frozen state and can only be read and cannot be written; when the cache line In the invalid state, it jumps to the dirty state when receiving the data write request, and jumps to the clean state when receiving the clean data load request; when the cache is in the dirty state, it jumps to the invalid state when the cache line clear request is received. When the move request is received, it jumps to the frozen state; when the cache line is in the clean state, it jumps to the dirty state when receiving the data write request, and jumps to the invalid state when the read request is received; when the cache line is frozen When the return is completed, the cache line jumps to the invalid state.
A guard unit is further included, which clears dirty data from the write mirror unit to the backend storage cluster in the background, so as to keep the amount of dirty data in the flash storage resource that requires redundant backup within a predetermined range.
The redundant backup is implemented by write mirroring.
Compared with the prior art, the beneficial effects of the present invention include: by virtualizing the physical flash storage resource into a flash storage pool, creating three kinds of logical storage units on the storage pool, and adopting the data writing and reading methods described herein, the non-volatile cache implementation method of the present invention avoids both the problem of a huge cache state table and the redundant backup schemes that severely degrade write performance. It can therefore achieve very large capacity together with very high performance, significantly improving the read and write performance of the centralized control device while providing uninterrupted storage service.
BRIEF DESCRIPTION OF THE DRAWINGS
Fig. 1 is a schematic diagram of the overall logical structure of the flash cache of Embodiment 1;

Fig. 2 is a schematic diagram of the centralized distributed storage architecture of Embodiment 1;

Fig. 3 is a schematic diagram of the overall physical structure of the flash cache in Embodiment 1;

Fig. 4 is a simplified state transition table for the cache lines of the large cache unit in Embodiment 1;

Fig. 5 is a simplified state transition table for the cache lines of the small cache unit in Embodiment 1;

Fig. 6 is a flowchart of a flash cache write operation in Embodiment 1;

Fig. 7 is a flowchart of a flash cache read operation in Embodiment 1;

Fig. 8 is a further flowchart of a flash cache read operation in Embodiment 1;

Fig. 9 is a diagram showing an example of the correspondence between the logical modules and the physical modules of the flash cache in Embodiment 1.
DETAILED DESCRIPTION
The non-volatile memory devices (i.e., flash storage resources) in the cache implementation method disclosed by the present invention include, but are not limited to, flash memory, phase-change memory, and the like. The storage system connected to the back end of the present invention includes, but is not limited to, the centralized distributed storage system (cluster) shown at 203 in Fig. 2; the following description uses the centralized distributed storage architecture merely as an example.
In the centralized distributed storage architecture shown in Fig. 2, the flash cache in the centralized control device must offer both very large capacity and very high performance (meaning high IOPS and low latency). This is because the distributed storage cluster connected to the centralized control device has a storage capacity at the PB level, and the corresponding cache capacity is at the level of hundreds of TB. A flash cache of such capacity, however, faces two difficult problems: cache line entry management and data consistency.
When flash memory is used as a cache, the entire storage resource must be divided into many cache lines at a certain granularity, and for each cache line the relevant information must be recorded, such as where the data stored in the line comes from and the line's current state. When the flash cache capacity reaches hundreds of TB, for example 200 TB, dividing it into cache lines at a 4 KB granularity yields 200 TB / 4 KB = 50×10^9 cache lines. Assuming each cache line needs 16 bytes to record its state, a table of 800 GB in total is required to record the state of the whole flash cache, which is enormous and unaffordable. The 4 KB granularity is dictated by the virtual machine 201: as the block storage device of the virtual machine 201, the block access unit for stored data is 4 KB. This is the huge cache state table problem, i.e., the cache line entry management problem.
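The arithmetic behind these figures can be reproduced directly (decimal units, matching the round numbers in the text):

```python
# State-table sizing from the text: a 200 TB cache divided into 4 KB
# cache lines, with a 16-byte state entry per line.
TB, GB, KB = 10**12, 10**9, 10**3

num_lines = (200 * TB) // (4 * KB)   # number of cache lines
table_size = num_lines * 16          # bytes of state-table metadata

print(num_lines)         # 50000000000, i.e. 50e9 lines
print(table_size // GB)  # 800 (GB), the table size given in the text
```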
When flash memory is used as a cache, the consistency between the data in the cache and the data in the backend distributed storage cluster 203 must also be guaranteed; when they are inconsistent, the data in the cache must be protected by a backup. The most widely used protection scheme is RAID 5/6, but RAID 5/6 comes at the cost of a huge sacrifice in write performance. The alternative is to use the flash only as a read cache: every write operation is written directly to the backend distributed storage cluster 203 and the related data in the flash cache is invalidated, so that the cached data always stays consistent with the backend storage cluster and no backup of the cached data is needed. Such an implementation, however, can only accelerate some read operations and cannot accelerate writes at all. This is the data consistency problem and its adverse consequences.
The present invention is described in further detail below with reference to specific embodiments and the accompanying drawings. It should be emphasized that the following description is merely exemplary and is not intended to limit the scope of the invention or its applications.
Non-limiting and non-exclusive embodiments will be described with reference to Figs. 1-9, in which like reference numerals denote like parts unless otherwise specified.
Embodiment 1:
In a non-volatile cache implementation method, a physical flash storage resource is first virtualized into a flash storage pool, and three kinds of logical storage units are then created on the storage pool: a large cache unit 101, a small cache unit 102 and a write mirror unit 103, as shown in Fig. 1. The large cache unit 101 provides conventional cache service; the small cache unit 102 provides an acceleration service for random write operations and a data staging service for read operations; the write mirror unit 103 provides redundant backup protection for the dirty data in the large cache unit 101 and the small cache unit 102. On a data write, if the write operation hits a cache line of the small cache unit 102, the data is written into the small cache unit 102; if it misses the small cache unit 102 but hits a cache line of the large cache unit 101, the data is written into the large cache unit 101; if both the large cache unit 101 and the small cache unit 102 miss and the acceleration flag is valid, the data is written into the small cache unit 102; otherwise the data is not written into the flash storage resource but is written directly to the backend storage cluster 203. On a data read, if the read operation hits a cache line of the small cache unit 102, the data in the small cache unit 102 is returned; if it misses the small cache unit 102 but hits a cache line of the large cache unit 101, the data in the large cache unit 101 is returned; if both the large cache unit 101 and the small cache unit 102 miss and the acceleration flag is valid, data of the size of a cache line of the large cache unit 101 is read from the backend storage cluster, loaded into a cache line of the large cache unit 101, and then returned to the virtual machine 201; if both miss and the acceleration flag is invalid but the data staging flag is valid, the corresponding cache line data for the small cache unit 102 is read from the backend storage cluster, loaded into a cache line of the small cache unit 102, and then returned to the virtual machine 201; otherwise the data read from the backend storage cluster is sent directly to the front-end virtual machine 201 without passing through the flash cache 100. The virtual machine 201 is only one example of a front-end data requesting unit; the front-end data requesting unit of the present invention is not limited thereto.
In this embodiment, the structure of the physical flash storage resource (also called the flash cache 100) is shown in Fig. 3: each tray provides physical flash storage resources and internally uses appropriate techniques to guarantee its own reliability and stability. Dividing the physical flash storage resource into a large cache unit 101 and a small cache unit 102 effectively solves the problem of an oversized state table for a very-large-capacity flash cache.
Fig. 4 gives an example of the cache line state table of the large cache unit; that is, the states of the cache lines of the large cache unit include, but are not limited to, those listed in Fig. 4. After simplification, a cache line of the large cache unit has three basic states: the dirty state, in which the data in the cache line is inconsistent with the data in the backend storage system 203; the clean state, in which the data in the cache line is consistent with the data in the backend storage system 203; and the invalid state, in which the cache line holds no valid data. The state transitions are as follows: when a cache line is in the invalid state, it transitions to the dirty state upon receiving a cache-line-sized data write request (for example, a write request from the virtual machine 201) and to the clean state upon receiving a clean-data load request (for example, a write request from the storage system 203); when a cache line is in the dirty state, it transitions to the clean state only upon receiving a cache line flush request; when a cache line is in the clean state, it transitions to the dirty state upon receiving a data write request and to the invalid state upon receiving an invalidation request.
Fig. 5 gives an example of the cache line state table of the small cache unit; that is, the states of the cache lines of the small cache unit include, but are not limited to, those listed in Fig. 5. After simplification, a cache line of the small cache unit has four basic states: the dirty state, in which the data in the cache line is inconsistent with the data in the backend storage system 203; the clean state, in which the data in the cache line is consistent with the data in the backend storage system 203; the invalid state, in which the cache line holds no valid data; and the frozen state, in which the cache line is frozen and can only be read, not written. The state transitions are as follows: when a cache line is in the invalid state, it transitions to the dirty state upon receiving a data write request (for example, a write request from the virtual machine 201) and to the clean state upon receiving a clean-data load request (for example, a write request from the storage system 203); when a cache line is in the dirty state, it transitions to the invalid state upon receiving a cache line flush request and to the frozen state upon receiving a move request; when a cache line is in the clean state, it transitions to the dirty state upon receiving a data write request and to the invalid state upon receiving a read request; when a cache line is in the frozen state, it transitions to the invalid state only upon receiving the return indicating that the move has completed.
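The two transition tables of Figs. 4 and 5 can be sketched as simple lookup tables; this is a minimal sketch in which the event names (write, clean_load, flush, move, and so on) are illustrative labels, not terms from the patent:

```python
# Simplified cache-line state machines for the large (Fig. 4) and small
# (Fig. 5) cache units, as (state, event) -> next-state tables.
BIG = {
    ("invalid", "write"): "dirty",
    ("invalid", "clean_load"): "clean",
    ("dirty", "flush"): "clean",
    ("clean", "write"): "dirty",
    ("clean", "invalidate"): "invalid",
}

LITTLE = {
    ("invalid", "write"): "dirty",
    ("invalid", "clean_load"): "clean",
    ("dirty", "flush"): "invalid",
    ("dirty", "move"): "frozen",
    ("clean", "write"): "dirty",
    ("clean", "read"): "invalid",
    ("frozen", "move_done"): "invalid",
}

def step(table, state, event):
    # Unlisted (state, event) pairs leave the state unchanged; in
    # particular a frozen line ignores writes, matching the text.
    return table.get((state, event), state)

print(step(BIG, "dirty", "flush"))     # clean
print(step(LITTLE, "dirty", "flush"))  # invalid
```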
The different states and transitions of the large and small cache units are what accelerate read and write operations. In this example, the states and transitions of the large cache unit and the small cache unit differ because the units serve different purposes. Whether the large or the small cache unit is used to accelerate a read or write access depends on policy hint information and on the state information of the large and small cache units. Policy hints include, but are not limited to, service level and hit probability prediction, and may come directly from the centralized control device 202 or from the virtual machine 201. State information includes, but is not limited to, whether the access hits. In this example, the large cache unit provides conventional cache service and can apply different aging policies to different cache lines according to service level; the small cache unit provides cache acceleration for write operations that miss the large cache unit on first access, and data staging for read operations that miss the large cache unit.
The cache lines of the small cache unit 102 are small, for example 4 KB; the cache lines of the large cache unit 101 are large, for example 4 MB; the cache lines of the write mirror unit 103 may be kept the same size as those of the small cache unit 102. The specific cache line sizes can be adjusted to the actual situation: for example, the cache line size of the small cache unit 102 may be determined by the storage requests of the virtual machine 201, and that of the large cache unit 101 by the implementation of the backend distributed storage cluster 203.
The sizes of, and relationship among, the small cache unit 102, the large cache unit 101 and the write mirror unit 103 can be determined by the DRAM resources of the centralized control device 202. For example, if all tables recording cache state must fit into the corresponding DRAM resources of the centralized control device 202, then (Little_Size + Mirror_size) / Little_granularity + Big_Size / Big_granularity <= available_DRAM_Size / entry_size must hold, where Little_Size is the size of the small cache unit 102, Mirror_size is the size of the write mirror unit 103, Little_granularity is the cache line size of the small cache unit 102 (in this embodiment kept equal to the block size of the virtual machine 201's data accesses), Big_Size is the size of the large cache unit 101, Big_granularity is the cache line size of the large cache unit 101, available_DRAM_Size is the amount of DRAM available for storing the cache state table, and entry_size is the size of each table entry.
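The sizing inequality above can be checked mechanically; the concrete sizes below are invented example values, not figures from the patent:

```python
# Check whether the cache state table fits in the available DRAM, per
# the inequality in the text. All arguments are in bytes.
def state_table_fits(little_size, mirror_size, little_gran,
                     big_size, big_gran, dram_size, entry_size):
    entries = (little_size + mirror_size) // little_gran + big_size // big_gran
    return entries <= dram_size // entry_size

TB, GB, MB, KB = 1024**4, 1024**3, 1024**2, 1024

# Hypothetical configuration: 1 TB little cache, 1 TB mirror, 198 TB big
# cache, 16 GB of DRAM for the table, 16-byte entries.
ok = state_table_fits(
    little_size=1 * TB, mirror_size=1 * TB, little_gran=4 * KB,
    big_size=198 * TB, big_gran=4 * MB,
    dram_size=16 * GB, entry_size=16,
)
print(ok)  # True: ~0.59e9 entries fit in the ~1.07e9 entry budget
```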
The write mirror unit 103 provides redundant backup protection for the dirty data in the large cache unit 101 and the small cache unit 102. Data from the virtual machine 201 is written into the write mirror unit 103 at the same time it is written into the large cache unit 101 or the small cache unit 102.
A preferred implementation further includes a guard unit, which is responsible for clearing the dirty data in the write mirror unit 103 to the backend storage cluster 203 in the background. Because the write mirror unit 103 backs up only the dirty data in the large cache unit 101 and the small cache unit 102, and the guard unit continuously flushes dirty data to the backend storage cluster 203 according to predetermined rules, the amount of dirty data in the flash cache 100 is bounded, and no redundant backup of all the data in the flash cache 100 is needed. At the same time, the backup strategy uses write mirroring, which on the one hand lowers the performance cost of redundant backup and on the other hand makes it possible to accelerate all write operations.
Fig. 8 shows the processing flow of the guard unit. The guard unit first checks the write mirror unit 103; when the write mirror unit 103 is non-empty, it takes one piece of dirty data and its related information (for example, address information) from the write mirror, queries the flash cache state table according to that information, and obtains the cache state. If the state shows a hit in a cache line of the small cache unit 102 but no hit in the large cache unit 101, the data in the cache line of the small cache unit 102 is flushed directly to the backend storage cluster 203. If the state shows hits in cache lines of both the small cache unit 102 and the large cache unit 101, the data in the cache line of the small cache unit 102 is first moved into the cache line of the large cache unit 101, and the data in the cache line of the large cache unit 101 is then flushed to the backend storage cluster 203. If the state shows no hit in the small cache unit 102, a hit in a cache line of the large cache unit 101, and dirty data in that cache line, the data in the cache line of the large cache unit 101 is flushed to the backend storage cluster 203. If the state shows no hit in the small cache unit 102, a hit in the large cache unit 101, and no dirty data in the cache line of the large cache unit 101, no operation on the large or small cache unit is needed. It should be noted that this is only an example; the flow can be modified according to changes in the state information. Moreover, the write mirror unit 103 can be composed of multiple logical write mirror subunits, each with its own daemon.
The correspondence between the cache logical units and the physical units (physical trays) is illustrated in Fig. 9. Each logical unit (the large cache unit 101, the small cache unit 102 and the write mirror unit 103) can span all physical trays; the benefit is higher concurrency across the physical trays and therefore higher performance. The write mirror logical unit can be split into multiple small write mirror logical subunits, for example one logical write mirror subunit per tray; the benefit of such splitting is that multiple write mirror guard units can run concurrently, increasing the speed at which dirty data is cleared to the backend storage cluster.
As shown in Fig. 9, when new data from the virtual machine 201 is written into the large cache unit 101 or the small cache unit 102 and into the write mirror unit 103, the following principle can be applied: the physical location written in the large or small cache unit and the physical location written in the write mirror unit 103 must not be on the same physical tray. For example, a simple rule (though not the only possible one) is that the physical tray number written in the write mirror unit 103 equals the physical tray number written in the large cache unit 101 or the small cache unit 102 plus one. The benefit is that the redundant backup and the original data are guaranteed to be on different physical trays, so that when a single physical tray fails, the flash cache 100 can still supply the data. The cache line size of the large cache shown in Fig. 9 is 4 MB, but in practice it can be adjusted to the actual situation.
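The tray-plus-one placement rule given as an example above can be sketched as:

```python
# The "primary tray + 1" mirror-placement rule from the text: the mirror
# copy goes to the next tray, modulo the tray count, so a primary line
# and its mirror never share a physical tray.
def mirror_tray(primary_tray, num_trays):
    return (primary_tray + 1) % num_trays

for t in range(3):
    print(t, "->", mirror_tray(t, 3))
# With two or more trays the mirror always lands on a different tray,
# so losing any single tray leaves one copy of every line.
```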
A single cache line of the small cache unit 102 or of the write mirror unit 103 can be located within a single physical tray or span two or more physical trays; a single cache line of the large cache unit can likewise span multiple physical trays or be located within a single physical tray. This example is described with a single cache line of the large cache unit located within one physical tray, which makes it easier to achieve the technical effect of continuing to provide uninterrupted service when a single physical tray fails.
Under this partitioning into large cache unit, small cache unit and write mirror unit, uninterrupted service in the event that one physical tray fails is provided as in the following example.
Suppose tray 1 in Fig. 9 fails and can no longer provide service, and the write mirror on tray 1 backs up the dirty data on tray 0. Data recovery and uninterrupted service then proceed as follows:
Step 1: first mark tray 0 and tray 1 as unable to provide free cache lines.
Step 2: traverse and flush the dirty data, with the following threads:
Thread 1: traverse the cache line state table of tray 0; invalidate lines that are in the clean state; for lines in the dirty state, flush the data to the backend storage cluster and then invalidate them.
Thread 2: traverse the cache line state table of tray 1; invalidate lines that are in the clean state; for lines in the dirty state, wait until the state becomes clean.
Thread 3: raise the running priority of the write mirror guard unit on tray 2 to the highest level.
Threads 1, 2 and 3 execute concurrently.
Step 3: after the traversals of trays 0 and 1 have both finished, set tray 0 back to the state in which it can provide free cache lines, because in the new configuration tray 0 uses the write mirror unit on tray 2 for its backup.
The algorithm that selects which physical tray a read or write operation from the virtual machine lands on is determined by the following principle: when a physical tray fails, only the operations originally mapped to that tray are moved to other trays, while operations originally mapped to other trays keep their mappings unchanged. Many algorithms satisfy this requirement, for example the CRUSH algorithm.
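The minimal-remapping principle can be illustrated with highest-random-weight (rendezvous) hashing, a simpler stand-in for CRUSH that satisfies the same property:

```python
# Rendezvous hashing: each request goes to the tray with the highest
# hash weight. Removing a tray only remaps the requests that were on it.
import hashlib

def pick_tray(key, trays):
    def weight(tray):
        h = hashlib.sha256(f"{key}:{tray}".encode()).hexdigest()
        return int(h, 16)
    return max(trays, key=weight)

keys = [f"req-{i}" for i in range(200)]
before = {k: pick_tray(k, [0, 1, 2, 3]) for k in keys}
after = {k: pick_tray(k, [0, 2, 3]) for k in keys}  # tray 1 has failed

moved = [k for k in keys if before[k] != after[k]]
# Every remapped request was originally on the failed tray; all other
# mappings are unchanged, as the principle in the text requires.
print(len(moved), "of", len(keys), "requests remapped")
```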
Beyond solving the basic technical problems of the present invention, namely cache line entry management and data consistency, the inventors also found that, because the granularity of read and write operations from the virtual machine 201 matches the cache line size of the small cache unit 102 while the cache lines of the large cache unit 101 are larger, an operation may hit the large and small cache units at the same time. This can be resolved by the methods described below.
As shown in Figs. 2 and 6, when a write operation from the virtual machine 201 is sent to the flash cache 100: if the write operation hits a cache line of the small cache unit 102, the data is written into the small cache unit 102; if it misses the small cache unit 102 but hits a cache line of the large cache unit 101, the data is written into the large cache unit 101; if both the large and small cache units miss, the acceleration flag is checked: if it is valid, the data is written into the small cache unit 102; otherwise the data is not written into the flash cache 100 but is written directly through the centralized control device 202 to the backend storage cluster 203. This write flow guarantees that whenever a write hits a cache line of the small cache unit 102, the data in the small cache unit 102 is always the most recent.
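The write flow above can be sketched as a routing function; the return labels are illustrative, not identifiers from the patent:

```python
# Write-path decision from Fig. 6: where a write from the virtual
# machine lands, given the hit status and the acceleration flag.
def route_write(hit_little, hit_big, accelerate):
    if hit_little:
        return "little_cache"    # little cache always wins on a hit
    if hit_big:
        return "big_cache"
    if accelerate:
        return "little_cache"    # first-miss write acceleration
    return "backend_cluster"     # bypass the flash cache entirely

print(route_write(True, True, False))    # little_cache
print(route_write(False, True, False))   # big_cache
print(route_write(False, False, True))   # little_cache
print(route_write(False, False, False))  # backend_cluster
```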
As shown in Figs. 2 and 7, when a read operation from the virtual machine 201 is sent to the flash cache 100: if the read operation hits a cache line of the small cache unit 102, the data in the small cache unit 102 is returned; if it misses the small cache unit 102 but hits a cache line of the large cache unit 101, the data in the large cache unit 101 is returned; if both the large and small cache lines miss, the acceleration flag is checked: if it is valid, data of the large cache line size is read from the backend storage cluster 203, loaded into a cache line of the large cache unit 101, and then returned to the virtual machine 201; if it is invalid, the data staging flag is checked: if that is valid, the cache line data for the small cache unit 102 is read from the backend storage cluster 203, loaded into a cache line of the small cache unit 102, and then returned to the virtual machine 201; otherwise the data read from the backend storage cluster is sent through the centralized control device 202 to the front-end virtual machine 201 without passing through the flash cache 100.
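The read flow above can be sketched the same way, returning both the data source and the cache (if any) into which the line is loaded on a miss:

```python
# Read-path decision from Fig. 7: (where the data is served from,
# which cache line is loaded on a miss, or None).
def route_read(hit_little, hit_big, accelerate, staging):
    if hit_little:
        return ("little_cache", None)
    if hit_big:
        return ("big_cache", None)
    if accelerate:
        return ("backend_cluster", "big_cache")     # load a big cache line
    if staging:
        return ("backend_cluster", "little_cache")  # stage a small line
    return ("backend_cluster", None)                # bypass the cache

print(route_read(False, False, True, False))   # ('backend_cluster', 'big_cache')
print(route_read(False, False, False, True))   # ('backend_cluster', 'little_cache')
print(route_read(False, False, False, False))  # ('backend_cluster', None)
```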
With the non-volatile cache implementation method of this embodiment, the size of the state table recording cache state can be kept within a fixed bound; besides accelerating read operations, all write operations can also be accelerated. Furthermore, only part of the data is backed up, so the backup volume is limited and the backup operations have little impact on performance. Finally, no hot spare disk is required, and uninterrupted service can be provided.
Embodiment 2:
The apparatus of this embodiment corresponds to the non-volatile cache implementation method of the preceding embodiment.
A non-volatile cache implementation apparatus includes a flash storage resource virtualization unit, a logical storage unit creation unit, a data writing unit and a data reading unit.
The flash storage resource virtualization unit is configured to virtualize a physical flash storage resource into a flash storage pool.
The logical storage unit creation unit is configured to create three kinds of logical storage units on the storage pool: a large cache unit, a small cache unit and a write mirror unit. The large cache unit provides conventional cache service; the small cache unit provides an acceleration service for random write operations and a data staging service for read operations; the write mirror unit provides redundant backup protection for the dirty data in the large cache and the small cache.
The physical flash storage resources preferably comprise two or more physical trays, and the large cache unit, the small cache unit, and the write mirror unit each span the two or more physical trays. Preferably, a single cache line of the small cache unit or of the write mirror unit resides within one physical tray, while a single cache line of the large cache unit resides within one physical tray or spans two or more physical trays.
When the data writing unit performs a write, the write is routed as follows. If the write operation hits a cache line of the small cache unit, the data is written to the small cache unit. If it misses the small cache unit but hits a cache line of the large cache unit, the data is written to the large cache unit. If both the large cache unit and the small cache unit miss and the acceleration flag is valid, the data is written to the small cache unit; otherwise the data is not written to the flash storage resources but is written directly to the backend storage cluster.
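The same write policy can be sketched as follows. As before, this is an illustrative simplification with invented names; cache-line granularity and mirroring of dirty data are not modeled.

```python
class CacheUnit:
    """Minimal model of a cache unit: a dict of address -> data (illustrative only)."""
    def __init__(self):
        self.lines = {}

    def hit(self, addr):
        return addr in self.lines

def handle_write(addr, data, small, big, backend, accel_flag):
    """Route a write per the policy above; returns where the data landed."""
    if small.hit(addr):                 # small-cache hit: absorb the write there
        small.lines[addr] = data
        return "small"
    if big.hit(addr):                   # large-cache hit: update the cached line
        big.lines[addr] = data
        return "big"
    if accel_flag:                      # double miss + acceleration: stage in small cache
        small.lines[addr] = data
        return "small"
    backend[addr] = data                # otherwise bypass flash, write backend directly
    return "backend"
```

The effect is that random writes are absorbed by the small cache whenever acceleration is enabled, and only non-accelerated, double-miss traffic reaches the backend synchronously.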
When the data writing unit writes data to the large cache unit, the small cache unit, and the write mirror unit, the physical location written in the large cache unit is preferably on a different physical tray from the physical location written in the write mirror unit, and likewise the physical location written in the small cache unit is preferably on a different physical tray from the physical location written in the write mirror unit.
When the data reading unit performs a read, the read is served as follows. If the read operation hits a cache line of the small cache unit, the data in the small cache unit is returned. If it misses the small cache unit but hits a cache line of the large cache unit, the data in the large cache unit is returned. If both units miss and the acceleration flag is valid, data of one large-cache-line size is read from the backend storage cluster, loaded into a cache line of the large cache unit, and then returned to the virtual machine. If both units miss, the acceleration flag is invalid, but the data staging flag is valid, the corresponding small-cache-line data is read from the backend storage cluster, loaded into a cache line of the small cache unit, and then returned to the virtual machine. Otherwise, the data read from the backend storage cluster bypasses the flash storage resources and is sent directly to the front-end virtual machine.
The sizes of the large cache unit, the small cache unit, and the write mirror unit may be chosen in various ways; preferably they satisfy the following formula:
(Little_Size + Mirror_size) / Little_granularity + Big_Size / Big_granularity <= available_DRAM_Size / entry_size, where
Big_Size is the size of the large cache unit,
Little_Size is the size of the small cache unit,
Mirror_size is the size of the write mirror unit,
Little_granularity is the cache-line size of the small cache unit,
Big_granularity is the cache-line size of the large cache unit,
available_DRAM_Size is the size of the DRAM available for storing the cache state table, and
entry_size is the size of each cache table entry.
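The constraint can be checked numerically. The sketch below uses hypothetical sizes (8 GiB small cache, 8 GiB mirror, 512 GiB large cache, 4 KiB and 1 MiB cache lines, 1 GiB of DRAM, 64-byte entries), not values taken from the patent.

```python
def state_table_fits(little_size, mirror_size, big_size,
                     little_granularity, big_granularity,
                     available_dram_size, entry_size):
    """True iff the number of cache-line state entries implied by the unit
    sizes fits into the DRAM reserved for the cache state table."""
    entries = ((little_size + mirror_size) // little_granularity
               + big_size // big_granularity)
    return entries <= available_dram_size // entry_size

# Hypothetical sizing used only for illustration.
GiB = 1 << 30
ok = state_table_fits(8 * GiB, 8 * GiB, 512 * GiB, 4096, 1 << 20, GiB, 64)
```

With these numbers the small cache plus mirror contribute about 4.2 million entries and the large cache about 0.5 million, comfortably below the roughly 16.8 million entries that 1 GiB of DRAM holds at 64 bytes per entry, which is why the formula favors a much coarser granularity for the large cache.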
In addition, the write mirror unit may be composed of a plurality of logical write-mirror subunits.
Which physical tray an operation of the data writing unit or the data reading unit is directed to preferably follows this principle: when a physical tray fails, only the operations originally mapped to that tray are redirected to other physical trays, while read and write operations originally mapped to the other trays keep their mapping unchanged.
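The failover principle above, where only traffic whose home tray failed is re-routed, can be illustrated with a simple modulo placement. This is a hypothetical mapping chosen for illustration, not the mapping the apparatus prescribes.

```python
def tray_for(line_id, trays, failed=frozenset()):
    """Pick the physical tray for a cache line so that a tray failure only
    re-routes lines whose home tray failed; all other lines keep their
    original mapping."""
    home = trays[line_id % len(trays)]
    if home not in failed:
        return home                                  # healthy home tray: unchanged
    survivors = [t for t in trays if t not in failed]
    return survivors[line_id % len(survivors)]       # only failed-tray lines move
```

For example, with trays [0, 1, 2, 3] and tray 2 failed, a line whose home is tray 1 stays on tray 1, while only lines homed on tray 2 are re-hashed over the survivors.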
A cache line of the large cache unit has at least a dirty state, a clean state, and an invalid state. The dirty state indicates that the data in the cache line is inconsistent with the data in the backend storage system; the clean state indicates that the data in the cache line is consistent with the data in the backend storage system; the invalid state indicates that the cache line holds no valid data. When a cache line is in the invalid state, it transitions to the dirty state on a data write request and to the clean state on a clean-data load request. When a cache line is in the dirty state, it transitions to the clean state only on a cache-line flush request. When a cache line is in the clean state, it transitions to the dirty state on a data write request and to the invalid state on an invalidation request.
A cache line of the small cache unit has at least a dirty state, a clean state, an invalid state, and a frozen state. The dirty state indicates that the data in the cache line is inconsistent with the data in the backend storage system; the clean state indicates that the data in the cache line is consistent with the data in the backend storage system; the invalid state indicates that the cache line holds no valid data; the frozen state indicates that the cache line can only be read, not written. When a cache line is in the invalid state, it transitions to the dirty state on a data write request and to the clean state on a clean-data load request. When a cache line is in the dirty state, it transitions to the invalid state on a cache-line flush request and to the frozen state on a move request. When a cache line is in the clean state, it transitions to the dirty state on a data write request and to the invalid state on a read request. When a cache line is in the frozen state, it transitions to the invalid state only upon return of move completion.
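The small-cache line lifecycle above can be written down as a transition table. The event names are invented for illustration; in particular, "read_evict" models the text's statement that a clean line jumps to the invalid state on a read request.

```python
# (state, event) -> next state, following the transitions listed above
# for a small-cache line.
SMALL_CACHE_FSM = {
    ("invalid", "write"):      "dirty",
    ("invalid", "clean_load"): "clean",
    ("dirty",   "flush"):      "invalid",
    ("dirty",   "move"):       "frozen",
    ("clean",   "write"):      "dirty",
    ("clean",   "read_evict"): "invalid",
    ("frozen",  "move_done"):  "invalid",
}

def step(state, event):
    """Apply one event; any (state, event) pair not listed leaves the line
    unchanged. In particular, a frozen line ignores write requests."""
    return SMALL_CACHE_FSM.get((state, event), state)
```

The large-cache line machine is the same table minus the frozen state and the move events, with dirty lines returning to clean rather than invalid on a flush.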
This embodiment preferably further includes a daemon unit that flushes dirty data in the write mirror unit to the backend storage cluster in the background, so as to keep the dirty data in the flash storage resources that requires redundant backup within a predetermined range. The redundant backup preferably uses write mirroring.
Those skilled in the art will recognize that many variations on the above description are possible, so the embodiments are intended only to describe one or more specific implementations.
While what are regarded as exemplary embodiments of the invention have been described and illustrated, those skilled in the art will understand that various changes and substitutions may be made without departing from the spirit of the invention. In addition, many modifications may be made to adapt a particular situation to the teachings of the invention without departing from the central inventive concept described herein. Therefore, the invention is not limited to the specific embodiments disclosed herein, but may include all embodiments falling within the scope of the invention and their equivalents.

Claims (24)

  1. A non-volatile cache implementation method, characterized in that: physical flash storage resources are first virtualized into a flash storage pool, and three kinds of logical storage units are then created on the storage pool: a large cache unit, a small cache unit, and a write mirror unit; the large cache unit is configured to provide a conventional cache service, the small cache unit is configured to provide an acceleration service for random write operations and a data staging service for read operations, and the write mirror unit is configured to provide redundant backup protection for dirty data in the large cache unit and the small cache unit;
    when data is written, if the write operation hits a cache line of the small cache unit, the data is written to the small cache unit; if it misses the small cache unit but hits a cache line of the large cache unit, the data is written to the large cache unit; if both the large cache unit and the small cache unit miss and the acceleration flag is valid, the data is written to the small cache unit; otherwise the data is not written to the flash storage resources but is written directly to the backend storage cluster;
    when data is read, if the read operation hits a cache line of the small cache unit, the data in the small cache unit is returned; if it misses the small cache unit but hits a cache line of the large cache unit, the data in the large cache unit is returned; if both units miss and the acceleration flag is valid, data of one large-cache-line size is read from the backend storage cluster, loaded into a cache line of the large cache unit, and then returned to the front-end data requesting unit; if both units miss, the acceleration flag is invalid, but the data staging flag is valid, the corresponding small-cache-line data is read from the backend storage cluster, loaded into a cache line of the small cache unit, and then returned to the front-end data requesting unit; otherwise the data read from the backend storage cluster bypasses the flash storage resources and is sent directly to the front-end data requesting unit.
  2. The non-volatile cache implementation method according to claim 1, wherein the sizes of the large cache unit, the small cache unit, and the write mirror unit satisfy the following formula:
    (Little_Size + Mirror_size) / Little_granularity + Big_Size / Big_granularity <=
    available_DRAM_Size / entry_size, where
    Big_Size is the size of the large cache unit,
    Little_Size is the size of the small cache unit,
    Mirror_size is the size of the write mirror unit,
    Little_granularity is the cache-line size of the small cache unit,
    Big_granularity is the cache-line size of the large cache unit,
    available_DRAM_Size is the size of the DRAM available for storing the cache state table, and
    entry_size is the size of each cache table entry.
  3. The non-volatile cache implementation method according to claim 1, wherein the write mirror unit is composed of at least one logical write-mirror subunit, and the large cache unit and the small cache unit are composed of at least one logical large-cache subunit and at least one logical small-cache subunit, respectively.
  4. The non-volatile cache implementation method according to claim 1, wherein the physical flash storage resources comprise two or more physical trays, and the large cache unit, the small cache unit, and the write mirror unit each span the two or more physical trays.
  5. The non-volatile cache implementation method according to claim 4, wherein, when data is written to the large cache unit, the small cache unit, and the write mirror unit, the physical location written in the large cache unit is on a different physical tray from the physical location written in the write mirror unit, and the physical location written in the small cache unit is also on a different physical tray from the physical location written in the write mirror unit.
  6. The non-volatile cache implementation method according to claim 4, wherein a single cache line of the small cache unit or of the write mirror unit is located within one physical tray or spans two or more physical trays, and a single cache line of the large cache unit is located within one physical tray or spans two or more physical trays.
  7. The non-volatile cache implementation method according to claim 4, wherein which physical tray a data write operation or a data read operation is directed to follows this principle: when a physical tray fails, only the operations originally mapped to that tray are redirected to other physical trays, while read and write operations originally mapped to the other trays keep their mapping unchanged.
  8. The non-volatile cache implementation method according to claim 1, wherein a cache line of the large cache unit has at least a dirty state, a clean state, and an invalid state, the dirty state indicating that the data in the cache line is inconsistent with the data in the backend storage system, the clean state indicating that the data in the cache line is consistent with the data in the backend storage system, and the invalid state indicating that the cache line holds no valid data;
    when a cache line is in the invalid state, it transitions to the dirty state on a data write request and to the clean state on a clean-data load request;
    when a cache line is in the dirty state, it transitions to the clean state only on a cache-line flush request;
    when a cache line is in the clean state, it transitions to the dirty state on a data write request and to the invalid state on an invalidation request.
  9. The non-volatile cache implementation method according to claim 1, wherein a cache line of the small cache unit has at least a dirty state, a clean state, an invalid state, and a frozen state, the dirty state indicating that the data in the cache line is inconsistent with the data in the backend storage system, the clean state indicating that the data in the cache line is consistent with the data in the backend storage system, the invalid state indicating that the cache line holds no valid data, and the frozen state indicating that the cache line can only be read, not written;
    when a cache line is in the invalid state, it transitions to the dirty state on a data write request and to the clean state on a clean-data load request;
    when a cache line is in the dirty state, it transitions to the invalid state on a cache-line flush request and to the frozen state on a move request;
    when a cache line is in the clean state, it transitions to the dirty state on a data write request and to the invalid state on a read request;
    when a cache line is in the frozen state, it transitions to the invalid state only upon return of move completion.
  10. The non-volatile cache implementation method according to claim 1, further comprising a daemon unit configured to flush dirty data in the write mirror unit to the backend storage cluster in the background, so as to keep the dirty data in the flash storage resources that requires redundant backup within a predetermined range.
  11. The non-volatile cache implementation method according to claim 10, wherein the redundant backup uses write mirroring.
  12. The non-volatile cache implementation method according to any one of claims 1-11, wherein the physical flash storage resources are flash memory or phase memory.
  13. A non-volatile cache implementation apparatus, characterized by comprising:
    a flash storage resource virtualization unit configured to virtualize physical flash storage resources into a flash storage pool;
    a logical storage unit creation unit configured to create three kinds of logical storage units on the storage pool: a large cache unit, a small cache unit, and a write mirror unit, wherein the large cache unit is configured to provide a conventional cache service, the small cache unit is configured to provide an acceleration service for random write operations and a data staging service for read operations, and the write mirror unit is configured to provide redundant backup protection for dirty data in the large cache and the small cache; and
    a data writing unit and a data reading unit;
    wherein, when the data writing unit performs a write, if the write operation hits a cache line of the small cache unit, the data is written to the small cache unit; if it misses the small cache unit but hits a cache line of the large cache unit, the data is written to the large cache unit; if both the large cache unit and the small cache unit miss and the acceleration flag is valid, the data is written to the small cache unit; otherwise the data is not written to the flash storage resources but is written directly to the backend storage cluster;
    and wherein, when the data reading unit performs a read, if the read operation hits a cache line of the small cache unit, the data in the small cache unit is returned; if it misses the small cache unit but hits a cache line of the large cache unit, the data in the large cache unit is returned; if both units miss and the acceleration flag is valid, data of one large-cache-line size is read from the backend storage cluster, loaded into a cache line of the large cache unit, and then returned to the front-end data requesting unit; if both units miss, the acceleration flag is invalid, but the data staging flag is valid, the corresponding small-cache-line data is read from the backend storage cluster, loaded into a cache line of the small cache unit, and then returned to the front-end data requesting unit; otherwise the data read from the backend storage cluster bypasses the flash storage resources and is sent directly to the front-end data requesting unit.
  14. The non-volatile cache implementation apparatus according to claim 13, wherein the sizes of the large cache unit, the small cache unit, and the write mirror unit satisfy the following formula:
    (Little_Size + Mirror_size) / Little_granularity + Big_Size / Big_granularity <=
    available_DRAM_Size / entry_size, where
    Big_Size is the size of the large cache unit,
    Little_Size is the size of the small cache unit,
    Mirror_size is the size of the write mirror unit,
    Little_granularity is the cache-line size of the small cache unit,
    Big_granularity is the cache-line size of the large cache unit,
    available_DRAM_Size is the size of the DRAM available for storing the cache state table, and
    entry_size is the size of each cache table entry.
  15. The non-volatile cache implementation apparatus according to claim 13, wherein the write mirror unit is composed of at least one logical write-mirror subunit, and the large cache unit and the small cache unit may each be composed of one or more logical large-cache subunits and logical small-cache subunits, respectively.
  16. The non-volatile cache implementation apparatus according to claim 13, wherein the physical flash storage resources comprise two or more physical trays, and the large cache unit, the small cache unit, and the write mirror unit each span the two or more physical trays.
  17. The non-volatile cache implementation apparatus according to claim 16, wherein, when the data writing unit writes data to the large cache unit, the small cache unit, and the write mirror unit, the physical location written in the large cache unit is on a different physical tray from the physical location written in the write mirror unit, and the physical location written in the small cache unit is also on a different physical tray from the physical location written in the write mirror unit.
  18. The non-volatile cache implementation apparatus according to claim 16, wherein a single cache line of the small cache unit or of the write mirror unit is located within one physical tray or spans two or more physical trays, and a single cache line of the large cache unit is located within one physical tray or spans two or more physical trays.
  19. The non-volatile cache implementation apparatus according to claim 16, wherein which physical tray an operation of the data writing unit or the data reading unit is directed to follows this principle: when a physical tray fails, only the operations originally mapped to that tray are redirected to other physical trays, while read and write operations originally mapped to the other trays keep their mapping unchanged.
  20. The non-volatile cache implementation apparatus according to claim 13, wherein a cache line of the large cache unit has at least a dirty state, a clean state, and an invalid state, the dirty state indicating that the data in the cache line is inconsistent with the data in the backend storage system, the clean state indicating that the data in the cache line is consistent with the data in the backend storage system, and the invalid state indicating that the cache line holds no valid data;
    when a cache line is in the invalid state, it transitions to the dirty state on a data write request and to the clean state on a clean-data load request;
    when a cache line is in the dirty state, it transitions to the clean state only on a cache-line flush request;
    when a cache line is in the clean state, it transitions to the dirty state on a data write request and to the invalid state on an invalidation request.
  21. The non-volatile cache implementation apparatus according to claim 13, wherein a cache line of the small cache unit has at least a dirty state, a clean state, an invalid state, and a frozen state, the dirty state indicating that the data in the cache line is inconsistent with the data in the backend storage system, the clean state indicating that the data in the cache line is consistent with the data in the backend storage system, the invalid state indicating that the cache line holds no valid data, and the frozen state indicating that the cache line can only be read, not written;
    when a cache line is in the invalid state, it transitions to the dirty state on a data write request and to the clean state on a clean-data load request;
    when a cache line is in the dirty state, it transitions to the invalid state on a cache-line flush request and to the frozen state on a move request;
    when a cache line is in the clean state, it transitions to the dirty state on a data write request and to the invalid state on a read request;
    when a cache line is in the frozen state, it transitions to the invalid state only upon return of move completion.
  22. The non-volatile cache implementation apparatus according to claim 13, further comprising a daemon unit configured to flush dirty data in the write mirror unit to the backend storage cluster in the background, so as to keep the dirty data in the flash storage resources that requires redundant backup within a predetermined range.
  23. The non-volatile cache implementation apparatus according to claim 22, wherein the redundant backup uses write mirroring.
  24. The non-volatile cache implementation apparatus according to any one of claims 13-23, wherein the physical flash storage resources are flash memory or phase memory.
PCT/CN2014/094448 2014-12-19 2014-12-19 Method and apparatus for realizing non-volatile cache WO2016095233A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2014/094448 WO2016095233A1 (en) 2014-12-19 2014-12-19 Method and apparatus for realizing non-volatile cache

Publications (1)

Publication Number Publication Date
WO2016095233A1 true WO2016095233A1 (en) 2016-06-23

Family

ID=56125693




Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6510493B1 (en) * 1999-07-15 2003-01-21 International Business Machines Corporation Method and apparatus for managing cache line replacement within a computer system
CN101118519A (en) * 2007-09-10 2008-02-06 杭州华三通信技术有限公司 Method and apparatus for protecting caching content and caching controller thereof
CN101387987A (en) * 2007-09-12 2009-03-18 索尼株式会社 Storage device, method and program for controlling storage device
CN101446921A (en) * 2008-12-23 2009-06-03 青岛海信宽带多媒体技术股份有限公司 Dynamic storage method of Flash memory
CN102968389A (en) * 2012-10-30 2013-03-13 记忆科技(深圳)有限公司 Storage device and storage method based on multi-level flash memory cell
CN104484287A (en) * 2014-12-19 2015-04-01 北京麓柏科技有限公司 Nonvolatile cache realization method and device


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170344575A1 (en) * 2016-05-27 2017-11-30 Netapp, Inc. Methods for facilitating external cache in a cloud storage environment and devices thereof
CN113946275A (en) * 2020-07-15 2022-01-18 中移(苏州)软件技术有限公司 Cache management method and device and storage medium
CN113946275B (en) * 2020-07-15 2024-04-09 中移(苏州)软件技术有限公司 Cache management method and device and storage medium

Similar Documents

Publication Publication Date Title
US10853268B2 (en) Parity generating information processing system
CN106445405B (en) Data access method and device for flash memory storage
US10009438B2 (en) Transaction log acceleration
US9218278B2 (en) Auto-commit memory
CN104350477B Optimized context removal for solid-state drives (SSDs)
US9767017B2 (en) Memory device with volatile and non-volatile media
Lee et al. Unioning of the buffer cache and journaling layers with non-volatile memory
US20130042056A1 (en) Cache Management Including Solid State Device Virtualization
US20180107601A1 (en) Cache architecture and algorithms for hybrid object storage devices
US20100235568A1 (en) Storage device using non-volatile memory
US8862819B2 (en) Log structure array
CN104484287B (en) Nonvolatile cache realization method and device
CN109739696B (en) Double-control storage array solid state disk caching acceleration method
CN106469123A NVDIMM-based write cache allocation and release method and device
CN106469119B (en) Data writing caching method and device based on NVDIMM
US9785552B2 (en) Computer system including virtual memory or cache
US10031689B2 (en) Stream management for storage devices
US9645926B2 (en) Storage system and method for managing file cache and block cache based on access type
US9032153B2 (en) Use of flash cache to improve tiered migration performance
WO2016095233A1 (en) Method and apparatus for realizing non-volatile cache
CN110647476B (en) Method, device and equipment for writing data in solid state disk and storage medium
US11836092B2 (en) Non-volatile storage controller with partial logical-to-physical (L2P) address translation table
US10140029B2 (en) Method and apparatus for adaptively managing data in a memory based file system
US20170052899A1 Buffer cache device, method for managing the same, and applying system thereof
US10452306B1 (en) Method and apparatus for asymmetric raid

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 14908268

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN EP: Public notification in the EP Bulletin as the address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 13.12.2017)

122 EP: PCT application non-entry in European phase

Ref document number: 14908268

Country of ref document: EP

Kind code of ref document: A1