WO2016095233A1 - Method and apparatus for realizing non-volatile cache - Google Patents


Info

Publication number: WO2016095233A1
Authority: WIPO (PCT)
Prior art keywords: cache, unit, data, small, write
Application number: PCT/CN2014/094448
Other languages: French (fr), Chinese (zh)
Inventors: 刘建伟, 丁杰, 刘乐乐, 周文
Original assignee: 北京麓柏科技有限公司
Application filed by 北京麓柏科技有限公司
Priority to PCT/CN2014/094448
Publication of WO2016095233A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00: Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02: Addressing or allocation; Relocation
    • G06F 12/08: Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems

Definitions

  • The present invention relates to the field of storage technologies, and in particular to a non-volatile cache implementation method and apparatus for improving the storage performance of the centralized control device, and of the entire storage system, in a centralized distributed storage architecture.
  • Non-volatile memory devices (such as flash memory) offer faster random access than mechanical disks, retain data after power-off, and provide greater storage density.
  • Flash memory can be deployed in various forms: as flash accelerator cards, as a flash-accelerated storage tier, or as a flash-accelerated cache.
  • When flash memory is used as an accelerating cache, information must be recorded for every cache line of the cache: for example, the address of the cached object, the state of the cache line (dirty, invalid, frozen, cleared, loaded, etc.), and the aging degree of the cache line.
  • The number of cache lines is determined by the size of the flash cache and the granularity of IO requests.
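To make the scale of the entry-management problem concrete, a back-of-envelope sketch (all sizes here are illustrative assumptions, not figures from the claims):

```python
# Illustrative arithmetic: tracking a hundred-TB flash cache at small-IO
# granularity yields an enormous cache state table.
cache_size = 100 * 2**40   # assumed 100 TB flash cache
line_size = 4 * 2**10      # assumed 4 KB cache line, matching small random IO
entry_size = 32            # assumed bytes of metadata per cache line

entries = cache_size // line_size
table_bytes = entries * entry_size
print(entries)             # 26843545600 cache lines
print(table_bytes / 2**30) # 800.0 GiB of DRAM for the state table alone
```

A single flat table at small-IO granularity is therefore infeasible, which motivates the split into large and small cache units below.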
  • When flash memory is used as an accelerating cache, data consistency must also be ensured between the flash cache and the back-end storage system (for example, distributed storage cluster 203); that is, the data in the flash cache must be consistent with the data in the storage system connected to the back end.
  • Among existing flash cache methods, some are used only as a read cache; others are used as a read-write cache, but their acceleration of write operations is limited, because redundant backup techniques that severely degrade write performance are used to ensure data reliability. Moreover, existing flash cache implementations have not reached flash capacities at the hundred-TB level.
  • The object of the present invention is to provide a non-volatile cache implementation method that solves the technical problems of the prior-art flash caches described above: an excessively large cache state table, the data consistency problem, and the poor read/write performance of the control device.
  • The present invention provides a non-volatile cache implementation method: a physical flash storage resource is first virtualized into a flash storage pool, and three logical storage units are then created on the storage pool: a large cache unit, a small cache unit, and a write mirror unit. The large cache unit provides a conventional cache service; the small cache unit provides an acceleration service for random write operations and a temporary data storage service for read operations; the write mirror unit provides redundant backup protection for dirty data in the large cache unit and the small cache unit.
  • When data is read, if the read operation hits a cache line of the small cache unit, the data in the small cache unit is returned; if the small cache unit is missed but a cache line of the large cache unit is hit, the data in the large cache unit is returned; if both the large cache unit and the small cache unit are missed and the acceleration identifier is valid, data of the large cache unit's cache-line size is read from the back-end storage cluster, loaded into a cache line of the large cache unit, and then returned to the front-end data requesting unit.
  • If both units are missed, the acceleration identifier is invalid, but the data temporary-storage identifier is valid, cache-line-sized data for the small cache unit is read from the back-end storage cluster, loaded into a cache line of the small cache unit, and then returned to the front-end data requesting unit; otherwise, the data read from the back-end storage cluster is sent directly to the front-end data requesting unit without passing through the flash storage resource.
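The read path above can be sketched as follows; the function and the dict-based caches are illustrative stand-ins, not the patent's implementation:

```python
def read(addr, acceleration_valid, staging_valid, small, big, backend):
    """Sketch of the described read path. `small` and `big` are dict-like
    cache units keyed by address; `backend` is the storage cluster."""
    if addr in small:              # small-cache-line hit
        return small[addr]
    if addr in big:                # large-cache-line hit
        return big[addr]
    data = backend[addr]           # both missed: read from the backend
    if acceleration_valid:
        big[addr] = data           # load a large cache line, then return
    elif staging_valid:
        small[addr] = data         # stage a small cache line, then return
    # otherwise bypass the flash storage resource entirely
    return data
```

For example, a miss with a valid acceleration identifier populates the large cache unit, while a double miss with both identifiers invalid touches neither cache.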
  • The method of the present invention may further have the following technical features:
  • The write mirror unit is composed of at least one logical write mirror subunit, and the large cache unit and the small cache unit are composed of at least one logical large cache subunit and at least one logical small cache subunit, respectively.
  • The physical flash storage resource includes two or more physical trays, and the large cache unit, the small cache unit, and the write mirror unit each span the two or more physical trays.
  • The physical tray written by the large cache unit is different from the physical tray written by the write mirror unit, and the physical tray written by the small cache unit is likewise different from the physical tray written by the write mirror unit.
  • A single cache line of the small cache unit or of the write mirror unit may be located in one physical tray or span two or more physical trays, and a single cache line of the large cache unit may likewise be located in one physical tray or span two or more physical trays.
  • The physical tray to which a data write or data read operation falls is chosen according to the following principle: when a physical tray is damaged, only the operations originally mapped to that tray are transferred to other physical trays; read and write operations originally mapped to the other trays keep their mapping unchanged.
  • The cache lines of the large cache unit have at least a dirty state, a clean state, and an invalid state. The dirty state indicates that the data in the cache line is inconsistent with the data in the back-end storage system; the clean state indicates that the data in the cache line is consistent with the data in the back-end storage system; the invalid state indicates that there is no valid data in the cache line.
  • When a cache line is in the invalid state, it jumps to the dirty state on receiving a data write request and to the clean state on receiving a clean-data load request; when it is in the dirty state, it jumps to the clean state only when a cache-line clear request is received; when it is in the clean state, it jumps to the dirty state on receiving a data write request and to the invalid state on receiving an invalidation request.
  • The cache lines of the small cache unit have at least a dirty state, a clean state, an invalid state, and a frozen state. The dirty state indicates that the data in the cache line is inconsistent with the data in the back-end storage system; the clean state indicates that the data in the cache line is consistent with the data in the back-end storage system; the invalid state indicates that there is no valid data in the cache line; the frozen state indicates that the cache line can only be read and cannot be written.
  • When a cache line is in the invalid state, it jumps to the dirty state on receiving a data write request and to the clean state on receiving a clean-data load request; when it is in the dirty state, it jumps to the invalid state on receiving a cache-line clear request and to the frozen state on receiving a move request; when it is in the clean state, it jumps to the dirty state on receiving a data write request and to the invalid state on receiving a read request; when the move of a frozen cache line is completed, the cache line jumps to the invalid state.
  • A guard unit may further be configured to clear dirty data in the write mirror unit to the back-end storage cluster in the background, so as to keep the dirty data in the flash storage resource that requires redundant backup within a predetermined range.
  • The redundant backup adopts a write-mirroring mode.
  • The physical flash storage resource may be, for example, flash memory or phase-change memory.
  • The present invention also provides a non-volatile cache implementation apparatus, including: a flash storage resource virtualization unit for virtualizing a physical flash storage resource into a flash storage pool;
  • a logical storage unit creating unit configured to create three logical storage units on the storage pool: a large cache unit, a small cache unit, and a write mirror unit, wherein the large cache unit provides a conventional cache service, the small cache unit provides an acceleration service for random write operations and a temporary data storage service for read operations, and the write mirror unit provides redundant backup protection for dirty data in the large cache and the small cache;
  • When the data writing unit performs a data write: if the write operation hits a cache line of the small cache unit, the data is written into the small cache unit; if it misses the small cache unit but hits a cache line of the large cache unit, the data is written into the large cache unit; if both the large cache unit and the small cache unit are missed and the acceleration identifier is valid, the data is written into the small cache unit; otherwise the data is not written into the flash storage resource but is written directly to the back-end storage cluster.
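The write path above can be sketched in the same style; the mirroring of dirty data follows the write mirror unit's backup role described earlier, and all names are illustrative:

```python
def write(addr, data, acceleration_valid, small, big, mirror, backend):
    """Sketch of the described write path. Dirty data landing in either
    cache unit is also written into the write mirror unit."""
    if addr in small:              # small-cache hit
        small[addr] = data
    elif addr in big:              # large-cache hit
        big[addr] = data
    elif acceleration_valid:       # double miss with acceleration enabled
        small[addr] = data
    else:                          # bypass flash, write straight through
        backend[addr] = data
        return
    mirror[addr] = data            # redundant backup of the dirty data
```

Note that the write-through case skips the mirror: only data that became dirty inside the flash cache needs the redundant backup.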
  • When the data reading unit performs a data read: if the read operation hits a cache line of the small cache unit, the data in the small cache unit is returned; if it misses the small cache unit but hits a cache line of the large cache unit, the data in the large cache unit is returned; if both units are missed and the acceleration identifier is valid, data of the large cache unit's cache-line size is read from the back-end storage cluster, loaded into a cache line of the large cache unit, and returned to the front-end data request unit; if both units are missed, the acceleration identifier is invalid, but the data temporary-storage identifier is valid, cache-line-sized data for the small cache unit is read from the back-end storage cluster, loaded into a cache line of the small cache unit, and returned to the front-end data request unit; otherwise the data read from the back-end storage cluster is sent directly to the front-end data request unit.
  • The device of the present invention may also have the following technical features:
  • The write mirror unit may be composed of a plurality of logical write mirror subunits.
  • The physical flash storage resource includes two or more physical trays, and the large cache unit, the small cache unit, and the write mirror unit may each span the two or more physical trays.
  • When the data writing unit writes data to the large cache unit, the small cache unit, and the write mirror unit, the physical tray written by the large cache unit is different from the physical tray written by the write mirror unit, and the physical tray written by the small cache unit is likewise different from the physical tray written by the write mirror unit.
  • A single cache line of the small cache unit or of the write mirror unit may be located in one physical tray or span two or more physical trays, and a single cache line of the large cache unit may likewise be located in one physical tray or span two or more physical trays.
  • The physical tray to which an operation of the data writing unit or the data reading unit falls is chosen according to the following principle: when a physical tray is damaged, only the operations originally mapped to that tray are transferred to other physical trays, while read and write operations mapped to the other trays keep their mapping unchanged.
  • The cache lines of the large cache unit have at least a dirty state, a clean state, and an invalid state. The dirty state indicates that the data in the cache line is inconsistent with the data in the back-end storage system; the clean state indicates that the data in the cache line is consistent with the data in the back-end storage system; the invalid state indicates that there is no valid data in the cache line.
  • When a cache line is in the invalid state, it jumps to the dirty state on receiving a data write request and to the clean state on receiving a clean-data load request; when it is in the dirty state, it jumps to the clean state only when a cache-line clear request is received; when it is in the clean state, it jumps to the dirty state on receiving a data write request and to the invalid state on receiving an invalidation request.
  • The cache lines of the small cache unit have at least a dirty state, a clean state, an invalid state, and a frozen state. The dirty state indicates that the data in the cache line is inconsistent with the data in the back-end storage system; the clean state indicates that the data in the cache line is consistent with the data in the back-end storage system; the invalid state indicates that there is no valid data in the cache line; the frozen state indicates that the cache line can only be read and cannot be written.
  • When a cache line is in the invalid state, it jumps to the dirty state on receiving a data write request and to the clean state on receiving a clean-data load request; when it is in the dirty state, it jumps to the invalid state on receiving a cache-line clear request and to the frozen state on receiving a move request; when it is in the clean state, it jumps to the dirty state on receiving a data write request and to the invalid state on receiving a read request; when the move of a frozen cache line is completed, the cache line jumps to the invalid state.
  • A guard unit may further be configured to clear dirty data in the write mirror unit to the back-end storage cluster in the background, so as to keep the dirty data in the flash storage resource that requires redundant backup within a predetermined range.
  • The redundant backup adopts a write-mirroring mode.
  • Compared with the prior art, the present invention, by virtualizing a physical flash storage resource into a flash storage pool, creating three logical storage units on the storage pool, and adopting the data writing and reading methods described above, avoids generating a huge cache state table and avoids the redundant backup methods that seriously degrade write performance. The non-volatile cache implementation method of the present invention thereby achieves ultra-large capacity and ultra-high performance, significantly improves the read and write performance of centralized control devices, and provides uninterrupted storage service.
  • FIG. 1 is a schematic diagram of the overall logical structure of the flash cache of Embodiment 1;
  • FIG. 2 is a schematic diagram of the centralized distributed storage architecture of Embodiment 1;
  • FIG. 3 is a schematic diagram of the overall physical structure of the flash cache in Embodiment 1;
  • FIG. 4 is the simplified cache-line state transition table of the large cache unit in Embodiment 1;
  • FIG. 5 is the simplified cache-line state transition table of the small cache unit in Embodiment 1;
  • FIG. 6 is a flowchart of a flash cache write operation in Embodiment 1;
  • FIG. 7 is a flowchart of a flash cache read operation in Embodiment 1;
  • FIG. 9 is a diagram showing an example of the correspondence between flash cache logical modules and physical modules in Embodiment 1.
  • The non-volatile memory devices (i.e., flash storage resources) in the cache implementation method disclosed by the present invention include, but are not limited to, flash memory, phase-change memory, and the like.
  • The storage system connected to the back end in the present invention includes, but is not limited to, the centralized distributed storage system (cluster) shown at 203 in FIG. 2; the following description merely takes a centralized distributed storage architecture as an example.
  • Because the centralized control device connects to a distributed storage cluster, the flash cache in the centralized control device must combine large capacity with ultra-high performance (high IOPS and low latency).
  • The storage capacity of the storage cluster is at the PB level, and the corresponding cache capacity is at the level of hundreds of terabytes.
  • Such a large-capacity flash cache faces two challenges: cache-line entry management and data consistency.
  • When flash memory is used as a cache, consistency must also be maintained between the data in the cache and the data in the back-end distributed storage cluster 203; whenever the two are inconsistent, the data in the cache needs to be backed up.
  • The most widely used protection method is RAID5/6, but RAID5/6 comes at the cost of a huge write-performance penalty.
  • Another approach is to use the flash only as a read cache: every write operation is written directly to the back-end distributed storage cluster 203 and the related data in the flash cache is invalidated, which keeps the cached data consistent with the back-end cluster and avoids backup protection of the cache, but such an implementation can only accelerate some read operations and cannot accelerate writes. This is the data consistency problem and its adverse effects.
  • Embodiment 1:
  • A non-volatile cache implementation method first virtualizes a physical flash storage resource into a flash storage pool, and then creates three logical storage units on the storage pool: a large cache unit 101, a small cache unit 102, and a write mirror unit 103, as shown in FIG. 1.
  • The large cache unit 101 provides a conventional cache service; the small cache unit 102 provides an acceleration service for random write operations and a temporary data storage service for read operations; the write mirror unit 103 provides redundant backup protection for the dirty data in the large cache unit 101 and the small cache unit 102.
  • When data is written: if the write operation hits a cache line of the small cache unit 102, the data is written into the small cache unit 102; if it misses the small cache unit 102 but hits a cache line of the large cache unit 101, the data is written into the large cache unit 101; if both units are missed and the acceleration identifier is valid, the data is written into the small cache unit 102; otherwise the data bypasses the flash storage resource and is written directly into the back-end storage cluster 203.
  • When data is read: if the read operation hits a cache line of the small cache unit 102, the data in the small cache unit 102 is returned; if it misses the small cache unit 102 but hits a cache line of the large cache unit 101, the data in the large cache unit 101 is returned; if both units are missed and the acceleration identifier is valid, data of the large cache unit's cache-line size is read from the back-end storage cluster, loaded into a cache line of the large cache unit 101, and then returned to the virtual machine 201; if both units are missed, the acceleration identifier is invalid, but the data temporary-storage identifier is valid, cache-line-sized data for the small cache unit 102 is read from the back-end storage cluster, loaded into a cache line of the small cache unit 102, and then returned to the virtual machine 201; otherwise, data read from the back-end storage cluster is sent directly to the front-end virtual machine 201 without passing through the flash cache 100.
  • Here the virtual machine 201 serves as one example of a front-end data application unit; the front-end data application unit in the present invention is not limited thereto.
  • Each tray provides physical flash storage resources, with internal techniques ensuring reliability and stability inside the tray.
  • Dividing the physical flash storage resource into a large cache unit 101 and a small cache unit 102 effectively solves the problem of an oversized state table for an ultra-large-capacity flash cache.
  • The states of the cache lines of the large cache unit include, but are not limited to, the states listed in FIG. 4.
  • The simplified cache lines of the large cache unit have three basic states: the dirty state, in which the data in the cache line is inconsistent with the data in the back-end storage system 203; the clean state, in which the data in the cache line is consistent with the data in the back-end storage system 203; and the invalid state, in which there is no valid data in the cache line.
  • The state transition process is: when a cache line is in the invalid state, receiving a cache-line-sized data write request (for example, a write request from the virtual machine 201) makes it jump to the dirty state, and receiving a clean-data load request (for example, a data load from the storage system 203) makes it jump to the clean state; when a cache line is in the dirty state, it jumps to the clean state only when a cache-line clear request is received; when a cache line is in the clean state, a data write request makes it jump to the dirty state and an invalidation request makes it jump to the invalid state.
  • The states of the small cache unit's cache lines include, but are not limited to, the states listed in FIG. 5.
  • The simplified basic states of a small cache unit cache line are: the dirty state, in which the data in the cache line is inconsistent with the data in the back-end storage system 203; the clean state, in which the data in the cache line is consistent with the data in the back-end storage system 203; the invalid state, in which the cache line holds no valid data; and the frozen state, in which the cache line can only be read and cannot be written.
  • The state transition process is: when a cache line is in the invalid state, receiving a data write request (for example, a write request from the virtual machine 201) makes it jump to the dirty state, and receiving a clean-data load request (for example, a data load from the storage system 203) makes it jump to the clean state; when a cache line is in the dirty state, a cache-line clear request makes it jump to the invalid state and a move request makes it jump to the frozen state; when a cache line is in the clean state, a data write request makes it jump to the dirty state and a read request makes it jump to the invalid state; when the move of a frozen cache line is completed, the cache line jumps to the invalid state.
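The two simplified state machines can be captured as transition tables; the event names paraphrase the requests in the text, and treating unlisted (state, event) pairs as leaving the state unchanged is an assumption of this sketch:

```python
# Transition tables for the simplified cache-line state machines described
# above. Keys are (current state, event); values are the next state.
LARGE = {
    ("invalid", "write"): "dirty",
    ("invalid", "clean_load"): "clean",
    ("dirty", "clear"): "clean",
    ("clean", "write"): "dirty",
    ("clean", "invalidate"): "invalid",
}
SMALL = {
    ("invalid", "write"): "dirty",
    ("invalid", "clean_load"): "clean",
    ("dirty", "clear"): "invalid",
    ("dirty", "move"): "frozen",
    ("clean", "write"): "dirty",
    ("clean", "read"): "invalid",     # staged data is invalidated once read
    ("frozen", "move_done"): "invalid",
}

def step(table, state, event):
    # Unlisted events leave the cache line in its current state (assumption).
    return table.get((state, event), state)
```

The key difference between the two units is visible in the tables: a dirty large-cache line is cleared to clean (it remains cached), while a dirty small-cache line is cleared to invalid or frozen for migration, reflecting its staging role.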
  • The different states and transitions of the large and small cache units are what enable accelerated read and write operations; the states and transitions of the two units differ because their service purposes differ.
  • Whether the large cache unit or the small cache unit is used depends on the policy prompt information and on the status information of the large and small cache units.
  • Policy prompt information includes, but is not limited to, service level, hit probability prediction, and the like.
  • the policy prompt information may come directly from the centralized control device 202 or from the virtual machine 201.
  • Status information includes, but is not limited to, whether it is a hit.
  • The large cache unit provides a conventional cache service, and different aging policies can be applied to different cache lines according to service level; the small cache unit provides cache acceleration for write operations that miss the large cache unit the first time, and provides temporary data storage for read operations that miss the large cache unit.
  • The cache lines of the small cache unit 102 are small, for example 4 KByte; the cache lines of the large cache unit 101 are large, for example 4 MByte; the cache-line size of the write mirror unit 103 can be kept consistent with that of the small cache unit 102.
  • The cache-line sizes can be adjusted to the actual situation: for example, the cache-line size of the small cache unit 102 is determined by the storage request pattern of the virtual machine 201, and the cache-line size of the large cache unit 101 is determined by the implementation of the back-end distributed storage cluster 203.
  • In the following, Small_Size is the size of the small cache unit 102; Mirror_Size is the size of the write mirror unit 103; Little_granularity is the cache-line size of the small cache unit 102, kept consistent with the block size of the data accesses of the virtual machine 201; Big_Size is the size of the large cache unit 101; Big_granularity is the cache-line size of the large cache unit 101; Available_DRAM_Size is the size of the DRAM available for storing the cache state table; and Entry_Size is the size of each entry.
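A sketch of the sizing constraint these variables imply, namely that the state-table entries of both cache units must fit in the available DRAM; the constraint itself and the concrete numbers below are assumptions for illustration:

```python
def state_table_fits(Small_Size, Little_granularity, Big_Size,
                     Big_granularity, Available_DRAM_Size, Entry_Size):
    """Check whether the combined cache state table fits in DRAM (sketch)."""
    small_entries = Small_Size // Little_granularity
    big_entries = Big_Size // Big_granularity
    return (small_entries + big_entries) * Entry_Size <= Available_DRAM_Size

# 4 KB small lines and 4 MB large lines, as in the example sizes above:
print(state_table_fits(
    Small_Size=1 * 2**40,            # assumed 1 TB small cache unit
    Little_granularity=4 * 2**10,    # 4 KByte
    Big_Size=100 * 2**40,            # assumed 100 TB large cache unit
    Big_granularity=4 * 2**20,       # 4 MByte
    Available_DRAM_Size=64 * 2**30,  # assumed 64 GB of DRAM
    Entry_Size=32,                   # assumed 32 bytes per entry
))                                   # prints True
```

With the coarse 4 MB granularity on the bulk of the capacity, the table needs under 10 GiB here; tracking the full 100 TB at 4 KB granularity would not fit, which is the point of the large/small split.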
  • The write mirror unit 103 provides redundant backup protection for the dirty data in the large cache unit 101 and the small cache unit 102: data from the virtual machine 201 is written into the write mirror unit 103 at the same time as it is written to the large cache unit 101 or the small cache unit 102.
  • A preferred implementation further includes a guard unit responsible for clearing dirty data in the write mirror unit 103 to the back-end storage cluster 203 in the background. Since the write mirror unit 103 backs up only the dirty data in the large cache unit 101 and the small cache unit 102, and the guard unit continuously clears dirty data into the back-end storage cluster 203 according to predetermined rules, the dirty data in the flash cache 100 is bounded, and there is no need to redundantly back up all the data in the entire flash cache 100. At the same time, the backup strategy adopts write mirroring, which reduces the performance cost of the redundant backup on the one hand and accelerates all write operations on the other.
  • The guard unit extracts a piece of dirty data and its related information (such as address information) from the write mirror, and queries the flash cache state table with that information to obtain the cache status.
  • If the status indicates that a cache line of the small cache unit 102 is hit but no cache line of the large cache unit 101 is hit, the data in the small cache unit's cache line is cleared directly into the back-end storage cluster 203.
  • If the status indicates that a cache line of the small cache unit 102 is hit and a cache line of the large cache unit 101 is also hit, the data in the small cache unit's cache line is first moved into the large cache unit's cache line, and then the data in the large cache unit's cache line is cleared into the back-end storage cluster 203.
  • If the status indicates that no cache line of the small cache unit 102 is hit, a cache line of the large cache unit 101 is hit, and that cache line contains dirty data, the data in the large cache unit's cache line is cleared into the back-end storage cluster 203.
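The guard unit's three-way flush decision can be sketched as follows; the dict-based units and the `big_dirty` set are illustrative stand-ins for the flash cache state table:

```python
def flush_one(addr, small, big, big_dirty, backend):
    """Sketch of the guard unit's per-entry flush described above.

    small/big are dict-like cache units; big_dirty is the set of large-cache
    addresses whose lines hold dirty data (all names are illustrative)."""
    if addr in small and addr not in big:
        backend[addr] = small[addr]    # clear the small line straight to the backend
    elif addr in small and addr in big:
        big[addr] = small[addr]        # first move the data into the large line,
        backend[addr] = big[addr]      # then clear the large line to the backend
    elif addr in big and addr in big_dirty:
        backend[addr] = big[addr]      # clear the dirty large cache line
```

The middle branch is the interesting one: small-cache data that also has a large-cache line is promoted into the large line before being cleared, so the large cache keeps the freshest copy.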
  • The write mirror unit 103 can be composed of a plurality of logical write mirror subunits, each with its own daemon.
  • Each logical unit, that is, the large cache unit 101, the small cache unit 102, and the write mirror unit 103, can span all physical trays; the advantage is increased concurrency across the physical trays and improved performance.
  • The write mirror logical unit can be divided into multiple small write mirror logical subunits, for example one logical write mirror subunit per tray; the advantage of splitting is that multiple write mirror daemons can run concurrently, speeding up the clearing of dirty data to the back-end storage cluster.
  • The physical tray written by the write mirror unit is not the same tray as the one written by the large cache unit or the small cache unit.
  • For example, the tray number written by the write mirror unit 103 may follow a simple rule such as the tray number written by the large cache unit 101 or the small cache unit 102 plus one (though it is not limited thereto). This ensures that the redundant backup and the original data sit on different physical trays, so the flash cache 100 still has usable data when a single physical tray is damaged.
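The "plus one" mirror-placement rule given as an example can be sketched as follows; the tray count and the address-to-tray mapping are assumptions:

```python
NUM_TRAYS = 4  # assumed number of physical trays

def primary_tray(block_addr):
    return block_addr % NUM_TRAYS            # assumed static mapping

def mirror_tray(block_addr):
    # Wrapping around keeps the mirror on a different physical tray than
    # the original data, so a single tray failure cannot lose both copies.
    return (primary_tray(block_addr) + 1) % NUM_TRAYS
```

With more than one tray, `mirror_tray` can never equal `primary_tray`, which is exactly the property the text relies on for surviving a single-tray failure.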
  • The cache-line size of the large cache shown in FIG. 9 is 4 MByte, but in actual use it can be adjusted to the situation.
  • A single cache line of the small cache unit 102 or of the write mirror unit 103 can be located in one physical tray or span two or more physical trays, and a single cache line of the large cache unit can likewise span multiple physical trays or be located in one tray. In this example, a single cache line of the large cache unit is located in a single physical tray, which makes it easier to provide uninterrupted service when a single physical tray is damaged.
  • Uninterrupted service provision works as in the following example.
  • Step 1: first mark tray 0 and tray 1 as not providing free cache lines.
  • Step 2: traverse the trays to flush dirty data, with the following threads:
  • Thread 1: traverse the cache line status table of tray 0; if a line is in the clean state, invalidate it; if it is in the dirty state, flush its data to the backend storage cluster and then invalidate it.
  • Thread 2: traverse the cache line status table of tray 1; if a line is in the clean state, invalidate it; otherwise wait until its status becomes clean and then invalidate it.
  • Thread 3: raise the running priority of the write mirror daemon on tray 2 to the highest level.
  • Threads 1, 2, and 3 execute concurrently.
  • Step 3: wait for the traversals of trays 0 and 1 to finish, then set tray 0 back to the state in which it can provide free cache lines. Under the new arrangement, tray 0 is double-backed by the write mirror unit on tray 2.
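The concurrent traversal step can be sketched as follows; this is a hypothetical Python illustration in which dictionaries stand in for the per-tray cache line status tables. The actual flush to the backend storage cluster and the priority change performed by thread 3 are platform-specific and only noted in comments.

```python
import threading

def flush_tray0(status):
    """Thread 1: invalidate clean lines; flush dirty lines first, then invalidate."""
    for line, state in status.items():
        if state == "dirty":
            pass  # flush the line's data to the backend storage cluster (omitted)
        status[line] = "invalid"

def drain_tray1(status, wait_until_clean=lambda line: None):
    """Thread 2: invalidate clean lines; otherwise wait for the line to become clean."""
    for line, state in status.items():
        if state != "clean":
            wait_until_clean(line)  # wait for the write mirror daemon to clean it
        status[line] = "invalid"

tray0 = {"l0": "clean", "l1": "dirty"}
tray1 = {"l0": "clean", "l1": "dirty"}

t1 = threading.Thread(target=flush_tray0, args=(tray0,))
t2 = threading.Thread(target=drain_tray1, args=(tray1,))
for t in (t1, t2):
    t.start()
for t in (t1, t2):
    t.join()
# Thread 3 (raising the tray-2 write mirror daemon's priority) is OS-specific
# and omitted here; step 3 would then re-enable free cache lines on tray 0.
```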
  • The algorithm that selects which physical tray a read or write from the virtual machine falls on is determined according to the following principle: when a physical tray is damaged, only the operations originally mapped to that tray are transferred to other physical trays, while the mapping of read and write operations already mapped to other physical trays remains unchanged.
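One way to satisfy this principle is a fixed home mapping plus a deterministic fallback that is consulted only when the home tray has failed; the sketch below is an assumption for illustration, not the patent's own algorithm:

```python
def pick_tray(op_key: int, num_trays: int, failed: set) -> int:
    """Map an operation to a tray; only operations whose home tray has
    failed are redirected, all other mappings stay unchanged."""
    home = op_key % num_trays
    if home not in failed:
        return home  # original mapping preserved for healthy trays
    # deterministic fallback: the next healthy tray after the failed home
    for step in range(1, num_trays):
        candidate = (home + step) % num_trays
        if candidate not in failed:
            return candidate
    raise RuntimeError("no healthy tray available")
```

Because the fallback is only reached when `home` is in `failed`, losing one tray moves exactly that tray's traffic and leaves every other mapping intact.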
  • The inventors have also found that the granularity of read and write operations from the virtual machine 201 equals the cache line size of the small cache unit 102, while the cache line of the large cache unit 101 is much larger, so a single operation may hit both a large and a small cache unit at the same time. This can be handled by the method described below:
  • Query the acceleration flag; if it is valid, read data of the large cache line size from the backend storage cluster 203, load it into a cache line of the large cache unit 101, and then return the data to the virtual machine 201. If it is invalid, query the data staging flag.
  • If the data staging flag is valid, read cache line data of the small cache unit 102's size from the backend storage cluster 203, load it into a cache line of the small cache unit 102, and then return the data to the virtual machine 201; otherwise the data read from the backend storage cluster is sent to the front-end virtual machine 201 directly through the centralized control device 202 without passing through the flash cache 100.
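The full read decision chain (small cache hit, large cache hit, accelerated load, staged load, bypass) can be sketched as follows; the `Backend` class and all names are hypothetical stand-ins for the real components:

```python
class Backend:
    """Stand-in for the backend storage cluster 203."""
    def __init__(self, store):
        self.store = store
    def read_big(self, addr):    # read a large-cache-line-sized chunk
        return self.store[addr]
    def read_small(self, addr):  # read a small-cache-line-sized chunk
        return self.store[addr]

def read(addr, small, big, accel_valid, staging_valid, backend):
    """Read path: small hit -> big hit -> accelerated load -> staged load -> bypass."""
    if addr in small:                           # small cache hit
        return small[addr]
    if addr in big:                             # large cache hit
        return big[addr]
    if accel_valid:                             # double miss, acceleration flag valid
        big[addr] = backend.read_big(addr)      # load into the large cache unit
        return big[addr]
    if staging_valid:                           # staging flag valid
        small[addr] = backend.read_small(addr)  # stage into the small cache unit
        return small[addr]
    return backend.read_small(addr)             # bypass the flash cache entirely
```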
  • The non-volatile cache implementation method of this embodiment keeps the size of the state table recording cache status within a certain range, and accelerates all write operations in addition to accelerating read operations. Moreover, only part of the data is backed up, so the amount of backup data is limited and the backup operation has little impact on performance. Furthermore, uninterrupted service is provided without any hot spare disk.
  • Embodiment 2:
  • The apparatus of this embodiment corresponds to the non-volatile cache implementation method of the foregoing embodiment.
  • a non-volatile cache implementing apparatus includes a flash storage resource virtualization unit, a logical storage unit creation unit, a data writing unit, and a data reading unit.
  • the flash storage resource virtualization unit is configured to virtualize physical flash storage resources into a flash storage pool.
  • The logical storage unit creating unit is configured to create three kinds of logical storage units on the storage pool: a large cache unit, a small cache unit, and a write mirror unit. The large cache unit is configured to provide a conventional cache service, the small cache unit is configured to provide an acceleration service for random write operations and a data staging service for read operations, and the write mirror unit is configured to provide a redundant backup protection function for the dirty data in the large cache and the small cache.
  • The physical flash storage resource preferably includes two or more physical trays, with the large cache unit, the small cache unit, and the write mirror unit all spanning the two or more physical trays. Preferably, a single cache line of the small cache unit and of the write mirror unit is located in the same physical tray, and a single cache line of the large cache unit is located in the same physical tray or spans two or more physical trays.
  • When the data writing unit performs data writing, if the write operation hits a cache line of the small cache unit, the data is written into the small cache unit; if it misses the small cache unit but hits a cache line of the large cache unit, the data is written into the large cache unit; if both the large cache unit and the small cache unit are missed and the acceleration flag is valid, the data is written into the small cache unit; otherwise the data is not written into the flash storage resource and is written directly to the backend storage cluster.
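The write decision chain can be sketched in the same style; dictionaries stand in for the cache units and the backend storage cluster, and all names are illustrative:

```python
def write(addr, data, small, big, accel_valid, backend):
    """Write path: small hit -> big hit -> accelerated small write -> bypass."""
    if addr in small:        # small cache hit
        small[addr] = data
    elif addr in big:        # large cache hit
        big[addr] = data
    elif accel_valid:        # double miss but acceleration flag valid
        small[addr] = data
    else:                    # bypass flash, write straight to the backend
        backend[addr] = data
```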
  • The physical write location of the large cache unit is preferably on a different physical tray from the physical write location of the write mirror unit.
  • The physical write location of the small cache unit is preferably also on a different physical tray from the physical write location of the write mirror unit.
  • When the data reading unit performs data reading, if the read operation hits a cache line of the small cache unit, the data in the small cache unit is returned; if it misses the small cache unit but hits a cache line of the large cache unit, the data in the large cache unit is returned; if both the large cache unit and the small cache unit are missed and the acceleration flag is valid, data of the large cache unit's cache line size is read from the backend storage cluster, loaded into a cache line of the large cache unit, and returned to the virtual machine; if both are missed, the acceleration flag is invalid, but the data staging flag is valid, cache line data of the small cache unit's size is read from the backend storage cluster, loaded into a cache line of the small cache unit, and returned to the virtual machine; otherwise the data read from the backend storage cluster is sent directly to the front-end virtual machine without passing through the flash storage resource.
  • The size division of the large cache unit, the small cache unit, and the write mirror unit may be performed in various manners, preferably satisfying the following formula:
  • (Little_Size + Mirror_size) / Little_granularity + Big_Size / Big_granularity <= available_DRAM_Size / entry_size
  • Big_Size is the size of the large cache unit.
  • Little_Size is the size of the small cache unit.
  • Mirror_size is the size of the write mirror unit.
  • Little_granularity is the cache line size of the small cache unit.
  • Big_granularity is the cache line size of the large cache unit.
  • available_DRAM_Size is the size of the DRAM available for storing the cache status table.
  • entry_size is the size of each cache status table entry.
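The constraint can be checked numerically. The values below are hypothetical (a 100 TB large cache with 4 MB lines, a 1 TB small cache and a 1 TB write mirror with 4 KB lines, 8-byte table entries) and show why large lines matter: even a hundred-TB-scale cache then needs only a few GB of DRAM for its status table.

```python
def dram_needed(big_size, little_size, mirror_size,
                big_gran, little_gran, entry_size):
    """DRAM bytes needed for the cache status table, per the formula above."""
    entries = (little_size + mirror_size) // little_gran + big_size // big_gran
    return entries * entry_size

TB, MB, KB = 2**40, 2**20, 2**10
needed = dram_needed(big_size=100 * TB, little_size=1 * TB, mirror_size=1 * TB,
                     big_gran=4 * MB, little_gran=4 * KB, entry_size=8)
# 26,214,400 large-cache entries + 536,870,912 small/mirror entries,
# about 4.2 GiB of table: well under an assumed 8 GiB DRAM budget.
assert needed <= 8 * 2**30
```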
  • the write mirror unit may be composed of a plurality of logical write mirror subunits.
  • Which physical tray the operations of the data writing unit and the data reading unit fall on is preferably based on the principle that, when a physical tray is damaged, only the operations originally mapped to that tray are transferred to other physical trays, while read and write operations already mapped to other physical trays keep their mapping unchanged.
  • The cache line of the large cache unit has at least a dirty state, a clean state, and an invalid state. The dirty state indicates that the data in the cache line is inconsistent with the data in the backend storage system; the clean state indicates that the data in the cache line is consistent with the data in the backend storage system.
  • The invalid state indicates that the cache line holds no valid data. When a cache line is in the invalid state, it transitions to the dirty state on a data write request and to the clean state on a clean data load request; when it is in the dirty state, it transitions to the clean state only on a cache line flush request; when it is in the clean state, it transitions to the dirty state on a data write request and to the invalid state on an invalidation request.
  • The cache line of the small cache unit has at least a dirty state, a clean state, an invalid state, and a frozen state. The dirty state indicates that the data in the cache line is inconsistent with the data in the backend storage system; the clean state indicates that the data in the cache line is consistent with the data in the backend storage system.
  • The invalid state indicates that the cache line holds no valid data.
  • The frozen state indicates that the cache line can only be read and cannot be written. When a cache line is in the invalid state, it transitions to the dirty state on a data write request and to the clean state on a clean data load request; when it is in the dirty state, it transitions to the invalid state on a cache line flush request and to the frozen state on a move request; when it is in the clean state, it transitions to the dirty state on a data write request and to the invalid state on a read request; when it is in the frozen state, the cache line transitions to the invalid state only when the move-completed return is received.
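The transition rules for the small cache unit can be encoded as a table; the sketch below paraphrases the event names from the text (the large cache unit's table is the same minus the frozen state and move events, and with dirty transitioning to clean rather than invalid on a flush):

```python
# (state, event) -> next state, for cache lines of the small cache unit
SMALL_CACHE_FSM = {
    ("invalid", "write"):         "dirty",
    ("invalid", "clean_load"):    "clean",
    ("dirty",   "flush"):         "invalid",
    ("dirty",   "move"):          "frozen",
    ("clean",   "write"):         "dirty",
    ("clean",   "read"):          "invalid",   # the staged line is consumed by the read
    ("frozen",  "move_complete"): "invalid",
}

def step(state: str, event: str) -> str:
    """Apply one event to a cache line; unknown pairs are rejected."""
    try:
        return SMALL_CACHE_FSM[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} not allowed in state {state!r}")
```

Encoding the table as data keeps the allowed transitions auditable against Figures 4 and 5, and any request arriving in the wrong state fails loudly instead of corrupting the line.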
  • the redundant backup preferably adopts a write mirroring manner.

Abstract

Disclosed are a method and apparatus for realizing a non-volatile cache. The method first virtualizes physical flash storage resources into a flash storage pool, and then creates three kinds of logical storage units on the storage pool: a large cache unit, a small cache unit, and a write mirror unit. The large cache unit provides a conventional caching service; the small cache unit provides an acceleration service for random write operations and a data staging service for read operations; the write mirror unit provides a redundant backup protection function for the dirty data in the large cache and the small cache. The method avoids creating a huge cache state table and avoids redundant backup schemes that seriously degrade write performance; it can achieve ultra-large capacity and ultra-high performance, thereby significantly improving the read and write performance of the centralized control device.

Description

Method and device for implementing a non-volatile cache

Technical field

The present invention relates to the field of storage technologies, and in particular to a non-volatile cache implementation method and apparatus, applied in a centralized control device of a centralized distributed storage architecture, for improving the storage performance of the centralized control device and of the entire storage system.

Background art

With the development of semiconductor technology, high-speed non-volatile memory devices (such as flash memory) have reached ever higher storage densities and are now widely used as data access acceleration devices in data centers. Compared with a mechanical disk, a non-volatile memory device such as flash has a faster random access speed; compared with DRAM, it retains data after the power is turned off and has a greater storage density.

The high storage density, non-volatility, and high access speed of flash memory have led to its wide use in storage systems; one such application is as an acceleration device for a storage system. As an acceleration device, flash can take several forms: flash accelerator cards, flash acceleration storage tiers, and flash acceleration caches.

When flash memory is used as an acceleration cache, information about every cache line must be recorded, for example the address of the cached object, the cache state (dirty, invalid, frozen, clearing, loading, etc.), and the aging degree of the cache line. The number of cache lines depends on the size of the flash cache and the granularity of IO requests.

When flash memory is used as an acceleration cache, data consistency must also be guaranteed between the flash cache and the backend storage system (for example, the distributed storage cluster 203); that is, the data in the flash cache must be kept consistent with the data in the backend storage system.

Among existing flash cache methods, some are used only as read caches. Others serve as read-write caches, but their acceleration of write operations is limited, because a redundant backup technique that greatly degrades write performance is used to guarantee data reliability. Moreover, in existing flash cache implementations, the flash capacity has not reached the hundred-TB level.

The above background disclosure is provided only to assist in understanding the inventive concept and technical solution of the present invention; it does not necessarily belong to the prior art of this patent application. In the absence of clear evidence that the above content was published before the filing date of this application, the background art above should not be used to evaluate the novelty and inventiveness of this application.
Summary of the invention

The object of the present invention is to provide a non-volatile cache implementation method that solves the technical problems of prior-art flash caches described above: the huge cache state table caused by cache line entry management and data consistency requirements, and the resulting poor read/write performance of the control device.

To this end, the present invention provides a non-volatile cache implementation method. First, physical flash storage resources are virtualized into a flash storage pool; then three kinds of logical storage units are created on the storage pool: a large cache unit, a small cache unit, and a write mirror unit. The large cache unit provides a conventional cache service; the small cache unit provides an acceleration service for random write operations and a data staging service for read operations; the write mirror unit provides a redundant backup protection function for the dirty data in the large cache unit and the small cache unit.

When data is written, if the write operation hits a cache line of the small cache unit, the data is written into the small cache unit; if it misses the small cache unit but hits a cache line of the large cache unit, the data is written into the large cache unit; if both the large cache unit and the small cache unit are missed and the acceleration flag is valid, the data is written into the small cache unit; otherwise the data is not written into the flash storage resource and is written directly to the backend storage cluster.

When data is read, if the read operation hits a cache line of the small cache unit, the data in the small cache unit is returned; if it misses the small cache unit but hits a cache line of the large cache unit, the data in the large cache unit is returned; if both the large cache unit and the small cache unit are missed and the acceleration flag is valid, data of the large cache unit's cache line size is read from the backend storage cluster, loaded into a cache line of the large cache unit, and then returned to the front-end data requesting unit; if both are missed, the acceleration flag is invalid, but the data staging flag is valid, cache line data of the small cache unit's size is read from the backend storage cluster, loaded into a cache line of the small cache unit, and then returned to the front-end data requesting unit; otherwise the data read from the backend storage cluster is sent directly to the front-end data requesting unit without passing through the flash storage resource.
Preferably, the method of the present invention may further have the following technical features:

The sizes of the large cache unit, the small cache unit, and the write mirror unit satisfy the formula (Little_Size + Mirror_size) / Little_granularity + Big_Size / Big_granularity <= available_DRAM_Size / entry_size, where Big_Size is the size of the large cache unit, Little_Size is the size of the small cache unit, Mirror_size is the size of the write mirror unit, Little_granularity is the cache line size of the small cache unit, Big_granularity is the cache line size of the large cache unit, available_DRAM_Size is the size of the DRAM available for storing the cache status table, and entry_size is the size of each cache status table entry.

The write mirror unit is composed of at least one logical write mirror subunit, and the large cache unit and the small cache unit are respectively composed of at least one logical large cache subunit and at least one logical small cache subunit.

The physical flash storage resource includes two or more physical trays, and the large cache unit, the small cache unit, and the write mirror unit all span the two or more physical trays.

When data is written into the large cache unit, the small cache unit, and the write mirror unit, the physical write location of the large cache unit is on a different physical tray from the physical write location of the write mirror unit, and the physical write location of the small cache unit is likewise on a different physical tray from that of the write mirror unit.

A single cache line of the small cache unit and of the write mirror unit is located in one physical tray or spans two or more physical trays, and a single cache line of the large cache unit is located in one physical tray or spans two or more physical trays.

Which physical tray a data write or read operation falls on follows this principle: when a physical tray is damaged, only the operations originally mapped to that tray are transferred to other physical trays, while read and write operations already mapped to other physical trays keep their mapping unchanged.

The cache line of the large cache unit has at least a dirty state, a clean state, and an invalid state. The dirty state indicates that the data in the cache line is inconsistent with the data in the backend storage system; the clean state indicates that the data in the cache line is consistent with the data in the backend storage system; the invalid state indicates that the cache line holds no valid data. When a cache line is in the invalid state, it transitions to the dirty state on a data write request and to the clean state on a clean data load request; when it is in the dirty state, it transitions to the clean state only on a cache line flush request; when it is in the clean state, it transitions to the dirty state on a data write request and to the invalid state on an invalidation request.

The cache line of the small cache unit has at least a dirty state, a clean state, an invalid state, and a frozen state. The dirty state indicates that the data in the cache line is inconsistent with the data in the backend storage system; the clean state indicates that the data in the cache line is consistent with the data in the backend storage system; the invalid state indicates that the cache line holds no valid data; the frozen state indicates that the cache line can only be read and cannot be written. When a cache line is in the invalid state, it transitions to the dirty state on a data write request and to the clean state on a clean data load request; when it is in the dirty state, it transitions to the invalid state on a cache line flush request and to the frozen state on a move request; when it is in the clean state, it transitions to the dirty state on a data write request and to the invalid state on a read request; when it is in the frozen state, the cache line transitions to the invalid state only when the move-completed return is received.

The method further includes a daemon unit that flushes dirty data from the write mirror unit to the backend storage cluster in the background, so as to keep the amount of dirty data in the flash storage resource that needs redundant backup within a predetermined range.

The redundant backup adopts a write mirroring manner.

The physical flash storage resource is a flash memory or a phase-change memory.
The present invention also provides a non-volatile cache implementing apparatus, including: a flash storage resource virtualization unit, configured to virtualize physical flash storage resources into a flash storage pool;

a logical storage unit creating unit, configured to create three kinds of logical storage units on the storage pool: a large cache unit, a small cache unit, and a write mirror unit, the large cache unit being configured to provide a conventional cache service, the small cache unit being configured to provide an acceleration service for random write operations and a data staging service for read operations, and the write mirror unit being configured to provide a redundant backup protection function for the dirty data in the large cache and the small cache;

a data writing unit and a data reading unit;

when the data writing unit performs data writing, if the write operation hits a cache line of the small cache unit, the data is written into the small cache unit; if it misses the small cache unit but hits a cache line of the large cache unit, the data is written into the large cache unit; if both the large cache unit and the small cache unit are missed and the acceleration flag is valid, the data is written into the small cache unit; otherwise the data is not written into the flash storage resource and is written directly to the backend storage cluster;

when the data reading unit performs data reading, if the read operation hits a cache line of the small cache unit, the data in the small cache unit is returned; if it misses the small cache unit but hits a cache line of the large cache unit, the data in the large cache unit is returned; if both the large cache unit and the small cache unit are missed and the acceleration flag is valid, data of the large cache unit's cache line size is read from the backend storage cluster, loaded into a cache line of the large cache unit, and then returned to the front-end data requesting unit; if both are missed, the acceleration flag is invalid, but the data staging flag is valid, cache line data of the small cache unit's size is read from the backend storage cluster, loaded into a cache line of the small cache unit, and then returned to the front-end data requesting unit; otherwise the data read from the backend storage cluster is sent directly to the front-end data requesting unit without passing through the flash storage resource.
优选地,本发明的装置还可以具有如下技术特征:Preferably, the device of the present invention may also have the following technical features:
所述大缓存单元、小缓存单元和写镜像单元的大小满足如下公式(Little_Size+Mirror_size)/Little_granularity+Big_Size/Big_granularity<=available_DRAM_Size/entry_size,其中,Big_Size为大缓存单元的大小,Little_Size为小缓存单元的大小,Mirror_size为写镜像单元的大小,Little_granularity为小缓存单元缓存行的大小,Big_granularity为大缓存单元缓存行的大小,available_DRAM_Size是可用的存储缓存状态表的DRAM的大小,entry_size是缓存每个表项的大小。The size of the large cache unit, the small cache unit, and the write mirror unit satisfy the following formula (Little_Size+Mirror_size)/Little_granularity+Big_Size/Big_granularity<=available_DRAM_Size/entry_size, where Big_Size is the size of a large cache unit, and Little_Size is a small cache unit. Size, Mirror_size is the size of the write mirror unit, Little_granularity is the size of the small cache unit cache line, Big_granularity is the size of the large cache unit cache line, available_DRAM_Size is the size of the DRAM of the available storage cache status table, entry_size is the cache of each table The size of the item.
所述写镜像单元可由多个逻辑写镜像子单元构成。The write mirror unit may be composed of a plurality of logical write mirror subunits.
所述物理的闪存存储资源包括两个以上物理托盘,所述大缓存单元、小缓存单元和写镜像单元均可以横跨所述两个以上物理托盘。The physical flash storage resource includes more than two physical trays, and the large cache unit, the small cache unit, and the write mirror unit may span the two or more physical trays.
所述数据写入单元在将数据写入所述大缓存单元、小缓存单元和写镜像单元时,所述大缓存单元的写入物理位置与所述写镜像单元的写入物理位置处于不同的物理托盘上,所述小缓存单元的写入物理位置与所述写镜像单元的写入物理位置亦处于不同的物理托盘上。The data writing unit writes data to the large cache unit, the small cache unit, and the write mirror unit, the write physical location of the large cache unit is different from the write physical location of the write mirror unit On the physical tray, the physical location of the write of the small cache unit and the physical location of the write of the write mirror unit are also on different physical trays.
所述小缓存单元和写镜像单元的单个缓存行位于同一个物理托盘内或横跨两个以上物理托盘,所述大缓存单元的单个缓存行位于同一个物理托盘内或横跨 两个以上物理托盘。The single cache line of the small cache unit and the write mirror unit are located in the same physical tray or span more than two physical trays, and a single cache line of the large cache unit is located in the same physical tray or across More than two physical trays.
所述数据写入单元和数据读取单元的操作落到哪个物理托盘按以下原则:当某个物理托盘损坏时,仅仅将原来映射到该物理托盘上的操作转移到其他的物理托盘上,而原来就映射到其他物理托盘上的读写操作维持映射关系不变。Which physical tray the operation of the data writing unit and the data reading unit falls to is based on the following principle: when a physical tray is damaged, only the operation originally mapped to the physical tray is transferred to another physical tray, and The read and write operations mapped to other physical trays remain unchanged.
所述大缓存单元的缓存行至少包括脏状态、干净状态和无效状态,所述脏状态表示缓存行中的数据和后端存储系统中的数据不一致,所述干净状态表示缓存行中的数据和后端存储系统中的数据一致,所述无效状态表示缓存行中无有效数据;当缓存行处于无效状态时,收到数据写入请求时跳转到脏状态,收到干净数据装载请求时跳转到干净状态;当缓存处于脏状态时,只有收到缓存行清除请求时跳转为干净状态;当缓存行处于干净状态时,收到数据写入请求时跳转到脏状态,收到失效请求时跳转到无效状态。The cache line of the large cache unit includes at least a dirty state, a clean state, and an invalid state, the dirty state indicating that data in the cache line is inconsistent with data in the backend storage system, the clean state indicating data in the cache line and The data in the backend storage system is consistent. The invalid state indicates that there is no valid data in the cache line; when the cache line is in an invalid state, it jumps to the dirty state when receiving the data write request, and jumps when receiving the clean data load request. Go to the clean state; when the cache is dirty, it will jump to the clean state only when the cache line clear request is received; when the cache line is in the clean state, it will jump to the dirty state when receiving the data write request, and the invalidation is received. Jumps to an invalid state on request.
所述小缓存单元的缓存行至少包括脏状态、干净状态、无效状态和冻结状态,所述脏状态表示缓存行中的数据和后端存储系统中的数据不一致,所述干净状态表示缓存行中的数据和后端存储系统中的数据一致,所述无效状态表示缓存行中没有有效数据,所述冻结状态表示当前缓存行处于冻结状态,只能被读取,不能被写入;当缓存行处于无效状态时,收到数据写入请求时跳转到脏状态,收到干净数据装载请求时跳转到干净状态;当缓存处于脏状态时,收到缓存行清除请求时跳转为无效状态,收到移动请求时跳转为冻结状态;当缓存行处于干净状态时,收到数据写入请求时跳转到脏状态,收到读请求时跳转到无效状态;当缓存行处于冻结状态时,只有收到移动完成的返回时缓存行跳转到无效状态。The cache line of the small cache unit includes at least a dirty state, a clean state, an invalid state, and a frozen state, where the dirty state indicates that data in the cache line is inconsistent with data in the backend storage system, and the clean state indicates that the cache line is in the cache line. The data is consistent with the data in the backend storage system. The invalid state indicates that there is no valid data in the cache line. The frozen state indicates that the current cache line is in a frozen state and can only be read and cannot be written; when the cache line In the invalid state, it jumps to the dirty state when receiving the data write request, and jumps to the clean state when receiving the clean data load request; when the cache is in the dirty state, it jumps to the invalid state when the cache line clear request is received. When the move request is received, it jumps to the frozen state; when the cache line is in the clean state, it jumps to the dirty state when receiving the data write request, and jumps to the invalid state when the read request is received; when the cache line is frozen When the return is completed, the cache line jumps to the invalid state.
A guard unit is further included, which clears dirty data from the write mirror unit to the backend storage cluster in the background, so as to keep the amount of dirty data in the flash storage resource that requires redundant backup within a predetermined range.
The redundant backup is implemented by write mirroring.
Compared with the prior art, the beneficial effects of the present invention include: by virtualizing the physical flash storage resource into a flash storage pool, creating three kinds of logical storage units on the storage pool, and adopting the data writing and reading methods described herein, the non-volatile cache implementation method of the present invention avoids both the problem of a huge cache state table and the redundant backup schemes that severely degrade write performance. It can therefore achieve very large capacity together with very high performance, significantly improving the read and write performance of the centralized control device while providing uninterrupted storage service.
BRIEF DESCRIPTION OF THE DRAWINGS
Fig. 1 is a schematic diagram of the overall logical structure of the flash cache of Embodiment 1;

Fig. 2 is a schematic diagram of the centralized distributed storage architecture of Embodiment 1;

Fig. 3 is a schematic diagram of the overall physical structure of the flash cache in Embodiment 1;

Fig. 4 is a simplified state transition table for the cache lines of the large cache unit in Embodiment 1;

Fig. 5 is a simplified state transition table for the cache lines of the small cache unit in Embodiment 1;

Fig. 6 is a flowchart of a flash cache write operation in Embodiment 1;

Fig. 7 is a flowchart of a flash cache read operation in Embodiment 1;

Fig. 8 is a further flowchart of a flash cache read operation in Embodiment 1;

Fig. 9 is a diagram showing an example of the correspondence between the logical modules and the physical modules of the flash cache in Embodiment 1.
DETAILED DESCRIPTION
The non-volatile memory devices (i.e., flash storage resources) in the cache implementation method disclosed by the present invention include, but are not limited to, flash memory, phase-change memory, and the like. The storage system connected to the back end of the present invention includes, but is not limited to, the centralized distributed storage system (cluster) shown at 203 in Fig. 2; the following description uses the centralized distributed storage architecture merely as an example.
In the centralized distributed storage architecture shown in Fig. 2, the flash cache in the centralized control device must offer both very large capacity and very high performance (meaning high IOPS and low latency). This is because the distributed storage cluster connected to the centralized control device has a storage capacity at the PB level, and the corresponding cache capacity is at the level of hundreds of TB. A flash cache of such capacity, however, faces two difficult problems: cache line entry management and data consistency.
When flash memory is used as a cache, the entire storage resource must be divided into many cache lines at a certain granularity, and for each cache line the relevant information must be recorded, such as where the data stored in the line comes from and the line's current state. When the flash cache capacity reaches hundreds of TB, for example 200 TB, dividing it into cache lines at a 4 KB granularity yields 200 TB / 4 KB = 50×10^9 cache lines. Assuming each cache line needs 16 bytes to record its state, a table of 800 GB in total is required to record the state of the whole flash cache, which is enormous and unaffordable. The 4 KB granularity is dictated by the virtual machine 201: as the block storage device of the virtual machine 201, the block access unit for stored data is 4 KB. This is the huge cache state table problem, i.e., the cache line entry management problem.
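The arithmetic behind these figures can be reproduced directly (decimal units, matching the round numbers in the text):

```python
# State-table sizing from the text: a 200 TB cache divided into 4 KB
# cache lines, with a 16-byte state entry per line.
TB, GB, KB = 10**12, 10**9, 10**3

num_lines = (200 * TB) // (4 * KB)   # number of cache lines
table_size = num_lines * 16          # bytes of state-table metadata

print(num_lines)         # 50000000000, i.e. 50e9 lines
print(table_size // GB)  # 800 (GB), the table size given in the text
```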
When flash memory is used as a cache, the consistency between the data in the cache and the data in the backend distributed storage cluster 203 must also be guaranteed; when they are inconsistent, the data in the cache must be protected by a backup. The most widely used protection scheme is RAID 5/6, but RAID 5/6 comes at the cost of a huge sacrifice in write performance. The alternative is to use the flash only as a read cache: every write operation is written directly to the backend distributed storage cluster 203 and the related data in the flash cache is invalidated, so that the cached data always stays consistent with the backend storage cluster and no backup of the cached data is needed. Such an implementation, however, can only accelerate some read operations and cannot accelerate writes at all. This is the data consistency problem and its adverse consequences.
The present invention is described in further detail below with reference to specific embodiments and the accompanying drawings. It should be emphasized that the following description is merely exemplary and is not intended to limit the scope of the invention or its applications.
Non-limiting and non-exclusive embodiments will be described with reference to Figs. 1-9, in which like reference numerals denote like parts unless otherwise specified.
Embodiment 1:
In a non-volatile cache implementation method, a physical flash storage resource is first virtualized into a flash storage pool, and three kinds of logical storage units are then created on the storage pool: a large cache unit 101, a small cache unit 102 and a write mirror unit 103, as shown in Fig. 1. The large cache unit 101 provides conventional cache service; the small cache unit 102 provides an acceleration service for random write operations and a data staging service for read operations; the write mirror unit 103 provides redundant backup protection for the dirty data in the large cache unit 101 and the small cache unit 102. On a data write, if the write operation hits a cache line of the small cache unit 102, the data is written into the small cache unit 102; if it misses the small cache unit 102 but hits a cache line of the large cache unit 101, the data is written into the large cache unit 101; if both the large cache unit 101 and the small cache unit 102 miss and the acceleration flag is valid, the data is written into the small cache unit 102; otherwise the data is not written into the flash storage resource but is written directly to the backend storage cluster 203. On a data read, if the read operation hits a cache line of the small cache unit 102, the data in the small cache unit 102 is returned; if it misses the small cache unit 102 but hits a cache line of the large cache unit 101, the data in the large cache unit 101 is returned; if both the large cache unit 101 and the small cache unit 102 miss and the acceleration flag is valid, data of the size of a cache line of the large cache unit 101 is read from the backend storage cluster, loaded into a cache line of the large cache unit 101, and then returned to the virtual machine 201; if both miss and the acceleration flag is invalid but the data staging flag is valid, the corresponding cache line data for the small cache unit 102 is read from the backend storage cluster, loaded into a cache line of the small cache unit 102, and then returned to the virtual machine 201; otherwise the data read from the backend storage cluster is sent directly to the front-end virtual machine 201 without passing through the flash cache 100. The virtual machine 201 is only one example of a front-end data requesting unit; the front-end data requesting unit of the present invention is not limited thereto.
In this embodiment, the structure of the physical flash storage resource (also called the flash cache 100) is shown in Fig. 3: each tray provides physical flash storage resources and internally uses appropriate techniques to guarantee its own reliability and stability. Dividing the physical flash storage resource into a large cache unit 101 and a small cache unit 102 effectively solves the problem of an oversized state table for a very-large-capacity flash cache.
Fig. 4 gives an example of the cache line state table of the large cache unit; that is, the states of the cache lines of the large cache unit include, but are not limited to, those listed in Fig. 4. After simplification, a cache line of the large cache unit has three basic states: the dirty state, in which the data in the cache line is inconsistent with the data in the backend storage system 203; the clean state, in which the data in the cache line is consistent with the data in the backend storage system 203; and the invalid state, in which the cache line holds no valid data. The state transitions are as follows: when a cache line is in the invalid state, it transitions to the dirty state upon receiving a cache-line-sized data write request (for example, a write request from the virtual machine 201) and to the clean state upon receiving a clean-data load request (for example, a write request from the storage system 203); when a cache line is in the dirty state, it transitions to the clean state only upon receiving a cache line flush request; when a cache line is in the clean state, it transitions to the dirty state upon receiving a data write request and to the invalid state upon receiving an invalidation request.
Fig. 5 gives an example of the cache line state table of the small cache unit; that is, the states of the cache lines of the small cache unit include, but are not limited to, those listed in Fig. 5. After simplification, a cache line of the small cache unit has four basic states: the dirty state, in which the data in the cache line is inconsistent with the data in the backend storage system 203; the clean state, in which the data in the cache line is consistent with the data in the backend storage system 203; the invalid state, in which the cache line holds no valid data; and the frozen state, in which the cache line is frozen and can only be read, not written. The state transitions are as follows: when a cache line is in the invalid state, it transitions to the dirty state upon receiving a data write request (for example, a write request from the virtual machine 201) and to the clean state upon receiving a clean-data load request (for example, a write request from the storage system 203); when a cache line is in the dirty state, it transitions to the invalid state upon receiving a cache line flush request and to the frozen state upon receiving a move request; when a cache line is in the clean state, it transitions to the dirty state upon receiving a data write request and to the invalid state upon receiving a read request; when a cache line is in the frozen state, it transitions to the invalid state only upon receiving the return indicating that the move has completed.
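The two transition tables of Figs. 4 and 5 can be sketched as simple lookup tables; this is a minimal sketch in which the event names (write, clean_load, flush, move, and so on) are illustrative labels, not terms from the patent:

```python
# Simplified cache-line state machines for the large (Fig. 4) and small
# (Fig. 5) cache units, as (state, event) -> next-state tables.
BIG = {
    ("invalid", "write"): "dirty",
    ("invalid", "clean_load"): "clean",
    ("dirty", "flush"): "clean",
    ("clean", "write"): "dirty",
    ("clean", "invalidate"): "invalid",
}

LITTLE = {
    ("invalid", "write"): "dirty",
    ("invalid", "clean_load"): "clean",
    ("dirty", "flush"): "invalid",
    ("dirty", "move"): "frozen",
    ("clean", "write"): "dirty",
    ("clean", "read"): "invalid",
    ("frozen", "move_done"): "invalid",
}

def step(table, state, event):
    # Unlisted (state, event) pairs leave the state unchanged; in
    # particular a frozen line ignores writes, matching the text.
    return table.get((state, event), state)

print(step(BIG, "dirty", "flush"))     # clean
print(step(LITTLE, "dirty", "flush"))  # invalid
```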
The different states and transitions of the large and small cache units are what accelerate read and write operations. In this example, the states and transitions of the large cache unit and the small cache unit differ because the units serve different purposes. Whether the large or the small cache unit is used to accelerate a read or write access depends on policy hint information and on the state information of the large and small cache units. Policy hints include, but are not limited to, service level and hit probability prediction, and may come directly from the centralized control device 202 or from the virtual machine 201. State information includes, but is not limited to, whether the access hits. In this example, the large cache unit provides conventional cache service and can apply different aging policies to different cache lines according to service level; the small cache unit provides cache acceleration for write operations that miss the large cache unit on first access, and data staging for read operations that miss the large cache unit.
The cache lines of the small cache unit 102 are small, for example 4 KB; the cache lines of the large cache unit 101 are large, for example 4 MB; the cache lines of the write mirror unit 103 may be kept the same size as those of the small cache unit 102. The specific cache line sizes can be adjusted to the actual situation: for example, the cache line size of the small cache unit 102 may be determined by the storage requests of the virtual machine 201, and that of the large cache unit 101 by the implementation of the backend distributed storage cluster 203.
The sizes of, and relationship among, the small cache unit 102, the large cache unit 101 and the write mirror unit 103 can be determined by the DRAM resources of the centralized control device 202. For example, if all tables recording cache state must fit into the corresponding DRAM resources of the centralized control device 202, then (Little_Size + Mirror_size) / Little_granularity + Big_Size / Big_granularity <= available_DRAM_Size / entry_size must hold, where Little_Size is the size of the small cache unit 102, Mirror_size is the size of the write mirror unit 103, Little_granularity is the cache line size of the small cache unit 102 (in this embodiment kept equal to the block size of the virtual machine 201's data accesses), Big_Size is the size of the large cache unit 101, Big_granularity is the cache line size of the large cache unit 101, available_DRAM_Size is the amount of DRAM available for storing the cache state table, and entry_size is the size of each table entry.
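The sizing inequality above can be checked mechanically; the concrete sizes below are invented example values, not figures from the patent:

```python
# Check whether the cache state table fits in the available DRAM, per
# the inequality in the text. All arguments are in bytes.
def state_table_fits(little_size, mirror_size, little_gran,
                     big_size, big_gran, dram_size, entry_size):
    entries = (little_size + mirror_size) // little_gran + big_size // big_gran
    return entries <= dram_size // entry_size

TB, GB, MB, KB = 1024**4, 1024**3, 1024**2, 1024

# Hypothetical configuration: 1 TB little cache, 1 TB mirror, 198 TB big
# cache, 16 GB of DRAM for the table, 16-byte entries.
ok = state_table_fits(
    little_size=1 * TB, mirror_size=1 * TB, little_gran=4 * KB,
    big_size=198 * TB, big_gran=4 * MB,
    dram_size=16 * GB, entry_size=16,
)
print(ok)  # True: ~0.59e9 entries fit in the ~1.07e9 entry budget
```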
The write mirror unit 103 provides redundant backup protection for the dirty data in the large cache unit 101 and the small cache unit 102. Data from the virtual machine 201 is written into the write mirror unit 103 at the same time it is written into the large cache unit 101 or the small cache unit 102.
A preferred implementation further includes a guard unit, which is responsible for clearing the dirty data in the write mirror unit 103 to the backend storage cluster 203 in the background. Because the write mirror unit 103 backs up only the dirty data in the large cache unit 101 and the small cache unit 102, and the guard unit continuously flushes dirty data to the backend storage cluster 203 according to predetermined rules, the amount of dirty data in the flash cache 100 is bounded, and no redundant backup of all the data in the flash cache 100 is needed. At the same time, the backup strategy uses write mirroring, which on the one hand lowers the performance cost of redundant backup and on the other hand makes it possible to accelerate all write operations.
Fig. 8 shows the processing flow of the guard unit. The guard unit first checks the write mirror unit 103; when the write mirror unit 103 is non-empty, it takes one piece of dirty data and its related information (for example, address information) from the write mirror, queries the flash cache state table according to that information, and obtains the cache state. If the state shows a hit in a cache line of the small cache unit 102 but no hit in the large cache unit 101, the data in the cache line of the small cache unit 102 is flushed directly to the backend storage cluster 203. If the state shows hits in cache lines of both the small cache unit 102 and the large cache unit 101, the data in the cache line of the small cache unit 102 is first moved into the cache line of the large cache unit 101, and the data in the cache line of the large cache unit 101 is then flushed to the backend storage cluster 203. If the state shows no hit in the small cache unit 102, a hit in a cache line of the large cache unit 101, and dirty data in that cache line, the data in the cache line of the large cache unit 101 is flushed to the backend storage cluster 203. If the state shows no hit in the small cache unit 102, a hit in the large cache unit 101, and no dirty data in the cache line of the large cache unit 101, no operation on the large or small cache unit is needed. It should be noted that this is only an example; the flow can be modified according to changes in the state information. Moreover, the write mirror unit 103 can be composed of multiple logical write mirror subunits, each with its own daemon.
The correspondence between the cache logical units and the physical units (physical trays) is illustrated in Fig. 9. Each logical unit (the large cache unit 101, the small cache unit 102 and the write mirror unit 103) can span all physical trays; the benefit is higher concurrency across the physical trays and therefore higher performance. The write mirror logical unit can be split into multiple small write mirror logical subunits, for example one logical write mirror subunit per tray; the benefit of such splitting is that multiple write mirror guard units can run concurrently, increasing the speed at which dirty data is cleared to the backend storage cluster.
As shown in Fig. 9, when new data from the virtual machine 201 is written into the large cache unit 101 or the small cache unit 102 and into the write mirror unit 103, the following principle can be applied: the physical location written in the large or small cache unit and the physical location written in the write mirror unit 103 must not be on the same physical tray. For example, a simple rule (though not the only possible one) is that the physical tray number written in the write mirror unit 103 equals the physical tray number written in the large cache unit 101 or the small cache unit 102 plus one. The benefit is that the redundant backup and the original data are guaranteed to be on different physical trays, so that when a single physical tray fails, the flash cache 100 can still supply the data. The cache line size of the large cache shown in Fig. 9 is 4 MB, but in practice it can be adjusted to the actual situation.
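The tray-plus-one placement rule given as an example above can be sketched as:

```python
# The "primary tray + 1" mirror-placement rule from the text: the mirror
# copy goes to the next tray, modulo the tray count, so a primary line
# and its mirror never share a physical tray.
def mirror_tray(primary_tray, num_trays):
    return (primary_tray + 1) % num_trays

for t in range(3):
    print(t, "->", mirror_tray(t, 3))
# With two or more trays the mirror always lands on a different tray,
# so losing any single tray leaves one copy of every line.
```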
A single cache line of the small cache unit 102 or of the write mirror unit 103 can be located within a single physical tray or span two or more physical trays; a single cache line of the large cache unit can likewise span multiple physical trays or be located within a single physical tray. This example is described with a single cache line of the large cache unit located within one physical tray, which makes it easier to achieve the technical effect of continuing to provide uninterrupted service when a single physical tray fails.
Under this partitioning into large cache unit, small cache unit and write mirror unit, uninterrupted service in the event that one physical tray fails is provided as in the following example.
Suppose tray 1 in Fig. 9 fails and can no longer provide service, and the write mirror on tray 1 backs up the dirty data on tray 0. Data recovery and uninterrupted service then proceed as follows:
Step 1: first mark tray 0 and tray 1 as unable to provide free cache lines.
Step 2: traverse and flush the dirty data, with the following threads:
Thread 1: traverse the cache line state table of tray 0; invalidate lines that are in the clean state; for lines in the dirty state, flush the data to the backend storage cluster and then invalidate them.
Thread 2: traverse the cache line state table of tray 1; invalidate lines that are in the clean state; for lines in the dirty state, wait until the state becomes clean.
Thread 3: raise the running priority of the write mirror guard unit on tray 2 to the highest level.
Threads 1, 2 and 3 execute concurrently.
Step 3: after the traversals of trays 0 and 1 have both finished, set tray 0 back to the state in which it can provide free cache lines, because in the new configuration tray 0 uses the write mirror unit on tray 2 for its backup.
The algorithm that selects which physical tray a read or write operation from the virtual machine lands on is determined by the following principle: when a physical tray fails, only the operations originally mapped to that tray are moved to other trays, while operations originally mapped to other trays keep their mappings unchanged. Many algorithms satisfy this requirement, for example the CRUSH algorithm.
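The minimal-remapping principle can be illustrated with highest-random-weight (rendezvous) hashing, a simpler stand-in for CRUSH that satisfies the same property:

```python
# Rendezvous hashing: each request goes to the tray with the highest
# hash weight. Removing a tray only remaps the requests that were on it.
import hashlib

def pick_tray(key, trays):
    def weight(tray):
        h = hashlib.sha256(f"{key}:{tray}".encode()).hexdigest()
        return int(h, 16)
    return max(trays, key=weight)

keys = [f"req-{i}" for i in range(200)]
before = {k: pick_tray(k, [0, 1, 2, 3]) for k in keys}
after = {k: pick_tray(k, [0, 2, 3]) for k in keys}  # tray 1 has failed

moved = [k for k in keys if before[k] != after[k]]
# Every remapped request was originally on the failed tray; all other
# mappings are unchanged, as the principle in the text requires.
print(len(moved), "of", len(keys), "requests remapped")
```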
Beyond solving the basic technical problems of the present invention, namely cache line entry management and data consistency, the inventors also found that, because the granularity of read and write operations from the virtual machine 201 matches the cache line size of the small cache unit 102 while the cache lines of the large cache unit 101 are larger, an operation may hit the large and small cache units at the same time. This can be resolved by the methods described below.
As shown in Figs. 2 and 6, when a write operation from the virtual machine 201 is sent to the flash cache 100: if the write operation hits a cache line of the small cache unit 102, the data is written into the small cache unit 102; if it misses the small cache unit 102 but hits a cache line of the large cache unit 101, the data is written into the large cache unit 101; if both the large and small cache units miss, the acceleration flag is checked: if it is valid, the data is written into the small cache unit 102; otherwise the data is not written into the flash cache 100 but is written directly through the centralized control device 202 to the backend storage cluster 203. This write flow guarantees that whenever a write hits a cache line of the small cache unit 102, the data in the small cache unit 102 is always the most recent.
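The write flow above can be sketched as a routing function; the return labels are illustrative, not identifiers from the patent:

```python
# Write-path decision from Fig. 6: where a write from the virtual
# machine lands, given the hit status and the acceleration flag.
def route_write(hit_little, hit_big, accelerate):
    if hit_little:
        return "little_cache"    # little cache always wins on a hit
    if hit_big:
        return "big_cache"
    if accelerate:
        return "little_cache"    # first-miss write acceleration
    return "backend_cluster"     # bypass the flash cache entirely

print(route_write(True, True, False))    # little_cache
print(route_write(False, True, False))   # big_cache
print(route_write(False, False, True))   # little_cache
print(route_write(False, False, False))  # backend_cluster
```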
As shown in Figs. 2 and 7, when a read operation from the virtual machine 201 is sent to the flash cache 100: if the read operation hits a cache line of the small cache unit 102, the data in the small cache unit 102 is returned; if it misses the small cache unit 102 but hits a cache line of the large cache unit 101, the data in the large cache unit 101 is returned; if both the large and small cache lines miss, the acceleration flag is checked: if it is valid, data of the large cache line size is read from the backend storage cluster 203, loaded into a cache line of the large cache unit 101, and then returned to the virtual machine 201; if it is invalid, the data staging flag is checked: if that is valid, the cache line data for the small cache unit 102 is read from the backend storage cluster 203, loaded into a cache line of the small cache unit 102, and then returned to the virtual machine 201; otherwise the data read from the backend storage cluster is sent through the centralized control device 202 to the front-end virtual machine 201 without passing through the flash cache 100.
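The read flow above can be sketched the same way, returning both the data source and the cache (if any) into which the line is loaded on a miss:

```python
# Read-path decision from Fig. 7: (where the data is served from,
# which cache line is loaded on a miss, or None).
def route_read(hit_little, hit_big, accelerate, staging):
    if hit_little:
        return ("little_cache", None)
    if hit_big:
        return ("big_cache", None)
    if accelerate:
        return ("backend_cluster", "big_cache")     # load a big cache line
    if staging:
        return ("backend_cluster", "little_cache")  # stage a small line
    return ("backend_cluster", None)                # bypass the cache

print(route_read(False, False, True, False))   # ('backend_cluster', 'big_cache')
print(route_read(False, False, False, True))   # ('backend_cluster', 'little_cache')
print(route_read(False, False, False, False))  # ('backend_cluster', None)
```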
With the non-volatile cache implementation method of this embodiment, the size of the state table recording cache state can be kept within a fixed bound; besides accelerating read operations, all write operations can also be accelerated. Furthermore, only part of the data is backed up, so the backup volume is limited and the backup operations have little impact on performance. Finally, no hot spare disk is required, and uninterrupted service can be provided.
Embodiment 2:
The apparatus of this embodiment corresponds to the non-volatile cache implementation method of the preceding embodiment.
A non-volatile cache implementation apparatus includes a flash storage resource virtualization unit, a logical storage unit creation unit, a data writing unit and a data reading unit.
The flash storage resource virtualization unit is configured to virtualize a physical flash storage resource into a flash storage pool.
The logical storage unit creation unit is configured to create three kinds of logical storage units on the storage pool: a large cache unit, a small cache unit and a write mirror unit. The large cache unit provides conventional cache service; the small cache unit provides an acceleration service for random write operations and a data staging service for read operations; the write mirror unit provides redundant backup protection for the dirty data in the large cache and the small cache.
The physical flash storage resources preferably comprise two or more physical trays, and the large cache unit, the small cache unit, and the write mirror unit each span the two or more physical trays. Preferably, a single cache line of the small cache unit or of the write mirror unit resides within one physical tray, while a single cache line of the large cache unit resides within one physical tray or spans two or more physical trays.
When the data writing unit performs a write, the write is routed as follows. If the write operation hits a cache line of the small cache unit, the data is written to the small cache unit. If it misses the small cache unit but hits a cache line of the large cache unit, the data is written to the large cache unit. If both the large cache unit and the small cache unit miss and the acceleration flag is valid, the data is written to the small cache unit; otherwise the data is not written to the flash storage resources but is written directly to the backend storage cluster.
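The same write policy can be sketched as follows. As before, this is an illustrative simplification with invented names; cache-line granularity and mirroring of dirty data are not modeled.

```python
class CacheUnit:
    """Minimal model of a cache unit: a dict of address -> data (illustrative only)."""
    def __init__(self):
        self.lines = {}

    def hit(self, addr):
        return addr in self.lines

def handle_write(addr, data, small, big, backend, accel_flag):
    """Route a write per the policy above; returns where the data landed."""
    if small.hit(addr):                 # small-cache hit: absorb the write there
        small.lines[addr] = data
        return "small"
    if big.hit(addr):                   # large-cache hit: update the cached line
        big.lines[addr] = data
        return "big"
    if accel_flag:                      # double miss + acceleration: stage in small cache
        small.lines[addr] = data
        return "small"
    backend[addr] = data                # otherwise bypass flash, write backend directly
    return "backend"
```

The effect is that random writes are absorbed by the small cache whenever acceleration is enabled, and only non-accelerated, double-miss traffic reaches the backend synchronously.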
When the data writing unit writes data to the large cache unit, the small cache unit, and the write mirror unit, the physical location written in the large cache unit is preferably on a different physical tray from the physical location written in the write mirror unit, and likewise the physical location written in the small cache unit is preferably on a different physical tray from the physical location written in the write mirror unit.
When the data reading unit performs a read, the read is served as follows. If the read operation hits a cache line of the small cache unit, the data in the small cache unit is returned. If it misses the small cache unit but hits a cache line of the large cache unit, the data in the large cache unit is returned. If both units miss and the acceleration flag is valid, data of one large-cache-line size is read from the backend storage cluster, loaded into a cache line of the large cache unit, and then returned to the virtual machine. If both units miss, the acceleration flag is invalid, but the data staging flag is valid, the corresponding small-cache-line data is read from the backend storage cluster, loaded into a cache line of the small cache unit, and then returned to the virtual machine. Otherwise, the data read from the backend storage cluster bypasses the flash storage resources and is sent directly to the front-end virtual machine.
The sizes of the large cache unit, the small cache unit, and the write mirror unit may be chosen in various ways; preferably they satisfy the following formula:
(Little_Size + Mirror_size) / Little_granularity + Big_Size / Big_granularity <= available_DRAM_Size / entry_size, where
Big_Size is the size of the large cache unit,
Little_Size is the size of the small cache unit,
Mirror_size is the size of the write mirror unit,
Little_granularity is the cache-line size of the small cache unit,
Big_granularity is the cache-line size of the large cache unit,
available_DRAM_Size is the size of the DRAM available for storing the cache state table, and
entry_size is the size of each cache table entry.
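The constraint can be checked numerically. The sketch below uses hypothetical sizes (8 GiB small cache, 8 GiB mirror, 512 GiB large cache, 4 KiB and 1 MiB cache lines, 1 GiB of DRAM, 64-byte entries), not values taken from the patent.

```python
def state_table_fits(little_size, mirror_size, big_size,
                     little_granularity, big_granularity,
                     available_dram_size, entry_size):
    """True iff the number of cache-line state entries implied by the unit
    sizes fits into the DRAM reserved for the cache state table."""
    entries = ((little_size + mirror_size) // little_granularity
               + big_size // big_granularity)
    return entries <= available_dram_size // entry_size

# Hypothetical sizing used only for illustration.
GiB = 1 << 30
ok = state_table_fits(8 * GiB, 8 * GiB, 512 * GiB, 4096, 1 << 20, GiB, 64)
```

With these numbers the small cache plus mirror contribute about 4.2 million entries and the large cache about 0.5 million, comfortably below the roughly 16.8 million entries that 1 GiB of DRAM holds at 64 bytes per entry, which is why the formula favors a much coarser granularity for the large cache.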
In addition, the write mirror unit may be composed of a plurality of logical write-mirror subunits.
Which physical tray an operation of the data writing unit or the data reading unit is directed to preferably follows this principle: when a physical tray fails, only the operations originally mapped to that tray are redirected to other physical trays, while read and write operations originally mapped to the other trays keep their mapping unchanged.
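The failover principle above, where only traffic whose home tray failed is re-routed, can be illustrated with a simple modulo placement. This is a hypothetical mapping chosen for illustration, not the mapping the apparatus prescribes.

```python
def tray_for(line_id, trays, failed=frozenset()):
    """Pick the physical tray for a cache line so that a tray failure only
    re-routes lines whose home tray failed; all other lines keep their
    original mapping."""
    home = trays[line_id % len(trays)]
    if home not in failed:
        return home                                  # healthy home tray: unchanged
    survivors = [t for t in trays if t not in failed]
    return survivors[line_id % len(survivors)]       # only failed-tray lines move
```

For example, with trays [0, 1, 2, 3] and tray 2 failed, a line whose home is tray 1 stays on tray 1, while only lines homed on tray 2 are re-hashed over the survivors.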
A cache line of the large cache unit has at least a dirty state, a clean state, and an invalid state. The dirty state indicates that the data in the cache line is inconsistent with the data in the backend storage system; the clean state indicates that the data in the cache line is consistent with the data in the backend storage system; the invalid state indicates that the cache line holds no valid data. When a cache line is in the invalid state, it transitions to the dirty state on a data write request and to the clean state on a clean-data load request. When a cache line is in the dirty state, it transitions to the clean state only on a cache-line flush request. When a cache line is in the clean state, it transitions to the dirty state on a data write request and to the invalid state on an invalidation request.
A cache line of the small cache unit has at least a dirty state, a clean state, an invalid state, and a frozen state. The dirty state indicates that the data in the cache line is inconsistent with the data in the backend storage system; the clean state indicates that the data in the cache line is consistent with the data in the backend storage system; the invalid state indicates that the cache line holds no valid data; the frozen state indicates that the cache line can only be read, not written. When a cache line is in the invalid state, it transitions to the dirty state on a data write request and to the clean state on a clean-data load request. When a cache line is in the dirty state, it transitions to the invalid state on a cache-line flush request and to the frozen state on a move request. When a cache line is in the clean state, it transitions to the dirty state on a data write request and to the invalid state on a read request. When a cache line is in the frozen state, it transitions to the invalid state only upon return of move completion.
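The small-cache line lifecycle above can be written down as a transition table. The event names are invented for illustration; in particular, "read_evict" models the text's statement that a clean line jumps to the invalid state on a read request.

```python
# (state, event) -> next state, following the transitions listed above
# for a small-cache line.
SMALL_CACHE_FSM = {
    ("invalid", "write"):      "dirty",
    ("invalid", "clean_load"): "clean",
    ("dirty",   "flush"):      "invalid",
    ("dirty",   "move"):       "frozen",
    ("clean",   "write"):      "dirty",
    ("clean",   "read_evict"): "invalid",
    ("frozen",  "move_done"):  "invalid",
}

def step(state, event):
    """Apply one event; any (state, event) pair not listed leaves the line
    unchanged. In particular, a frozen line ignores write requests."""
    return SMALL_CACHE_FSM.get((state, event), state)
```

The large-cache line machine is the same table minus the frozen state and the move events, with dirty lines returning to clean rather than invalid on a flush.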
This embodiment preferably further includes a daemon unit that flushes dirty data in the write mirror unit to the backend storage cluster in the background, so as to keep the dirty data in the flash storage resources that requires redundant backup within a predetermined range. The redundant backup preferably uses write mirroring.
Those skilled in the art will recognize that many variations on the above description are possible, so the embodiments are intended only to describe one or more specific implementations.
While what are regarded as exemplary embodiments of the invention have been described and illustrated, those skilled in the art will understand that various changes and substitutions may be made without departing from the spirit of the invention. In addition, many modifications may be made to adapt a particular situation to the teachings of the invention without departing from the central inventive concept described herein. Therefore, the invention is not limited to the specific embodiments disclosed herein, but may include all embodiments falling within the scope of the invention and their equivalents.

Claims (24)

  1. A non-volatile cache implementation method, characterized in that: physical flash storage resources are first virtualized into a flash storage pool, and three kinds of logical storage units are then created on the storage pool: a large cache unit, a small cache unit, and a write mirror unit; the large cache unit is configured to provide a conventional cache service, the small cache unit is configured to provide an acceleration service for random write operations and a data staging service for read operations, and the write mirror unit is configured to provide redundant backup protection for dirty data in the large cache unit and the small cache unit;
    when data is written, if the write operation hits a cache line of the small cache unit, the data is written to the small cache unit; if it misses the small cache unit but hits a cache line of the large cache unit, the data is written to the large cache unit; if both the large cache unit and the small cache unit miss and the acceleration flag is valid, the data is written to the small cache unit; otherwise the data is not written to the flash storage resources but is written directly to the backend storage cluster;
    when data is read, if the read operation hits a cache line of the small cache unit, the data in the small cache unit is returned; if it misses the small cache unit but hits a cache line of the large cache unit, the data in the large cache unit is returned; if both units miss and the acceleration flag is valid, data of one large-cache-line size is read from the backend storage cluster, loaded into a cache line of the large cache unit, and then returned to the front-end data requesting unit; if both units miss, the acceleration flag is invalid, but the data staging flag is valid, the corresponding small-cache-line data is read from the backend storage cluster, loaded into a cache line of the small cache unit, and then returned to the front-end data requesting unit; otherwise the data read from the backend storage cluster bypasses the flash storage resources and is sent directly to the front-end data requesting unit.
  2. The non-volatile cache implementation method according to claim 1, wherein the sizes of the large cache unit, the small cache unit, and the write mirror unit satisfy the following formula:
    (Little_Size + Mirror_size) / Little_granularity + Big_Size / Big_granularity <=
    available_DRAM_Size / entry_size, where
    Big_Size is the size of the large cache unit,
    Little_Size is the size of the small cache unit,
    Mirror_size is the size of the write mirror unit,
    Little_granularity is the cache-line size of the small cache unit,
    Big_granularity is the cache-line size of the large cache unit,
    available_DRAM_Size is the size of the DRAM available for storing the cache state table, and
    entry_size is the size of each cache table entry.
  3. The non-volatile cache implementation method according to claim 1, wherein the write mirror unit is composed of at least one logical write-mirror subunit, and the large cache unit and the small cache unit are composed of at least one logical large-cache subunit and at least one logical small-cache subunit, respectively.
  4. The non-volatile cache implementation method according to claim 1, wherein the physical flash storage resources comprise two or more physical trays, and the large cache unit, the small cache unit, and the write mirror unit each span the two or more physical trays.
  5. The non-volatile cache implementation method according to claim 4, wherein, when data is written to the large cache unit, the small cache unit, and the write mirror unit, the physical location written in the large cache unit is on a different physical tray from the physical location written in the write mirror unit, and the physical location written in the small cache unit is also on a different physical tray from the physical location written in the write mirror unit.
  6. The non-volatile cache implementation method according to claim 4, wherein a single cache line of the small cache unit or of the write mirror unit is located within one physical tray or spans two or more physical trays, and a single cache line of the large cache unit is located within one physical tray or spans two or more physical trays.
  7. The non-volatile cache implementation method according to claim 4, wherein which physical tray a data write operation or a data read operation is directed to follows this principle: when a physical tray fails, only the operations originally mapped to that tray are redirected to other physical trays, while read and write operations originally mapped to the other trays keep their mapping unchanged.
  8. The non-volatile cache implementation method according to claim 1, wherein a cache line of the large cache unit has at least a dirty state, a clean state, and an invalid state, the dirty state indicating that the data in the cache line is inconsistent with the data in the backend storage system, the clean state indicating that the data in the cache line is consistent with the data in the backend storage system, and the invalid state indicating that the cache line holds no valid data;
    when a cache line is in the invalid state, it transitions to the dirty state on a data write request and to the clean state on a clean-data load request;
    when a cache line is in the dirty state, it transitions to the clean state only on a cache-line flush request;
    when a cache line is in the clean state, it transitions to the dirty state on a data write request and to the invalid state on an invalidation request.
  9. The non-volatile cache implementation method according to claim 1, wherein a cache line of the small cache unit has at least a dirty state, a clean state, an invalid state, and a frozen state, the dirty state indicating that the data in the cache line is inconsistent with the data in the backend storage system, the clean state indicating that the data in the cache line is consistent with the data in the backend storage system, the invalid state indicating that the cache line holds no valid data, and the frozen state indicating that the cache line can only be read, not written;
    when a cache line is in the invalid state, it transitions to the dirty state on a data write request and to the clean state on a clean-data load request;
    when a cache line is in the dirty state, it transitions to the invalid state on a cache-line flush request and to the frozen state on a move request;
    when a cache line is in the clean state, it transitions to the dirty state on a data write request and to the invalid state on a read request;
    when a cache line is in the frozen state, it transitions to the invalid state only upon return of move completion.
  10. The non-volatile cache implementation method according to claim 1, further comprising a daemon unit configured to flush dirty data in the write mirror unit to the backend storage cluster in the background, so as to keep the dirty data in the flash storage resources that requires redundant backup within a predetermined range.
  11. The non-volatile cache implementation method according to claim 10, wherein the redundant backup uses write mirroring.
  12. The non-volatile cache implementation method according to any one of claims 1-11, wherein the physical flash storage resources are flash memory or phase memory.
  13. A non-volatile cache implementation apparatus, characterized by comprising:
    a flash storage resource virtualization unit configured to virtualize physical flash storage resources into a flash storage pool;
    a logical storage unit creation unit configured to create three kinds of logical storage units on the storage pool: a large cache unit, a small cache unit, and a write mirror unit, wherein the large cache unit is configured to provide a conventional cache service, the small cache unit is configured to provide an acceleration service for random write operations and a data staging service for read operations, and the write mirror unit is configured to provide redundant backup protection for dirty data in the large cache and the small cache; and
    a data writing unit and a data reading unit;
    wherein, when the data writing unit performs a write, if the write operation hits a cache line of the small cache unit, the data is written to the small cache unit; if it misses the small cache unit but hits a cache line of the large cache unit, the data is written to the large cache unit; if both the large cache unit and the small cache unit miss and the acceleration flag is valid, the data is written to the small cache unit; otherwise the data is not written to the flash storage resources but is written directly to the backend storage cluster;
    and wherein, when the data reading unit performs a read, if the read operation hits a cache line of the small cache unit, the data in the small cache unit is returned; if it misses the small cache unit but hits a cache line of the large cache unit, the data in the large cache unit is returned; if both units miss and the acceleration flag is valid, data of one large-cache-line size is read from the backend storage cluster, loaded into a cache line of the large cache unit, and then returned to the front-end data requesting unit; if both units miss, the acceleration flag is invalid, but the data staging flag is valid, the corresponding small-cache-line data is read from the backend storage cluster, loaded into a cache line of the small cache unit, and then returned to the front-end data requesting unit; otherwise the data read from the backend storage cluster bypasses the flash storage resources and is sent directly to the front-end data requesting unit.
  14. The non-volatile cache implementation apparatus according to claim 13, wherein the sizes of the large cache unit, the small cache unit, and the write mirror unit satisfy the following formula:
    (Little_Size + Mirror_size) / Little_granularity + Big_Size / Big_granularity <=
    available_DRAM_Size / entry_size, where
    Big_Size is the size of the large cache unit,
    Little_Size is the size of the small cache unit,
    Mirror_size is the size of the write mirror unit,
    Little_granularity is the cache-line size of the small cache unit,
    Big_granularity is the cache-line size of the large cache unit,
    available_DRAM_Size is the size of the DRAM available for storing the cache state table, and
    entry_size is the size of each cache table entry.
  15. The non-volatile cache implementation apparatus according to claim 13, wherein the write mirror unit is composed of at least one logical write-mirror subunit, and the large cache unit and the small cache unit may each be composed of one or more logical large-cache subunits and logical small-cache subunits, respectively.
  16. The non-volatile cache implementation apparatus according to claim 13, wherein the physical flash storage resources comprise two or more physical trays, and the large cache unit, the small cache unit, and the write mirror unit each span the two or more physical trays.
  17. The non-volatile cache implementation apparatus according to claim 16, wherein, when the data writing unit writes data to the large cache unit, the small cache unit, and the write mirror unit, the physical location written in the large cache unit is on a different physical tray from the physical location written in the write mirror unit, and the physical location written in the small cache unit is also on a different physical tray from the physical location written in the write mirror unit.
  18. The non-volatile cache implementation apparatus according to claim 16, wherein a single cache line of the small cache unit or of the write mirror unit is located within one physical tray or spans two or more physical trays, and a single cache line of the large cache unit is located within one physical tray or spans two or more physical trays.
  19. The non-volatile cache implementation apparatus according to claim 16, wherein which physical tray an operation of the data writing unit or the data reading unit is directed to follows this principle: when a physical tray fails, only the operations originally mapped to that tray are redirected to other physical trays, while read and write operations originally mapped to the other trays keep their mapping unchanged.
  20. The non-volatile cache implementation apparatus according to claim 13, wherein a cache line of the large cache unit has at least a dirty state, a clean state, and an invalid state, the dirty state indicating that the data in the cache line is inconsistent with the data in the backend storage system, the clean state indicating that the data in the cache line is consistent with the data in the backend storage system, and the invalid state indicating that the cache line holds no valid data;
    when a cache line is in the invalid state, it transitions to the dirty state on a data write request and to the clean state on a clean-data load request;
    when a cache line is in the dirty state, it transitions to the clean state only on a cache-line flush request;
    when a cache line is in the clean state, it transitions to the dirty state on a data write request and to the invalid state on an invalidation request.
  21. The non-volatile cache implementation apparatus according to claim 13, wherein a cache line of the small cache unit has at least a dirty state, a clean state, an invalid state, and a frozen state, the dirty state indicating that the data in the cache line is inconsistent with the data in the backend storage system, the clean state indicating that the data in the cache line is consistent with the data in the backend storage system, the invalid state indicating that the cache line holds no valid data, and the frozen state indicating that the cache line can only be read, not written;
    when a cache line is in the invalid state, it transitions to the dirty state on a data write request and to the clean state on a clean-data load request;
    when a cache line is in the dirty state, it transitions to the invalid state on a cache-line flush request and to the frozen state on a move request;
    when a cache line is in the clean state, it transitions to the dirty state on a data write request and to the invalid state on a read request;
    when a cache line is in the frozen state, it transitions to the invalid state only upon return of move completion.
  22. The non-volatile cache implementation apparatus according to claim 13, further comprising a daemon unit configured to flush dirty data in the write mirror unit to the backend storage cluster in the background, so as to keep the dirty data in the flash storage resources that requires redundant backup within a predetermined range.
  23. The non-volatile cache implementation apparatus according to claim 22, wherein the redundant backup uses write mirroring.
  24. The non-volatile cache implementation apparatus according to any one of claims 13-23, wherein the physical flash storage resources are flash memory or phase memory.
PCT/CN2014/094448 2014-12-19 2014-12-19 Method and apparatus for realizing non-volatile cache WO2016095233A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2014/094448 WO2016095233A1 (en) 2014-12-19 2014-12-19 Method and apparatus for realizing non-volatile cache

Publications (1)

Publication Number Publication Date
WO2016095233A1 true WO2016095233A1 (en) 2016-06-23

Family

ID=56125693




Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6510493B1 (en) * 1999-07-15 2003-01-21 International Business Machines Corporation Method and apparatus for managing cache line replacement within a computer system
CN101118519A (en) * 2007-09-10 2008-02-06 杭州华三通信技术有限公司 Method and apparatus for protecting caching content and caching controller thereof
CN101387987A (en) * 2007-09-12 2009-03-18 索尼株式会社 Storage device, method and program for controlling storage device
CN101446921A (en) * 2008-12-23 2009-06-03 青岛海信宽带多媒体技术股份有限公司 Dynamic storage method of Flash memory
CN102968389A (en) * 2012-10-30 2013-03-13 记忆科技(深圳)有限公司 Storage device and storage method based on multi-level flash memory cell
CN104484287A (en) * 2014-12-19 2015-04-01 北京麓柏科技有限公司 Nonvolatile cache realization method and device


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170344575A1 (en) * 2016-05-27 2017-11-30 Netapp, Inc. Methods for facilitating external cache in a cloud storage environment and devices thereof
CN113946275A (en) * 2020-07-15 2022-01-18 中移(苏州)软件技术有限公司 Cache management method and device and storage medium
CN113946275B (en) * 2020-07-15 2024-04-09 中移(苏州)软件技术有限公司 Cache management method and device and storage medium

Similar Documents

Publication Publication Date Title
US10853268B2 (en) Parity generating information processing system
CN106445405B (en) Data access method and device for flash memory storage
US10009438B2 (en) Transaction log acceleration
US9218278B2 (en) Auto-commit memory
CN104350477B Optimized context removal for solid-state drives (SSDs)
US9767017B2 (en) Memory device with volatile and non-volatile media
Lee et al. Unioning of the buffer cache and journaling layers with non-volatile memory
US20130042056A1 (en) Cache Management Including Solid State Device Virtualization
US20180107601A1 (en) Cache architecture and algorithms for hybrid object storage devices
US20100235568A1 (en) Storage device using non-volatile memory
US8862819B2 (en) Log structure array
CN104484287B (en) Nonvolatile cache realization method and device
CN109739696B (en) Double-control storage array solid state disk caching acceleration method
CN106469123A NVDIMM-based write cache allocation and release method and device
CN106469119B (en) Data writing caching method and device based on NVDIMM
US9785552B2 (en) Computer system including virtual memory or cache
US10031689B2 (en) Stream management for storage devices
US9645926B2 (en) Storage system and method for managing file cache and block cache based on access type
US9032153B2 (en) Use of flash cache to improve tiered migration performance
WO2016095233A1 (en) Method and apparatus for realizing non-volatile cache
CN110647476B (en) Method, device and equipment for writing data in solid state disk and storage medium
US11836092B2 (en) Non-volatile storage controller with partial logical-to-physical (L2P) address translation table
US10140029B2 (en) Method and apparatus for adaptively managing data in a memory based file system
US20170052899A1 Buffer cache device, method for managing the same, and applying system thereof
US10452306B1 (en) Method and apparatus for asymmetric raid

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 14908268

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN EP: Public notification in the EP Bulletin as the address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 13.12.2017)

122 EP: PCT application non-entry in European phase

Ref document number: 14908268

Country of ref document: EP

Kind code of ref document: A1