US20140122433A1 - Storage device and data backup method - Google Patents

Info

Publication number
US20140122433A1
Authority
US
United States
Prior art keywords
data
backup
distributed database
storage device
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US14/015,550
Inventor
Akifumi Murata
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Toshiba Corp
Original Assignee
Toshiba Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from JP2012239488A (JP5342055B1)
Application filed by Toshiba Corp
Assigned to KABUSHIKI KAISHA TOSHIBA. Assignment of assignors interest (see document for details). Assignors: MURATA, Akifumi
Publication of US20140122433A1
Legal status: Abandoned (current)

Classifications

    • G06F17/30289
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1458 Management of the backup or restore process
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2201/00 Indexing scheme relating to error detection, to error correction, and to monitoring
    • G06F2201/80 Database-specific techniques

Definitions

  • Embodiments described herein relate generally to a data backup technique suitable for, e.g., a distributed database.
  • a distributed database is a storage system in which data is distributed to a plurality of nodes to improve the parallelism, as a result of which the function of writing/reading data is also improved.
  • a host machine which gives a request for writing/reading data to/from a distributed database does not recognize nodes forming the distributed database.
  • a machine which gives a request for writing/reading data to/from the distributed database is referred to as a host machine; that is, the host machine is not a machine which manages the distributed database.
  • a given host machine backs up data in the distributed database.
  • the host machine issues a large number of data read requests to the distributed database, and nodes in the distributed database execute data reading processing all at once. Consequently, a data channel between the host machine and the distributed database, i.e., between the host machine and a plurality of nodes (connected in parallel), becomes a bottleneck. This restricts the performance of the distributed database at the time of backing up data.
  • FIG. 1 is an exemplary view showing an example of a construction of a distributed database to which storage devices according to an embodiment are applied as nodes.
  • FIG. 2 is an exemplary view for explaining a basic principle of data backup processing in the distributed database to which the storage devices according to the embodiment are applied as nodes.
  • FIG. 3 is an exemplary view for explaining a mechanism for preventing a drop in performance due to the backup processing, in each of the storage devices according to the embodiment.
  • FIG. 4 is an exemplary flowchart showing a procedure of the backup processing in the distributed database to which the storage devices according to the embodiment are applied as nodes.
  • FIG. 5 is an exemplary flowchart showing a procedure of data writing/reading in the distributed database to which the storage devices according to the embodiment are applied as nodes.
  • a storage device is applied to a distributed database.
  • the storage device includes a communication module and a backup module.
  • the backup module is configured to make a backup of partitioning information and data stored for the distributed database in the storage device, when the communication module receives a command to make a backup of the distributed database.
  • the partitioning information indicates locations of partitions created by separating a storage area of the distributed database.
  • FIG. 1 is an exemplary view showing an example of a construction of a distributed database 1 to which storage devices according to an embodiment are applied as nodes 10.
  • the distributed database 1 includes a plurality of nodes 10 connected to a data channel A.
  • various methods can be adopted as a method for constructing the distributed database 1 .
  • one of the plurality of nodes 10 is applied as a master, and controls the entire distributed database 1 ;
  • the plurality of nodes 10 equally and independently operate according to a predetermined rule as elements of the distributed database;
  • a host node which controls the entire distributed database 1 is provided separate from the plurality of nodes 10 .
  • a data backup system which will be described later is, however, not limited to any of the above methods.
  • a host machine makes a request for reading data from the distributed database 1.
  • the request made by the host machine is received by the node 10 serving as the master, it is determined which of the nodes 10 holds the above requested data, and (if the node 10 serving as the master does not hold the data) the request is sent to the node 10 determined to hold the data.
  • each of the nodes 10 receives the request made by the host machine, said each node 10 determines whether data held by said each node 10 corresponds to the requested data or not, and of all the nodes 10 , the node 10 which determines that the data which it holds corresponds to the requested data reads the data.
  • the request made by the host machine is received by the host node, it is determined which of the nodes 10 holds the requested data, and the request is sent to the node 10 determined to hold the data.
  • the nodes 10 each include a communication and I/O controller 11 , a storage device 12 and a cache memory 13 .
  • the communication and I/O controller 11 of each node 10 is a device for controlling said each node 10 , and includes a function of executing communication with the other nodes 10 as a first function.
  • Each node 10 holds status information and partitioning information on the entire distributed database 1, and keeps that information synchronized across the distributed database 1 using the communication function of its communication and I/O controller 11.
  • the partitioning information is information indicating in which of the nodes 10 each of divided areas (partitions) of a storage area of the entire distributed database 1 is present.
  • the communication and I/O controller 11 includes a function of controlling inputting and outputting of data to and from the storage device 12 and the cache memory 13 as a second function.
  • the communication and I/O controller 11 executes writing/reading of data to/from the storage device 12 , while using the cache memory 13 as a cache.
  • the communication and I/O controller 11 can execute both write-back processing in which it gives a reply indicating that writing is completed, at the point in time when data is written to the cache memory 13 and write-through processing in which it gives a reply indicating that writing is completed, at the point in time when data is written to the storage device 12 .
  • the communication and I/O controller 11 includes a specific mode in which it executes data writing/reading on the storage device 12 without changing the contents of data in the cache memory 13 .
  • a plurality of methods can be adopted. For example, in a method (a), the cache memory 13 is not used at all, and in a method (b), the cache memory 13 is used only to read data that is already present in it.
  • storage areas 100 are storage areas of the nodes 10 , each of which includes the storage device 12 and the cache memory 13 .
  • a data area 101 is allocated for the distributed database 1 , and holds the partitioning information 102 and status information 103 .
  • the communication and I/O controller 11 accesses the data area 101 in response to a request for writing/reading data (I/O request), which is issued by the host machine, based on the partitioning information 102 .
  • a given host machine issues a request (backup request) for backing up data in the distributed database 1 .
  • the above backup request is sent to the other nodes 10 in the distributed database 1 by the communication function of the communication and I/O controller 11 .
  • When each of the nodes 10 receives the backup request, the communication and I/O controller 11 of said each node 10 updates the status information 103 to change the status indicated thereby from an operation status to a backing-up status.
  • the distributed database 1 receives a request for data writing/reading (from, e.g., a host machine other than the above host machine issuing the backup request)
  • the distributed database 1 immediately executes data reading
  • the distributed database 1 executes data writing after the backup processing is completed.
  • When the status information 103 is updated to the backing-up status, the communication and I/O controller 11 creates a backup (indicated by reference numeral 111 in FIG. 2) of data in the data area 101 in the storage area 100 of the node 10 based on the partitioning information 102. Also, the communication and I/O controller 11 makes a backup (indicated by reference numeral 112 in FIG. 2) of the partitioning information 102 in the storage area 100 of the node 10. When producing the backup of the data of the data area 101, the communication and I/O controller 11 may perform data compression. The backup (111) of the data in the data area 101 and the backup (112) of the partitioning information 102 are stored in a backup file in the node 10.
  • the communication and I/O controller 11 of each node 10 includes a function of communicating with the other nodes 10 .
  • the communication and I/O controller 11 adds status information indicating that the backup processing is completed to a backup file. Then, after confirming that the status information indicating that the data in all the nodes 10 in the distributed database 1 is completely backed up is added to the backup file, the communication and I/O controller 11 updates the status information 103 to change the status indicated thereby from the backing-up status to the operation status.
  • the status information indicating the completion of the backup processing is added, it can be determined that the backup file is an available file.
  • each of the nodes 10 makes backups of the partitioning information 102 and data in the data area 101 for the distributed database 1 in the storage area 100 in said each node 10, as a result of which a backup of the data in the entire distributed database is made. Therefore, it is not necessary to transmit a backup using the data channel A, and thus the data channel A does not become a bottleneck. Accordingly, the data in the distributed database 1 can be backed up at a higher speed.
  • the communication and I/O controller 11 includes a function of restoring, with the backup, the data of the node 10 to that of the node 10 at the time of backing up the data of the node 10 .
  • a given host machine issues a request for restoring the data of the distributed database 1 with the backup.
  • the above request is sent to the other nodes 10 by the communication function of the communication and I/O controller 11 .
  • the communication and I/O controller 11 of said each node 10 updates the status information 103 to change the status indicated thereby from the operation status to a maintenance state.
  • the entire distributed database 1 checks integrity of the backup file (to be managed by said each node 10 ). To be more specific, it is checked, e.g., whether status information indicating completion of backup processing is added to the backup file of said each node 10 or not, and whether all the nodes 10 indicated in the partitioning information 102 are present or not.
  • the communication and I/O controller 11 of each node 10 updates the status indicated by the status information 103 to a restoring status, and starts to read data from the backup file. Then, when data is completely read from the backup files in all the nodes 10 in the distributed database 1 , i.e., data restoring processing is completed, the communication and I/O controller 11 of each node 10 updates the status information 103 from the status indicated thereby to the operation status, and the node 10 serving as the master starts to accept a request for accessing it for data, which is issued by the host machine.
  • the backups made by the nodes 10 are set as temporary backups, and successively read out, and then stored as accepted backups by, e.g., a magnetic tape.
  • the restoring function of the communication and I/O controller 11 can also restore data of the node 10 with a backup externally input, such as backup data held by a magnetic tape.
  • processing for writing the data may be executed in response to the request at the point in time when the data in all the nodes 10 in the distributed database 1 is completely backed up. Also, it may be set that even before completion of the backup processing of the data in all the nodes 10 in the distributed database 1 , any of the nodes 10 , whose data is completely backed up, is subjected to the data writing processing.
  • the communication and I/O controller 11 manages the cache memory 13 to replace data associated with the oldest one of requests with data associated with the newest request (cache out).
  • the cache memory 13 is full of a large number of data read/written to make the backup.
  • the data held in the cache memory 13 before the backup processing is almost lost, and thus the performance of the distributed database 1 temporarily lowers just after the backup processing.
  • the storage devices of the above embodiment are made to have a mechanism for preventing the performance of the distributed database 1 from lowering due to backup processing.
  • FIG. 3 is an exemplary view for explaining a mechanism for preventing lowering of the performance due to the backup processing.
  • a cache area 150 is an area of the storage area 100 , which corresponds to the cache memory 13 .
  • the communication and I/O controller 11 includes a specific mode in which it executes data writing/reading on the storage device 12 without changing the contents of data in the cache memory 13. In this specific mode, the communication and I/O controller 11 reads and writes data from and to the storage device 12 to make a backup (a2 in FIG. 3).
  • the data in the cache memory 13 which is not subjected to the backup processing is maintained, and thus the performance of the distributed database 1 is prevented from temporarily lowering just after the backup processing.
  • the method of handling the cache memory 13 is available not only for the case where a plurality of nodes 10 are provided in the distributed database 1 and their data is backed up, but also for the case where a single node 10 is provided and its data is backed up.
  • the above method of handling the cache memory 13 can be applied at the time of reading the backups. That is, when receiving a request for reading of a backup, the communication and I/O controller 11 executes reading of the backup in the specific mode.
  • FIG. 4 is an exemplary flowchart showing a procedure of the backup processing in the distributed database 1 to which the storage devices according to the embodiment are applied as the nodes 10 .
  • the communication and I/O controller 11 first sets the status information 103 such that it indicates “backing-up status” (block A1). If the status information 103 is set to indicate “backing-up status”, the communication and I/O controller 11 makes a backup of the data in the data area 101 (in the specific mode in which the cache is not changed) (block A2). At this time, the communication and I/O controller 11 also makes a backup of the partitioning information 102 (in the specific mode in which the cache is not changed) (block A3).
  • the communication and I/O controller 11 determines whether the data in all the nodes 10 in the distributed database 1 is completely backed up or not (block A 4 ). When determining that the data in all the nodes 10 is completely backed up (Yes in block A 4 ), the communication and I/O controller 11 sets the status information 103 such that it indicates “operation status” (block A 5 ).
  • FIG. 5 is an exemplary flowchart showing a procedure of data writing/reading in the distributed database 1 to which the storage devices according to the embodiment are applied as the nodes 10 .
  • the communication and I/O controller 11 first determines whether the status information 103 indicates “operation status” or not (block B 1 ). When determining that the status information 103 indicates “operation status” (Yes in block B 1 ), the communication and I/O controller 11 executes writing/reading of data as requested (while using the cache in a regular mode) (block B 2 ).
  • the communication and I/O controller 11 determines whether the issued request is a request for reading of data or not (block B 3 ). When determining that the request is a request for reading of data (Yes in block B 3 ), the communication and I/O controller 11 executes reading of the data as requested (while using the cache in the regular mode) (block B 4 ). On the other hand, if the above request is a request for writing of data (No in block B 3 ), the communication and I/O controller 11 determines whether the data in the node 10 is completely backed up or not (block B 5 ).
  • the communication and I/O controller 11 executes writing of the data as requested (while using the cache in the regular mode) (block B 6 ).
  • the communication and I/O controller 11 stands by until the above data is completely backed up, and it executes writing of the data as requested after the data in the node 10 is completely backed up.
  • the storage devices according to the embodiment enable a backup of data in the distributed database to be made at a high speed.
  • the various modules of the systems described herein can be implemented as software applications, hardware and/or software modules, or components on one or more computers, such as servers. While the various modules are illustrated separately, they may share some or all of the same underlying logic or code.

Abstract

According to one embodiment, a storage device is applied to a distributed database. The storage device includes a communication module and a backup module. The backup module is configured to make a backup of partitioning information and data stored for the distributed database in the storage device, when the communication module receives a command to make a backup of the distributed database. The partitioning information indicates locations of partitions created by separating a storage area of the distributed database.

Description

    CROSS REFERENCE TO RELATED APPLICATIONS
  • This application is a Continuation Application of PCT Application No. PCT/JP2013/056682, filed Mar. 11, 2013 and based upon and claiming the benefit of priority from Japanese Patent Application No. 2012-239488, filed Oct. 30, 2012, the entire contents of all of which are incorporated herein by reference.
  • FIELD
  • Embodiments described herein relate generally to a data backup technique suitable for, e.g., a distributed database.
  • BACKGROUND
  • Various storage systems have been developed to store large amounts of data and to process data writing/reading at a high speed. For such a storage system, backing up data is very important in order to secure the data.
  • A distributed database is a storage system in which data is distributed to a plurality of nodes to improve the parallelism, as a result of which the function of writing/reading data is also improved. In general, a host machine which gives a request for writing/reading data to/from a distributed database does not recognize nodes forming the distributed database. In the following explanation, a machine which gives a request for writing/reading data to/from the distributed database is referred to as a host machine; that is, the host machine is not a machine which manages the distributed database.
  • Suppose a given host machine backs up data in the distributed database. In this case, the host machine issues a large number of data read requests to the distributed database, and nodes in the distributed database execute data reading processing all at once. Consequently, a data channel between the host machine and the distributed database, i.e., between the host machine and a plurality of nodes (connected in parallel), becomes a bottleneck. This restricts the performance of the distributed database at the time of backing up data.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • A general architecture that implements the various features of the embodiments will now be described with reference to the drawings. The drawings and the associated descriptions are provided to illustrate the embodiments and not to limit the scope of the invention.
  • FIG. 1 is an exemplary view showing an example of a construction of a distributed database to which storage devices according to an embodiment are applied as nodes.
  • FIG. 2 is an exemplary view for explaining a basic principle of data backup processing in the distributed database to which the storage devices according to the embodiment are applied as nodes.
  • FIG. 3 is an exemplary view for explaining a mechanism for preventing a drop in performance due to the backup processing, in each of the storage devices according to the embodiment.
  • FIG. 4 is an exemplary flowchart showing a procedure of the backup processing in the distributed database to which the storage devices according to the embodiment are applied as nodes.
  • FIG. 5 is an exemplary flowchart showing a procedure of data writing/reading in the distributed database to which the storage devices according to the embodiment are applied as nodes.
  • DETAILED DESCRIPTION
  • Various embodiments will be described hereinafter with reference to the accompanying drawings.
  • In general, according to one embodiment, a storage device is applied to a distributed database. The storage device includes a communication module and a backup module. The backup module is configured to make a backup of partitioning information and data stored for the distributed database in the storage device, when the communication module receives a command to make a backup of the distributed database. The partitioning information indicates locations of partitions created by separating a storage area of the distributed database.
  • FIG. 1 is an exemplary view showing an example of a construction of a distributed database 1 to which storage devices according to an embodiment are applied as nodes 10. As shown in FIG. 1, the distributed database 1 includes a plurality of nodes 10 connected to a data channel A. It should be noted that various methods can be adopted for constructing the distributed database 1. For example, in a method (a), one of the plurality of nodes 10 is applied as a master, and controls the entire distributed database 1; in a method (b), the plurality of nodes 10 equally and independently operate according to a predetermined rule as elements of the distributed database; and in a method (c), a host node which controls the entire distributed database 1 is provided separate from the plurality of nodes 10. A data backup system which will be described later is, however, not limited to any of the above methods.
  • Suppose a host machine makes a request for reading data from the distributed database 1. In the above method (a), the request made by the host machine is received by the node 10 serving as the master, which determines which of the nodes 10 holds the requested data and (if the master itself does not hold the data) sends the request to the node 10 determined to hold it. In the method (b), each of the nodes 10 receives the request made by the host machine and determines whether the data it holds corresponds to the requested data, and the node 10 which determines that it holds the requested data reads the data. In the method (c), the request made by the host machine is received by the host node, which determines which of the nodes 10 holds the requested data and sends the request to the node 10 determined to hold it.
  • As shown in FIG. 1, the nodes 10 each include a communication and I/O controller 11, a storage device 12 and a cache memory 13. The communication and I/O controller 11 of each node 10 is a device for controlling said each node 10, and includes a function of executing communication with the other nodes 10 as a first function.
  • Each node 10 holds status information and partitioning information on the entire distributed database 1, and keeps that information synchronized across the distributed database 1 using the communication function of its communication and I/O controller 11. The partitioning information is information indicating in which of the nodes 10 each of the divided areas (partitions) of a storage area of the entire distributed database 1 is present.
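As an illustration only, the partitioning information can be pictured as a table that maps each partition to the node holding it. The sketch below rests on assumptions: the description does not say how data is assigned to partitions, so the hash-based scheme and the names `PartitioningInfo`, `partition_of`, and `node_for` are hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class PartitioningInfo:
    """Hypothetical form of the partitioning information.

    The tuple index is the partition number; the value is the id of the
    node on which that partition is present.
    """
    partition_to_node: tuple

    def partition_of(self, key: str) -> int:
        # Assumed scheme: stable hash of the key modulo the partition count.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.partition_to_node)

    def node_for(self, key: str) -> str:
        # Look up which node holds the partition that the key falls into.
        return self.partition_to_node[self.partition_of(key)]


# Four partitions spread over three nodes (made-up node ids).
info = PartitioningInfo(partition_to_node=("node-0", "node-1", "node-2", "node-0"))
owner = info.node_for("customer:42")
```

Because every node holds the same synchronized table, any node (or a master or host node) could resolve the owner of a key in this way.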
  • The communication and I/O controller 11 includes a function of controlling inputting and outputting of data to and from the storage device 12 and the cache memory 13 as a second function.
  • To be more specific, the communication and I/O controller 11 executes writing/reading of data to/from the storage device 12, while using the cache memory 13 as a cache. With respect to data writing, the communication and I/O controller 11 can execute both write-back processing, in which it gives a reply indicating that writing is completed at the point in time when data is written to the cache memory 13, and write-through processing, in which it gives a reply indicating that writing is completed at the point in time when data is written to the storage device 12. Furthermore, the communication and I/O controller 11 includes a specific mode in which it executes data writing/reading on the storage device 12 without changing the contents of data in the cache memory 13. A plurality of methods can be adopted to achieve this. For example, in a method (a), the cache memory 13 is not used at all, and in a method (b), the cache memory 13 is used only to read data that is already present in it.
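The difference between write-back, write-through, and the specific mode can be sketched with a toy model. This is a minimal sketch under stated assumptions, not the embodiment's implementation: the class `IOController`, its method names, the LRU eviction policy, and the use of in-memory dictionaries for the storage device and cache are all hypothetical.

```python
from collections import OrderedDict


class IOController:
    """Toy model of a controller in front of a storage device and an LRU cache."""

    def __init__(self, cache_capacity: int):
        self.storage = {}            # stands in for the storage device 12
        self.cache = OrderedDict()   # stands in for the cache memory 13
        self.dirty = set()           # keys in the cache not yet on storage
        self.capacity = cache_capacity

    def _cache_put(self, key, value):
        self.cache[key] = value
        self.cache.move_to_end(key)
        while len(self.cache) > self.capacity:
            old_key, old_value = self.cache.popitem(last=False)  # evict oldest
            if old_key in self.dirty:
                # Flush dirty data to storage before dropping it.
                self.storage[old_key] = old_value
                self.dirty.discard(old_key)

    def write_back(self, key, value):
        # Completion is reported once the data is in the cache.
        self._cache_put(key, value)
        self.dirty.add(key)

    def write_through(self, key, value):
        # Completion is reported once the data is on the storage device.
        self.storage[key] = value
        self._cache_put(key, value)
        self.dirty.discard(key)

    def read(self, key):
        # Regular mode: a miss loads the data into the cache.
        if key in self.cache:
            self.cache.move_to_end(key)
            return self.cache[key]
        value = self.storage[key]
        self._cache_put(key, value)
        return value

    def read_bypass(self, key):
        # Specific mode, method (b): use a cached copy if one exists,
        # but never insert, evict, or reorder cache entries.
        if key in self.cache:
            return self.cache[key]
        return self.storage[key]


ctl = IOController(cache_capacity=2)
ctl.write_through("a", 1)
ctl.write_through("b", 2)
ctl.storage["c"] = 3                 # data present only on the storage device
snapshot = list(ctl.cache.items())
value = ctl.read_bypass("c")         # leaves the cache exactly as it was
```

In this model, method (b) also returns the newest value when a write-back entry has not yet reached storage, which is one reason a bypass read might still consult the cache.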
  • Next, a basic principle of data backup processing in the distributed database 1 to which the storage devices according to the embodiments are applied as the nodes 10 will be explained with reference to FIG. 2.
  • Referring to FIG. 2, storage areas 100 are storage areas of the nodes 10, each of which includes the storage device 12 and the cache memory 13. In the storage area 100 of each node 10, a data area 101 is allocated for the distributed database 1, and holds the partitioning information 102 and status information 103. The communication and I/O controller 11 accesses the data area 101 in response to a request for writing/reading data (I/O request), which is issued by the host machine, based on the partitioning information 102.
  • Suppose a given host machine issues a request (backup request) for backing up data in the distributed database 1. For example, in the case where any of the nodes 10 serves as the master, and controls the entire distributed database 1, after being received by the node 10 serving as the master, the above backup request is sent to the other nodes 10 in the distributed database 1 by the communication function of the communication and I/O controller 11.
  • When each of the nodes 10 receives the backup request, the communication and I/O controller 11 of said each node 10 updates the status information 103 to change the status indicated thereby from an operation status to a backing-up status. During the data backup processing, in the case where the distributed database 1 receives a request for data writing/reading (from, e.g., a host machine other than the above host machine issuing the backup request), if the request is a request for data reading, the distributed database 1 immediately executes data reading, and if the request is a request for data writing, the distributed database 1 executes data writing after the backup processing is completed.
  • When the status information 103 is updated to the backing-up status, the communication and I/O controller 11 creates a backup (indicated by reference numeral 111 in FIG. 2) of data in the data area 101 in the storage area 100 of the node 10 based on the partitioning information 102. Also, the communication and I/O controller 11 makes a backup (indicated by reference numeral 112 in FIG. 2) of the partitioning information 102 in the storage area 100 of the node 10. When producing the backup of the data of the data area 101, the communication and I/O controller 11 may perform data compression. The backup (111) of the data in the data area 101 and the backup (112) of the partitioning information 102 are stored in a backup file in the node 10.
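The handling of requests that arrive during backup might be sketched as follows. The description only says that writes are executed after the backup processing is completed; holding them in a queue and applying them in arrival order is an assumption, as are the names `Status`, `Node`, `begin_backup`, and `end_backup`.

```python
from enum import Enum


class Status(Enum):
    OPERATION = "operation"
    BACKING_UP = "backing-up"


class Node:
    """Toy node that serves reads at once and defers writes during backup."""

    def __init__(self):
        self.status = Status.OPERATION
        self.data = {}
        self.pending_writes = []   # writes held until the backup completes

    def begin_backup(self):
        self.status = Status.BACKING_UP

    def read(self, key):
        # Reads are served immediately in either status.
        return self.data.get(key)

    def write(self, key, value) -> bool:
        # Returns True if applied now, False if deferred.
        if self.status is Status.BACKING_UP:
            self.pending_writes.append((key, value))
            return False
        self.data[key] = value
        return True

    def end_backup(self):
        # Return to the operation status, then apply deferred writes in order.
        self.status = Status.OPERATION
        for key, value in self.pending_writes:
            self.data[key] = value
        self.pending_writes.clear()


node = Node()
node.write("k", 1)
node.begin_backup()
deferred = node.write("k", 2)        # held back while backing up
seen_during_backup = node.read("k")  # still the pre-backup value
node.end_backup()
```

Deferring writes keeps the data area unchanged while it is being copied, so the backup reflects a single point in time for that node.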
  • As described above, the communication and I/O controller 11 of each node 10 includes a function of communicating with the other nodes 10. With this communication function, after confirming that data in all the nodes 10 in the distributed database 1 is completely backed up, the communication and I/O controller 11 adds status information indicating that the backup processing is completed to a backup file. Then, after confirming that the status information indicating that the data in all the nodes 10 in the distributed database 1 is completely backed up is added to the backup file, the communication and I/O controller 11 updates the status information 103 to change the status indicated thereby from the backing-up status to the operation status. When the status information indicating the completion of the backup processing is added, it can be determined that the backup file is an available file.
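A node-local backup file with a completion marker could look like the sketch below. The file layout (a JSON record with hex-encoded, zlib-compressed data), the `complete` field, and all function names are hypothetical; the description states only that the data backup, the partitioning information backup, and a completion status are stored in a backup file, and that compression may be applied.

```python
import json
import tempfile
import zlib
from pathlib import Path


def write_local_backup(path: Path, data: dict, partitioning: dict) -> None:
    """Store the node's data (compressed) and partitioning info in one local file."""
    payload = zlib.compress(json.dumps(data, sort_keys=True).encode("utf-8"))
    record = {
        "data": payload.hex(),
        "partitioning": partitioning,
        "complete": False,   # set only after every node has finished
    }
    path.write_text(json.dumps(record))


def mark_complete(path: Path) -> None:
    # Called once all nodes are confirmed to have finished their backups.
    record = json.loads(path.read_text())
    record["complete"] = True
    path.write_text(json.dumps(record))


def is_available(path: Path) -> bool:
    # A backup file is usable only if it carries the completion status.
    return json.loads(path.read_text()).get("complete", False)


def load_data(path: Path) -> dict:
    record = json.loads(path.read_text())
    raw = zlib.decompress(bytes.fromhex(record["data"]))
    return json.loads(raw.decode("utf-8"))


def demo():
    # Writes a backup into a temporary directory and reports its availability
    # before and after the completion marker is added.
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "node0.backup.json"
        write_local_backup(path, {"k": "v"}, {"0": "node-0"})
        before = is_available(path)
        mark_complete(path)
        return before, is_available(path), load_data(path)
```

Nothing in this sketch crosses the data channel: each node writes only to its own storage, and only the small completion confirmation needs to be exchanged between nodes.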
  • That is, in the distributed database 1 to which the storage devices according to the embodiment are applied as the nodes 10, each of the nodes 10 makes backups of the partitioning information 102 and data in the data area 101 for the distributed database 1 in the storage area 100 in said each node 10, as a result of which a backup of the data in the entire distributed database is made. Therefore, it is not necessary to transmit a backup using the data channel A, and thus the data channel A does not become a bottleneck. Accordingly, the data in the distributed database 1 can be backed up at a higher speed. It should be noted that the communication and I/O controller 11 includes a function of restoring, with the backup, the data of the node 10 to that of the node 10 at the time of backing up the data of the node 10.
  • Suppose a given host machine issues a request for restoring the data of the distributed database 1 with the backup. For example, in the case where any of the nodes 10 serves as the master, and controls the entire distributed database 1, after being received by the node 10 serving as the master, the above request is sent to the other nodes 10 by the communication function of the communication and I/O controller 11.
  • When each of the nodes 10 receives the request, the communication and I/O controller 11 of said each node 10 updates the status information 103 to change the status indicated thereby from the operation status to a maintenance state. In the maintenance state, the entire distributed database 1 checks integrity of the backup file (to be managed by said each node 10). To be more specific, it is checked, e.g., whether status information indicating completion of backup processing is added to the backup file of said each node 10 or not, and whether all the nodes 10 indicated in the partitioning information 102 are present or not.
  • If it is determined that integrity of the backup file is ensured, the communication and I/O controller 11 of each node 10 updates the status indicated by the status information 103 to a restoring status, and starts to read data from the backup file. Then, when data is completely read from the backup files in all the nodes 10 in the distributed database 1, i.e., data restoring processing is completed, the communication and I/O controller 11 of each node 10 updates the status information 103 from the status indicated thereby to the operation status, and the node 10 serving as the master starts to accept a request for accessing it for data, which is issued by the host machine.
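  • By way of illustration only, the integrity check performed in the maintenance state and the subsequent status transitions might be sketched as follows. The function names and dictionary layouts are hypothetical. The description does not state what happens when the check fails, so this sketch simply assumes that the status then remains in the maintenance state.

```python
def check_backup_integrity(backup_files: dict, partitioning_info: dict) -> bool:
    """Check presence of all nodes and the completion marker of each backup file.

    backup_files maps node name -> backup file dict (hypothetical layout).
    partitioning_info maps node name -> partition location (hypothetical).
    """
    for node_name in partitioning_info:
        backup_file = backup_files.get(node_name)
        if backup_file is None:
            return False  # a node named in the partitioning information is absent
        if not backup_file.get("backup_complete", False):
            return False  # the completion status information was never added
    return True


def restore_status_sequence(backup_files: dict, partitioning_info: dict) -> list:
    """Return the statuses a node passes through for one restore request."""
    statuses = ["maintenance"]
    if check_backup_integrity(backup_files, partitioning_info):
        statuses.append("restoring")   # data is read back from the backup file
        statuses.append("operation")   # requests from the host are accepted again
    return statuses
```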
  • Also, it is possible that the backups made by the nodes 10 are set as temporary backups, and successively read out, and then stored as accepted backups by, e.g., a magnetic tape. The restoring function of the communication and I/O controller 11 can also restore data of the node 10 with a backup externally input, such as backup data held by a magnetic tape.
  • Furthermore, in the case where a request for writing data is received during data backup processing, processing for writing the data may be executed in response to the request at the point in time when the data in all the nodes 10 in the distributed database 1 is completely backed up. Also, it may be set that even before completion of the backup processing of the data in all the nodes 10 in the distributed database 1, any of the nodes 10, whose data is completely backed up, is subjected to the data writing processing.
  • Also, it should be noted that in general, of data written to the storage device 12 and that read from the storage device 12, written or read data associated with a newest request is stored in the cache memory 13 used as a cache for the storage device 12 in preference to data associated with other requests. That is, the newer the request, the higher the priority of data to be stored in the cache memory 13. To be more specific, the communication and I/O controller 11 manages the cache memory 13 to replace data associated with the oldest one of requests with data associated with the newest request (cache out).
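  • By way of illustration only, the general cache management described above, in which data associated with the oldest request is replaced by data associated with the newest request, might be sketched as follows. The class name `RequestOrderedCache` and its methods are hypothetical, and the fixed capacity is an assumption.

```python
from collections import OrderedDict


class RequestOrderedCache:
    """Cache that keeps data for the most recent requests."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._entries = OrderedDict()  # oldest request first, newest last

    def access(self, key, value) -> None:
        """Record data for the newest request, evicting the oldest if full."""
        if key in self._entries:
            self._entries.move_to_end(key)  # the key is now the newest request
        self._entries[key] = value
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # cache out the oldest request

    def keys(self) -> list:
        """Return cached keys ordered from oldest to newest request."""
        return list(self._entries)
```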
  • However, in the above general case, after production of a backup in the distributed database 1, the cache memory 13 is full of the large amount of data read/written to make the backup. Thus, most of the data held in the cache memory 13 before the backup processing is lost, and the performance of the distributed database 1 temporarily lowers just after the backup processing.
  • In view of the above, the storage devices of the above embodiment are made to have a mechanism for preventing the performance of the distributed database 1 from lowering due to backup processing. FIG. 3 is an exemplary view for explaining a mechanism for preventing lowering of the performance due to the backup processing.
  • Referring to FIG. 3, a cache area 150 is an area of the storage area 100, which corresponds to the cache memory 13. As described above, the communication and I/O controller 11 includes a specific mode in which it executes data writing/reading on the storage device 12 without changing the contents of data in the cache memory 13. In this specific mode, the communication and I/O controller 11 reads and writes data from and to the storage device 12 to make a backup (a2 in FIG. 3).
  • By virtue of the above structural feature, the data in the cache memory 13 which is not subjected to the backup processing is maintained, and thus the performance of the distributed database 1 is prevented from temporarily lowering just after the backup processing. It should be noted that the method of handling the cache memory 13 is available not only for the case where a plurality of nodes 10 are provided in the distributed database 1 and their data is backed up, but also for the case where a single node 10 is provided and its data is backed up.
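  • By way of illustration only, the difference between the regular mode and the specific mode might be sketched as follows. The class `CachedStorage`, the `bypass_cache` flag, and the block-keyed dictionary standing in for the storage device 12 are assumptions made for this sketch.

```python
from collections import OrderedDict


class CachedStorage:
    """Storage with a small cache and two read paths."""

    def __init__(self, blocks: dict, cache_capacity: int) -> None:
        self.blocks = blocks                  # stands in for the storage device
        self.cache_capacity = cache_capacity
        self.cache = OrderedDict()            # stands in for the cache memory

    def read(self, key, bypass_cache: bool = False):
        """Read one block; with bypass_cache the cache contents are untouched."""
        if key in self.cache:
            if not bypass_cache:
                self.cache.move_to_end(key)   # regular mode refreshes recency
            return self.cache[key]
        value = self.blocks[key]
        if not bypass_cache:
            self.cache[key] = value
            if len(self.cache) > self.cache_capacity:
                self.cache.popitem(last=False)
        return value

    def backup(self, bypass_cache: bool = True) -> dict:
        """Copy every block, by default in the cache-preserving mode."""
        return {key: self.read(key, bypass_cache) for key in sorted(self.blocks)}
```

  • In this sketch, a backup made with `bypass_cache=False` walks every block through the cache and displaces the data that was cached before the backup, whereas the default leaves that data in place.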
  • Also, in the case where the backups made by the nodes 10 are set as temporary backups, and successively read out, and then stored as accepted backups by, e.g., a magnetic tape, the above method of handling the cache memory 13 can be applied at the time of reading the backups. That is, when receiving a request for reading of a backup, the communication and I/O controller 11 executes reading of the backup in the specific mode.
  • FIG. 4 is an exemplary flowchart showing a procedure of the backup processing in the distributed database 1 to which the storage devices according to the embodiment are applied as the nodes 10.
  • In each node 10 in the distributed database 1, the communication and I/O controller 11 first sets the status information 103 such that it indicates “backing-up status” (block A1). If the status information 103 is set to indicate “backing-up status”, the communication and I/O controller 11 makes a backup of the data in the data area 101 (in the specific mode in which the cache is not changed) (block A2). At this time, the communication and I/O controller 11 also makes a backup of the partitioning information 102 (in the specific mode in which the cache is not changed) (block A3).
  • The communication and I/O controller 11 determines whether the data in all the nodes 10 in the distributed database 1 is completely backed up or not (block A4). When determining that the data in all the nodes 10 is completely backed up (Yes in block A4), the communication and I/O controller 11 sets the status information 103 such that it indicates “operation status” (block A5).
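  • By way of illustration only, the procedure of FIG. 4 might be traced for a single node as follows. The dictionary keys and the callable `all_nodes_backed_up`, which stands in for the inter-node confirmation, are hypothetical. The sketch checks block A4 once and leaves any retry to the caller, which is an assumption, since the description does not detail the "No" branch.

```python
def run_backup_procedure(node: dict, all_nodes_backed_up) -> list:
    """Walk one node through blocks A1 to A5 and return the blocks visited.

    `node` is a hypothetical dict with the keys "status", "data_area",
    "partitioning_info" and "backup_file".
    """
    trace = []
    node["status"] = "backing-up"                            # block A1
    trace.append("A1")
    node["backup_file"]["data"] = bytes(node["data_area"])   # block A2
    trace.append("A2")
    node["backup_file"]["partitioning_info"] = dict(node["partitioning_info"])  # block A3
    trace.append("A3")
    trace.append("A4")
    if all_nodes_backed_up():                                # block A4
        node["status"] = "operation"                         # block A5
        trace.append("A5")
    return trace
```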
  • FIG. 5 is an exemplary flowchart showing a procedure of data writing/reading in the distributed database 1 to which the storage devices according to the embodiment are applied as the nodes 10.
  • In each node 10 in the distributed database 1, the communication and I/O controller 11 first determines whether the status information 103 indicates “operation status” or not (block B1). When determining that the status information 103 indicates “operation status” (Yes in block B1), the communication and I/O controller 11 executes writing/reading of data as requested (while using the cache in a regular mode) (block B2).
  • On the other hand, if the status information 103 indicates “backing-up status” (No in block B1), the communication and I/O controller 11 determines whether the issued request is a request for reading of data or not (block B3). When determining that the request is a request for reading of data (Yes in block B3), the communication and I/O controller 11 executes reading of the data as requested (while using the cache in the regular mode) (block B4). On the other hand, if the above request is a request for writing of data (No in block B3), the communication and I/O controller 11 determines whether the data in the node 10 is completely backed up or not (block B5).
  • When determining that the data in the node 10 is completely backed up (Yes in block B5), the communication and I/O controller 11 executes writing of the data as requested (while using the cache in the regular mode) (block B6). On the other hand, when determining that the data in the node 10 is not completely backed up (No in block B5), the communication and I/O controller 11 is on standby until the above data is completely backed up, and it executes writing of the data as requested after the data in the node 10 is completely backed up.
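  • By way of illustration only, the decision flow of FIG. 5 might be sketched as follows. The function name `handle_request`, the string values, and the "execute"/"wait" return values are hypothetical; "wait" stands for the standby until the data in the node is completely backed up.

```python
def handle_request(status: str, kind: str, node_backed_up: bool) -> str:
    """Return "execute" or "wait" for a read/write request (blocks B1 to B6)."""
    if kind not in ("read", "write"):
        raise ValueError(f"unknown request kind: {kind!r}")
    if status == "operation":   # block B1: Yes
        return "execute"        # block B2
    if kind == "read":          # block B3: Yes
        return "execute"        # block B4
    if node_backed_up:          # block B5: Yes
        return "execute"        # block B6
    return "wait"               # block B5: No, hold the write until the backup ends
```

  • Reads are never delayed in this sketch, and a write is delayed only while the node receiving it is still being backed up.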
  • As explained above, for example, the storage devices according to the embodiment enable a backup of data in the distributed database to be made at a high speed.
  • It should be noted that all the steps of the backup processing in the embodiment can be carried out by software. Thus, if an ordinary computer is made to incorporate this software through a computer-readable storage medium, it can easily obtain the same advantage as in the embodiment.
  • The various modules of the systems described herein can be implemented as software applications, hardware and/or software modules, or components on one or more computers, such as servers. While the various modules are illustrated separately, they may share some or all of the same underlying logic or code.
  • While certain embodiments have been described, these embodiments have been presented by way of example only, and are not intended to limit the scope of the inventions. Indeed, the novel embodiments described herein may be embodied in a variety of other forms; furthermore, various omissions, substitutions and changes in the form of the embodiments described herein may be made without departing from the spirit of the inventions. The accompanying claims and their equivalents are intended to cover such forms or modifications as would fall within the scope and spirit of the inventions.

Claims (13)

What is claimed is:
1. A storage device applied to a distributed database, comprising:
a communication module; and
a backup module configured to make backup of partitioning information and data stored for the distributed database in the storage device, when the communication module receives a command to make a backup of the distributed database, the partitioning information indicating locations of partitions created by separating a storage area of the distributed database.
2. The storage device of claim 1, further comprising:
a cache memory;
a first data input/output module configured to execute data reading/writing while replacing data in the cache memory; and
a second data input/output module configured to execute data reading/writing without replacing the data in the cache memory,
wherein the backup module is configured to execute making of the backup in the storage device with the second data input/output module.
3. The storage device of claim 2, further comprising a backup-data processing module configured to execute reading of the partitioning information and the data stored for the distributed database with the second data input/output module, when the communication module receives a command to transmit the backup of the partitioning information and the data stored for the distributed database.
4. The storage device of claim 1, wherein the backup module is configured to add status information indicating completion of data backup processing to a backup file which stores the partitioning information and the data for the distributed database, after determining that the distributed database is completely backed up by communication of the communication module.
5. The storage device of claim 1, further comprising a control module configured
to immediately execute data reading when a request for reading of data is received during data backup processing by the backup module, and
to execute data writing after completion of the backup processing by the backup module when a request for writing of data is received during data backup processing by the backup module.
6. The storage device of claim 1, wherein the backup module is configured to make the backup by compressing the data for the distributed database.
7. The storage device of claim 1, further comprising a restoring module configured to restore the partitioning information and the data for the distributed database using the backup made by the backup module.
8. A storage device comprising:
a cache area,
a data storage area,
a first data input/output module configured to execute data writing/reading on the data storage area while replacing data in the cache area;
a second data input/output module configured to execute data writing/reading on the data storage area without replacing data in the cache area; and
a backup module configured to make a copy of data stored in the data storage area in the data storage area with the second data input/output module.
9. The storage device of claim 8, wherein the backup module is configured to make the copy of the data as compressed data.
10. The storage device of claim 8, further comprising a restoring module configured to restore the data in the data storage area using the copy made by the backup module.
11. A data backup method of a storage device applied to a distributed database, the method comprising:
making backup of partitioning information and data stored for the distributed database in the storage device, when a command to make a backup of the distributed database is received, the partitioning information indicating locations of partitions created by separating a storage area of the distributed database.
12. The data backup method of claim 11, wherein the making backup comprises executing making of the backup without changing content of a cache.
13. The data backup method of claim 11, further comprising adding status information indicating completion of data backup processing to a backup file which stores the partitioning information and the data for the distributed database, after determining that the distributed database is completely backed up.
US14/015,550 2012-10-30 2013-08-30 Storage device and data backup method Abandoned US20140122433A1 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
JP2012-239488 2012-10-30
JP2012239488A JP5342055B1 (en) 2012-10-30 2012-10-30 Storage device and data backup method
PCT/JP2013/056682 WO2014069007A1 (en) 2012-10-30 2013-03-11 Storage device and data backup method

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2013/056682 Continuation WO2014069007A1 (en) 2012-10-30 2013-03-11 Storage device and data backup method

Publications (1)

Publication Number Publication Date
US20140122433A1 true US20140122433A1 (en) 2014-05-01

Family

ID=50548352

Family Applications (1)

Application Number Title Priority Date Filing Date
US14/015,550 Abandoned US20140122433A1 (en) 2012-10-30 2013-08-30 Storage device and data backup method

Country Status (1)

Country Link
US (1) US20140122433A1 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170103100A1 (en) * 2015-10-13 2017-04-13 Bank Of America Corporation System for multidimensional database administration
US20170111286A1 (en) * 2015-10-15 2017-04-20 Kabushiki Kaisha Toshiba Storage system that includes a plurality of routing circuits and a plurality of node modules connected thereto
US10506042B2 (en) 2015-09-22 2019-12-10 Toshiba Memory Corporation Storage system that includes a plurality of routing circuits and a plurality of node modules connected thereto
US10891166B2 (en) 2018-07-13 2021-01-12 Hitachi, Ltd. Storage system and information management method having a plurality of representative nodes and a plurality of general nodes including a plurality of resources
US10977276B2 (en) * 2015-07-31 2021-04-13 International Business Machines Corporation Balanced partition placement in distributed databases

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040267838A1 (en) * 2003-06-24 2004-12-30 International Business Machines Corporation Parallel high speed backup for a storage area network (SAN) file system
US20070214196A1 (en) * 2006-03-08 2007-09-13 International Business Machines Coordinated federated backup of a distributed application environment
US20070250519A1 (en) * 2006-04-25 2007-10-25 Fineberg Samuel A Distributed differential store with non-distributed objects and compression-enhancing data-object routing
US20070250674A1 (en) * 2006-04-25 2007-10-25 Fineberg Samuel A Method and system for scaleable, distributed, differential electronic-data backup and archiving
US20110238915A1 (en) * 2010-03-29 2011-09-29 Fujitsu Limited Storage system

Similar Documents

Publication Publication Date Title
US10977124B2 (en) Distributed storage system, data storage method, and software program
US8521685B1 (en) Background movement of data between nodes in a storage cluster
US7127557B2 (en) RAID apparatus and logical device expansion method thereof
CN102299904B (en) System and method for realizing service data backup
EP2557494B1 (en) Storage apparatus and data copy method between thin-provisioning virtual volumes
US8539147B2 (en) Apparatus and method for controlling storage system
JP4419884B2 (en) Data replication apparatus, method, program, and storage system
US6922763B2 (en) Method and apparatus for storage system
US10191685B2 (en) Storage system, storage device, and data transfer method
TWI571749B (en) Backup system and backup method thereof
US20200201724A1 (en) Storage system and storage system control method
US20140122433A1 (en) Storage device and data backup method
US9436554B2 (en) Information processing apparatus and data repairing method
CN108959526A (en) Blog management method and log management apparatus
CN100345129C (en) Method, system, and program for assigning priorities
US10664193B2 (en) Storage system for improved efficiency of parity generation and minimized processor load
CN112307049A (en) Method, device and equipment for separating read from write of database and readable storage medium
US8793455B2 (en) Storage apparatus, control method for storage apparatus, and storage system
EP2916230A1 (en) Storage device and data backup method
US9779002B2 (en) Storage control device and storage system
US8510506B2 (en) Disk array device, disk array system and cache control method
WO2018055686A1 (en) Information processing system
US10191690B2 (en) Storage system, control device, memory device, data access method, and program recording medium
US20130262804A1 (en) Data duplication system, data duplication method, and program thereof
JP7050707B2 (en) Storage control device, storage system, storage control method, and storage control program

Legal Events

Date Code Title Description
AS Assignment

Owner name: KABUSHIKI KAISHA TOSHIBA, JAPAN

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:MURATA, AKIFUMI;REEL/FRAME:031120/0683

Effective date: 20130829

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION