WO2015187187A1 - Journal events in a file system and a database - Google Patents

Journal events in a file system and a database

Info

Publication number
WO2015187187A1
WO2015187187A1 PCT/US2014/048005 US2014048005W WO2015187187A1 WO 2015187187 A1 WO2015187187 A1 WO 2015187187A1 US 2014048005 W US2014048005 W US 2014048005W WO 2015187187 A1 WO2015187187 A1 WO 2015187187A1
Authority
WO
WIPO (PCT)
Prior art keywords
metadata
database
file
value
update
Prior art date
Application number
PCT/US2014/048005
Other languages
French (fr)
Inventor
Rajkumar Kannan
Annmary Justine KOOMTHANAM
Jothivelavan SIVASHANMUGAM
Ramesh Kannan KARUPPUSAMY
Original Assignee
Hewlett-Packard Development Company, L.P.
Priority date
Filing date
Publication date
Application filed by Hewlett-Packard Development Company, L.P.
Publication of WO2015187187A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/1805 Append-only file systems, e.g. using logs or journals to store data
    • G06F16/1815 Journaling file systems

Definitions

  • FIG. 1 is a block diagram of an example computing device using journal events in a file system and a database
  • FIG. 2 is a block diagram of an example computing environment using journal events in a file system and a database
  • FIG. 3 is a flowchart of an example method for using journal events in a file system and a database
  • FIG. 4 is a flowchart of an example method for using journal events in a file system and a database
  • FIG. 5 is a block diagram of an example system using journal events in a file system and a database.
  • Custom metadata allows a user to label data with customized information that doesn't fit into any of the existing metadata fields.
  • custom metadata may include genre, artist, year performed, etc.
  • custom metadata may include project name, author, reviewed by, signatures, etc.
  • Custom metadata may be associated with a file present in a file system.
  • out-of-band custom metadata may be stored in a database distinct from a file system that includes in-file metadata.
  • a traditional application or service (for example, a backup and restore application, a file replication service, etc.)
  • a traditional application may only understand commonly used file semantics (such as POSIX).
  • the present disclosure describes a mechanism to provide query performance and object semantics of a file through a database along with a simultaneous mapping to a traditional metadata namespace (for example, extended file attributes) in such a way that an ingest mechanism inserts custom metadata in a file metadata and a database via a single call.
  • such mechanism may ensure that both traditional applications as well as new applications may use common semantics to push custom metadata to a file and a database.
  • the present disclosure describes using journal events to maintain a transactionally consistent mapping between metadata associated with a file in a file system and out-of-band metadata associated with the file in a database.
  • a request may be received to insert, update, or delete metadata both to a file in a file system and into a database that stores out-of-band metadata associated with the file.
  • a first journal event may be generated to insert, update, or delete the metadata into the database and define a first value in a consistent field in the database.
  • the first journal event may be processed to insert, update, or delete the metadata into the database.
  • the request to insert, update, or delete the metadata to the file in the file system is processed.
  • journal events may be generated to define a second value in the consistent field in the database.
  • the second journal event may then be processed to update the second value in the consistent field in the database.
  • journal events maintain a transactionally consistent mapping between metadata associated with a file in a file system and out-of-band metadata associated with the file in a database.
  • FIG. 1 is a block diagram of an example computing device 100 using journal events in a file system and a database.
  • Computing device 100 may be a server, a desktop computer, a notebook computer, a tablet computer, and the like.
  • computing device 100 may be a file server 100.
  • File server 100 may include a processor 102 and a machine-readable storage medium 104.
  • Processor 102 may be any type of Central Processing Unit (CPU), microprocessor, or processing logic that interprets and executes machine-readable instructions stored in machine-readable storage medium 104.
  • Machine-readable storage medium 104 may be a random access memory (RAM) or another type of dynamic storage device that may store information and machine-readable instructions that may be executed by processor 102.
  • machine-readable storage medium 104 may be Synchronous DRAM (SDRAM), Double Data Rate (DDR), Rambus DRAM (RDRAM), Rambus RAM, etc. or storage memory media such as a floppy disk, a hard disk, a CD-ROM, a DVD, a pen drive, and the like.
  • machine-readable storage medium 104 may be a non-transitory machine-readable medium.
  • machine-readable storage medium 104 may store a file system 106, a database 108, and a custom metadata module 110.
  • module may refer to a software component (machine readable instructions), a hardware component or a combination thereof.
  • a module may include, by way of example, components, such as software components, processes, tasks, co-routines, functions, attributes, procedures, drivers, firmware, data, databases, data structures, Application Specific Integrated Circuits (ASIC) and other computing devices.
  • a module may reside on a volatile or non-volatile storage medium (e.g. 104) and may be configured to interact with a processor (e.g. 102) of a computing device (e.g. 100).
  • File system 106 may be a local file system or a scale-out file system such as a shared file system or a network file system.
  • Examples of a shared file system may include a Network Attached Storage (NAS) file system or a cluster file system.
  • Examples of a network file system may include a distributed file system or a distributed parallel file system.
  • file system 106 may be used for storage and retrieval of data from a storage device. Typically, each piece of data is called a "file" (or file object).
  • file system 106 may include at least one file.
  • a file in the file system may be associated with file metadata and/or custom metadata. Custom metadata may be defined by a user.
  • Metadata (or custom metadata) associated with a file in a file system may be termed as "in-file" metadata.
  • in-file metadata may include extended file attributes. Extended file attributes enable users to associate files with metadata not interpreted by the file system, whereas regular attributes have a purpose strictly defined by the file system. For example, users may use these attributes to store the name of an author of a document, a checksum, a digital signature, etc.
  • File system 106 may include a journaling system 112 and a virtual file system interface 114.
  • journaling system 112 may maintain a special file called a journal that may be used to repair any inconsistencies that may occur as the result of an improper shutdown of a computer.
  • Journaling system 112 may write metadata into the journal. In the event of a system crash, if a given set of updates have not been implemented, the system may read the journal in order to roll up to the most recent consistent data point.
  • file system may generate a first journal event to insert, update, or delete the metadata into the database (example, 108) and define a first value in a consistent field of a content metadata table in the database (example, 108).
  • the request to insert, update, or delete metadata to a file may be transferred to a Virtual File System (VFS) layer 114 or, more specifically, to a specific VFS handler of the file system (example, 106).
  • Virtual File System (VFS) layer 114 may act as an interface for an underlying operating system to support a variety of file systems so that the system could handle various types of I/O system calls.
  • the VFS layer 114 is a Linux VFS layer.
  • file system (example, 106) may generate a first journal event to insert, update, or delete the metadata into the database (example, 108) and define a first value in a consistent field in the database.
  • the VFS handler for a SetXattr call may determine if it's an insert or update metadata call, and based on said determination may generate a first journal event and define a first value in a consistent field (for example, the field may be set to 0) of a content metadata table in the database (example, 108).
  • the VFS handler for RemoveXattr may carry out the same process for a delete metadata call.
  • the first journal event may be processed by journaling system 112 to insert, update, or delete the metadata into the database (example, 108).
  • File system (example, 106) may then process the request to insert, update, or delete the metadata to the file in the file system (example, 106).
  • journaling system 112 may generate a second journal event to define a second value in the consistent field in the database (example, 108).
  • the second journal event may then be processed by file system (example, 106) to update the second value in the consistent field in the database (example, 108).
  • Database 108 may be a repository that stores an organized collection of data.
  • database 108 may store an out-of-band metadata of a file.
  • "Out-of-band metadata" of a file may be defined as metadata (or custom metadata) that may be stored in a location (example, a database) other than the file system.
  • Database (example, 108) may include one or more tables for storing data.
  • At least one table in the database may be used to store out-of-band content metadata of a file.
  • Such table may be called "content metadata" table.
  • a table in the database (such as content metadata table) may include a consistent field.
  • a consistent field is based on consistency property that ensures that any transaction will bring the database from one valid state to another. In other words, a consistent field may change a value defined for it to reflect a successful transaction.
  • a first value (for instance, 0) may be defined in a consistent field of a database (example, 108) upon generation of a first journal event to insert, update, or delete metadata into the database (example, 108).
  • a second value (for example, 1) may be defined in the consistent field of the database (example, 108) upon generation of a second journal event which may occur upon processing of a request to insert, update, or delete metadata to a file in a file system (example, 106).
  • database 108 may be a distributed database that provides high query rates and high-throughput updates using a batching process.
  • Database 108 may use a pipelined architecture that provides access to update batches at various points through processing.
  • database 108 may be based on a batched update model, which decouples update processing from read-only queries (i.e. query processing task). In this model, the updates may be batched and processed in the background, and do not interfere with the foreground query workload.
  • Database 108 may allow different stages of the updates in the pipeline to be queried independently. Queries that could use slightly out-of-date data may use only the final output of the pipeline, which may correspond to the completely ingested and indexed data. Queries that require even fresher results may access data at any stage in the pipeline.
  • database 108 may be integrated into file system 106.
  • Custom metadata module 110 may include instructions to receive a request to insert, update, or delete metadata both to a file in a file system (example, 106) and into a database (example, 108) that may store out-of-band metadata associated with the file of said file system.
  • aforesaid request may be in the form of a representational state transfer (REST) call.
  • the request may use another data access protocol such as, but not limited to, Network File System (NFS), Server Message Block (SMB), and the like.
  • Custom metadata module 110 may include a suitable interface to handle the aforementioned request that may be generated using either of these protocols.
  • said REST call may be handled by a Hypertext Transfer Protocol (HTTP) service that may pass on the request to custom metadata module 110.
  • custom metadata module 110 may issue a system call to the file for which the request to insert, update, or delete metadata is intended. The system call is then forwarded to the VFS layer of the file system.
  • custom metadata module 110 may issue a Linux call Setfattr to the file for which metadata needs to be set (insert, update, or delete). The call may then be handled by the Linux VFS layer and passed on to the file system specific VFS handler.
  • FIG. 2 is a block diagram of an example computing environment 200 using journal events in a file system and a database.
  • Computing environment 200 may include client systems 202, 204, and 206, a file server 208, and a storage device 210.
  • the number of client systems 202, 204, and 206, file server 208, and storage device 210 shown in FIG. 2 is for the purpose of illustration only and their number may vary in other implementations.
  • computing environment 200 may represent a scale-out file system.
  • Client systems 202, 204, and 206 may each be a computing device such as a desktop computer, a notebook computer, a tablet computer, a mobile phone, a personal digital assistant (PDA), a server, and the like. Client systems 202, 204, and 206 may communicate with file server 208 via a computer network 212.
  • Computer network 212 may be a wireless or wired network. Computer network 212 may include, for example, a Local Area Network (LAN), a Wide Area Network (WAN), a Metropolitan Area Network (MAN), a Storage Area Network (SAN), a Campus Area Network (CAN), or the like. Further, computer network 212 may be a public network (for example, the Internet) or a private network (for example, an intranet).
  • client systems 202, 204, and 206 may be directly coupled to file server 208.
  • client systems 202, 204, and 206 may host one or more applications 224, 226, and 228 that may, in an example, send a request to file server (example, 208) to insert, update, or delete metadata both to a file in file system and into a database that may store out-of-band metadata of said file.
  • an application may use a data access protocol such as, but not limited to, Hypertext Transfer Protocol (HTTP), Network File System (NFS), Server Message Block (SMB), and the like, to read and/or write data such as files, metadata, custom metadata, and the like, from file server 208.
  • File server 208 may include a non-transitory machine-readable storage medium 214 that may store machine executable instructions.
  • file server 208 may be similar to file server 100 described earlier. Accordingly, components of file server 208 that are similarly named and illustrated in file server 100 may be considered similar.
  • components or reference numerals of FIG. 2 having a same or similarly described function in FIG. 1 are not being described in connection with FIG. 2. Said components or reference numerals may be considered alike.
  • machine-readable storage medium 214 may store a file system 106, a database 108, a custom metadata module 110, and an archive journal scanner module 220.
  • Archive journal scanner module 220 may include instructions to process a journal generated by the journaling system.
  • archive journal scanner module 220 may include instructions to identify, from a database, a consistent field that includes a first value which may be defined for the consistent field upon generation of a first journal event to insert, update, or delete metadata into the database.
  • Archive journal scanner module 220 may further include instructions to identify a metadata entry related to the consistent field with the first value in the database.
  • Archive journal scanner module 220 may then determine that a metadata entry corresponding to the metadata entry related to the consistent field with the first value is present in metadata of the file in a file system and, in response to the determination, may update the first value to a second value in the consistent field of the database.
  • Storage device 210 may be used to store and retrieve data stored by file system 106.
  • Some non-limiting examples of storage device 210 may include a Direct Attached Storage (DAS) device, a Network Attached Storage (NAS) device, a tape drive, a magnetic tape drive, or a combination of these devices.
  • Storage device 210 may be directly coupled to file server 208 or may communicate with file server 208 via a computer network 222.
  • Such a computer network 222 may be similar to the computer network 212 described above.
  • computer network 222 may be a Storage Area Network (SAN).
  • FIG. 3 is a flowchart of an example method 300 for using journal events in a file system and a database.
  • the method 300 may at least partially be executed on a computing device 100 of FIG. 1 or file server 208 of FIG. 2. However, other computing devices may be used as well.
  • a request may be received to insert, update, or delete metadata both to a file in a file system (example, 106) and into a database (example, 108) that may store out-of-band metadata associated with the file.
  • such request may be received by the custom metadata module.
  • the file system may generate a first journal event to insert, update, or delete the metadata into the database (example, 108) and define a first value (for example, 0) in a consistent field in the database (example, 108).
  • the consistent field may be present in a content metadata table of the database (example, 108).
  • a journaling system may process the first journal event to insert, update, or delete the metadata into the database (example, 108). In other words, the journaling system may insert, update, or delete the metadata into the database (example, 108).
  • the file system may process the request to insert, update, or delete the metadata to the file in the file system (example, 106).
  • the file system (example, 106) may insert, update, or delete the metadata to the file.
  • the file system may generate a second journal event to define a second value (for example, 1) in the consistent field in the database (example, 108).
  • the journaling system may process the second journal event to update the second value in the consistent field in the database (example, 108). In other words, the journaling system may replace the first value (for example, 0) with a second value (for example, 1) in the consistent field.
  • FIG. 4 is a flowchart of an example method 400 for using journal events in a file system and a database.
  • the method 400 may at least partially be executed on a computing device 100 of FIG. 1 or file server 208 of FIG. 2. However, other computing devices may be used as well.
  • example method 400 may be used to ensure that there are no metadata inconsistencies between metadata associated with a file in a file system and out-of-band metadata associated with the file in a database (example, 108) if a system crash takes place during processing of a request to insert, update, or delete metadata both to a file in a file system and into a database (example, 108) that stores out-of-band metadata associated with the file.
  • an archive journal scanner module may identify, from a database (example, 108) that may store out-of-band metadata associated with a file, a consistent field (or fields) that includes a given value (or first value). For instance, a given value may be zero (0).
  • an archive journal scanner module may identify a metadata entry related to the identified consistent field with the first value in the database (example, 108). In other words, the archive journal scanner module may identify a metadata entry that may have been inserted, updated, or deleted against a consistent field with a given value.
  • an archive journal scanner module may determine that a metadata entry corresponding to the metadata entry related to the consistent field with the given value (i.e. first value) is present in metadata of a file in a file system (example, 106). In other words, a determination is made whether there is a metadata entry in the file system that matches a metadata entry against a given value (for example, 0) in the database (example, 108).
  • the first value (for example, 0) may be updated to a second value (for example, 1) in said consistent field of the database. Presence of a matching metadata entry both in a file system and a database indicates that a request to insert, update, or delete metadata both to a file in the file system (example, 106) and into the database (example, 108) is successful. In such case, a first value in a consistent field may be updated to a second value to indicate a successful transaction.
  • FIG. 5 is a block diagram of an example system 500 using journal events in a file system and a database.
  • System 500 includes a processor 502 and a machine-readable storage medium 504 communicatively coupled through a system bus.
  • system 500 may be analogous to computing device 100 of FIG. 1 or file server 208 of FIG. 2.
  • Processor 502 may be any type of Central Processing Unit (CPU), microprocessor, or processing logic that interprets and executes machine-readable instructions stored in machine-readable storage medium 504.
  • Machine-readable storage medium 504 may be a random access memory (RAM) or another type of dynamic storage device that may store information and machine-readable instructions that may be executed by processor 502.
  • machine-readable storage medium 504 may be Synchronous DRAM (SDRAM), Double Data Rate (DDR), Rambus DRAM (RDRAM), Rambus RAM, etc. or a storage memory media such as a floppy disk, a hard disk, a CD-ROM, a DVD, a pen drive, and the like.
  • machine-readable storage medium 504 may be a non-transitory machine-readable medium.
  • Machine-readable storage medium 504 may store instructions 506, 508, 510, 512, 514, and 516.
  • instructions 506 may be executed by processor 502 to receive a request to insert, update, or delete metadata both to a file in a file system and into a database that stores out-of-band metadata associated with the file.
  • Instructions 508 may be executed by processor 502 to generate a first journal event to insert, update, or delete the metadata into the database and define a first value in a consistent field in the database.
  • Instructions 510 may be executed by processor 502 to process the first journal event to insert, update, or delete the metadata into the database.
  • Instructions 512 may be executed by processor 502 to process the request to insert, update, or delete the metadata to the file in the file system.
  • Instructions 514 may be executed by processor 502 to generate a second journal event to define a second value in the consistent field in the database.
  • Instructions 516 may be executed by processor 502 to process the second journal event to update the second value in the consistent field in the database.
  • Embodiments within the scope of the present solution may also include program products comprising non-transitory computer-readable media for carrying or having computer-executable instructions or data structures stored thereon.
  • Such computer-readable media can be any available media that can be accessed by a general purpose or special purpose computer.
  • such computer-readable media can comprise RAM, ROM, EPROM, EEPROM, CD-ROM, magnetic disk storage or other storage devices, or any other medium which can be used to carry or store desired program code in the form of computer-executable instructions and which can be accessed by a general purpose or special purpose computer.
  • the computer readable instructions can also be accessed from memory and executed by a processor.
  • It may be noted that the above-described examples of the present solution are for the purpose of illustration only. Although the solution has been described in conjunction with a specific embodiment thereof, numerous modifications may be possible without materially departing from the teachings and advantages of the subject matter described herein. Other substitutions, modifications and changes may be made without departing from the spirit of the present solution. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and/or all of the steps of any method or process so disclosed, may be combined in any combination, except combinations where at least some of such features and/or steps are mutually exclusive.
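
The recovery scan attributed to the archive journal scanner module in the list above (method 400) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration, not taken from the disclosure: the `content_metadata` table layout, the use of SQLite, the `reconcile` and `demo` names, and a plain dictionary standing in for the in-file metadata of each file. The source only says what happens when a matching entry is found, so rows without a match are simply left at the first value.

```python
import sqlite3

def reconcile(db, xattrs):
    """Promote rows still at the first value (0) whose entry is also in the file metadata."""
    pending = db.execute(
        "SELECT path, key, value FROM content_metadata WHERE consistent = 0"
    ).fetchall()
    repaired = 0
    for path, key, value in pending:
        # A matching in-file entry suggests the file-side write completed.
        if xattrs.get(path, {}).get(key) == value:
            db.execute(
                "UPDATE content_metadata SET consistent = 1 "
                "WHERE path = ? AND key = ?",
                (path, key),
            )
            repaired += 1
    db.commit()
    return repaired

def demo():
    """Build a hypothetical post-crash state and run one scan over it."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE content_metadata "
        "(path TEXT, key TEXT, value TEXT, consistent INTEGER)"
    )
    db.executemany(
        "INSERT INTO content_metadata VALUES (?, ?, ?, ?)",
        [
            ("/a", "author", "Alice", 0),  # file-side write finished before the crash
            ("/b", "genre", "jazz", 0),    # file-side write never happened
        ],
    )
    xattrs = {"/a": {"author": "Alice"}}   # stand-in for in-file metadata
    repaired = reconcile(db, xattrs)
    flags = dict(db.execute("SELECT path, consistent FROM content_metadata"))
    return repaired, flags
```

In this sketch only the entry for `/a` is promoted to the second value; how a real system would treat the unmatched `/b` entry (retry, roll back, or report) is not stated in the text and is left open here.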

Abstract

In an example technique, a request may be received to insert, update, or delete metadata both to a file in a file system and into a database that stores out-of-band metadata associated with the file. A first journal event may be generated to insert, update, or delete the metadata into the database and define a first value in a consistent field in the database. The first journal event may be processed to insert, update, or delete the metadata into the database. The technique may then process the request to insert, update, or delete the metadata to the file in the file system. A second journal event may be generated to define a second value in the consistent field in the database. The second journal event may be processed to update the second value in the consistent field in the database.

Description

JOURNAL EVENTS IN A FILE SYSTEM AND A DATABASE
Background
[001] Storage systems are inevitable in modern day computing. Whether it is a general purpose computing device or a large data center of an enterprise, storage systems have become a key part of any computing experience. Exploding growth in structured and unstructured data over the years has also led enterprises to pursue storage solutions that could store terabytes or petabytes of data with reduced costs, complexity, and time. Organizations are looking to extract meaningful and customized business value from such large pools of data.
Brief Description of the Drawings
[002] For a better understanding of the solution, embodiments will now be described, purely by way of example, with reference to the accompanying drawings, in which:
[003] FIG. 1 is a block diagram of an example computing device using journal events in a file system and a database;
[004] FIG. 2 is a block diagram of an example computing environment using journal events in a file system and a database;
[005] FIG. 3 is a flowchart of an example method for using journal events in a file system and a database;
[006] FIG. 4 is a flowchart of an example method for using journal events in a file system and a database; and
[007] FIG. 5 is a block diagram of an example system using journal events in a file system and a database.
Detailed Description
[008] Growth in structured and unstructured data has led enterprises to invest in storage solutions that could help them extract information which is of business value to them. One such mechanism is to allow a user to define custom metadata. Custom metadata allows a user to label data with customized information that doesn't fit into any of the existing metadata fields. To provide an example, in case the data is a music file, custom metadata may include genre, artist, year performed, etc. In another example, if the data is a business document, custom metadata may include project name, author, reviewed by, signatures, etc.
[009] Custom metadata may be associated with a file present in a file system. However, in an instance, out-of-band custom metadata may be stored in a database distinct from a file system that includes in-file metadata. In such case, one of the challenges of maintaining an out-of-band metadata in a database is that a traditional application or service (for example, a backup and restore application, a file replication service, etc.) may not have any mechanism to implicitly understand such data. A traditional application may only understand commonly used file semantics (such as POSIX). Thus, while out-of-band metadata (or the custom metadata) may be critical for building the object semantic on top of a file, it may also pose a challenge to the traditional applications and services that do not understand out-of-band metadata in a database and are only able to read the usual file metadata stored on a file system.
[0010] The present disclosure describes a mechanism to provide query performance and object semantics of a file through a database along with a simultaneous mapping to a traditional metadata namespace (for example, extended file attributes) in such a way that an ingest mechanism inserts custom metadata in a file metadata and a database via a single call. In an example, such mechanism may ensure that both traditional applications as well as new applications may use common semantics to push custom metadata to a file and a database.
[0011] The present disclosure describes using journal events to maintain a transactionally consistent mapping between metadata associated with a file in a file system and out-of-band metadata associated with the file in a database. In an example, a request may be received to insert, update, or delete metadata both to a file in a file system and into a database that stores out-of-band metadata associated with the file. In response to the request, a first journal event may be generated to insert, update, or delete the metadata into the database and define a first value in a consistent field in the database. The first journal event may be processed to insert, update, or delete the metadata into the database. Further, the request to insert, update, or delete the metadata to the file in the file system is processed. Next, a second journal event may be generated to define a second value in the consistent field in the database. The second journal event may then be processed to update the second value in the consistent field in the database. Thus, by processing a single request to insert, update, or delete metadata both to a file in a file system and into a database that stores out-of-band metadata associated with the file, journal events maintain a transactionally consistent mapping between metadata associated with a file in a file system and out-of-band metadata associated with the file in a database.
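The sequence in paragraph [0011] can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the `content_metadata` table, the event tuples, the function names, the use of SQLite, and the dictionary that stands in for in-file metadata are all hypothetical. The values 0 and 1 are the example first and second values used elsewhere in the text. For brevity the sketch covers insert and update only; delete is omitted.

```python
import sqlite3

PENDING, CONSISTENT = 0, 1  # example first and second values of the consistent field

def open_db():
    """Create an in-memory database with a hypothetical content metadata table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE content_metadata ("
        " path TEXT, key TEXT, value TEXT, consistent INTEGER,"
        " PRIMARY KEY (path, key))"
    )
    return db

def process_event(db, event):
    """Apply one journal event to the database."""
    kind, path, key, value = event
    if kind == "upsert":
        # First journal event: store the metadata and mark it with the first value.
        db.execute(
            "INSERT OR REPLACE INTO content_metadata VALUES (?, ?, ?, ?)",
            (path, key, value, PENDING),
        )
    elif kind == "commit":
        # Second journal event: replace the first value with the second value.
        db.execute(
            "UPDATE content_metadata SET consistent = ? WHERE path = ? AND key = ?",
            (CONSISTENT, path, key),
        )
    db.commit()

def set_metadata(db, xattrs, journal, path, key, value):
    """Handle a single request that targets both the file and the database."""
    first = ("upsert", path, key, value)
    journal.append(first)
    process_event(db, first)
    xattrs.setdefault(path, {})[key] = value  # stand-in for the in-file metadata write
    second = ("commit", path, key, value)
    journal.append(second)
    process_event(db, second)
```

A short usage example: after `set_metadata(db, {}, [], "/docs/report.txt", "author", "Alice")` on a database from `open_db()`, the row for that file holds the value together with the second value (1) in its consistent field. If the process stopped between the two events, the row would remain at the first value (0), which is what lets a later scan detect an unfinished request.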
[0012] FIG. 1 is a block diagram of an example computing device 100 using journal events in a file system and a database. Computing device 100 may be a server, a desktop computer, a notebook computer, a tablet computer, and the like. In an example, computing device 100 may be a file server 100. File server 100 may include a processor 102 and a machine-readable storage medium 104.
[0013] Processor 102 may be any type of Central Processing Unit (CPU), microprocessor, or processing logic that interprets and executes machine-readable instructions stored in machine-readable storage medium 104.
[0014] Machine-readable storage medium 104 may be a random access memory (RAM) or another type of dynamic storage device that may store information and machine-readable instructions that may be executed by processor 102. For example, machine-readable storage medium 104 may be Synchronous DRAM (SDRAM), Double Data Rate (DDR), Rambus DRAM (RDRAM), Rambus RAM, etc. or storage memory media such as a floppy disk, a hard disk, a CD-ROM, a DVD, a pen drive, and the like. In an example, machine-readable storage medium 104 may be a non-transitory machine-readable medium.
[0015] In an example, machine-readable storage medium 104 may store a file system 106, a database 108, and a custom metadata module 110. The term "module" may refer to a software component (machine readable instructions), a hardware component or a combination thereof. A module may include, by way of example, components, such as software components, processes, tasks, co-routines, functions, attributes, procedures, drivers, firmware, data, databases, data structures, Application Specific Integrated Circuits (ASIC) and other computing devices. A module may reside on a volatile or non-volatile storage medium (e.g. 104) and may be configured to interact with a processor (e.g. 102) of a computing device (e.g. 100).
[0016] File system 106 may be a local file system or a scale-out file system such as a shared file system or a network file system. Examples of a shared file system may include a Network Attached Storage (NAS) file system or a cluster file system. Examples of a network file system may include a distributed file system or a distributed parallel file system. In general, file system 106 may be used for storage and retrieval of data from a storage device. Typically, each piece of data is called a "file" (or file object). In an example, file system 106 may include at least one file. A file in the file system may be associated with file metadata and/or custom metadata. Custom metadata may be defined by a user. Metadata (or custom metadata) associated with a file in a file system may be termed as "in-file" metadata. In an example, such "in-file" metadata may include extended file attributes. Extended file attributes enable users to associate files with metadata not interpreted by the file system, whereas regular attributes have a purpose strictly defined by the file system. For example, users may use these attributes to store the name of an author of a document, a checksum, a digital signature, etc.
[0017] File system 106 may include a journaling system 112 and a virtual file system interface 114. In an example, journaling system 112 may maintain a special file called a journal that may be used to repair any inconsistencies that may occur as the result of an improper shutdown of a computer. Journaling system 112 may write metadata into the journal. In the event of a system crash, if a given set of updates has not been implemented, the system may read the journal in order to roll up to the most recent consistent data point.
[0018] In an example, upon receipt of a request to insert, update, or delete metadata both to a file in a file system (example, 106) and into a database (example, 108), file system (example, 106) may generate a first journal event to insert, update, or delete the metadata into the database (example, 108) and define a first value in a consistent field of a content metadata table in the database (example, 108). In an example, the request to insert, update, or delete metadata to a file may be transferred to a Virtual File System (VFS) layer 114 or, more specifically, to a specific VFS handler of the file system (example, 106). Virtual File System (VFS) layer 114 may act as an interface for an underlying operating system to support a variety of file systems so that the system can handle various types of I/O system calls. In an example, the VFS layer 114 is a Linux VFS layer. In response to the aforesaid request, file system (example, 106) may generate a first journal event to insert, update, or delete the metadata into the database (example, 108) and define a first value in a consistent field in the database. In an example, in case of a Linux VFS layer, the VFS handler for a SetXattr call may determine whether it is an insert or update metadata call, and based on said determination may generate a first journal event and define a first value in a consistent field (for example, the field may be set to 0) of a content metadata table in the database (example, 108). The VFS handler for RemoveXattr may carry out the same process for a delete metadata call. The first journal event may be processed by journaling system 112 to insert, update, or delete the metadata into the database (example, 108). File system (example, 106) may then process the request to insert, update, or delete the metadata to the file in the file system (example, 106). Next, file system (example, 106) may generate a second journal event to define a second value in the consistent field in the database (example, 108). The second journal event may then be processed by journaling system 112 to update the second value in the consistent field in the database (example, 108).
[0019] Database 108 may be a repository that stores an organized collection of data. In an example, database 108 may store out-of-band metadata of a file. "Out-of-band metadata" of a file may be defined as metadata (or custom metadata) that may be stored in a location (example, a database) other than the file system. Database (example, 108) may include one or more tables for storing data. In an example, at least one table in the database (example, 108) may be used to store out-of-band content metadata of a file. Such a table may be called a "content metadata" table. In an example, a table in the database (such as the content metadata table) may include a consistent field. A consistent field is based on the consistency property, which ensures that any transaction will bring the database from one valid state to another. In other words, the value defined for a consistent field may change to reflect a successful transaction. In an example, a first value (for instance, 0) may be defined in a consistent field of a database (example, 108) upon generation of a first journal event to insert, update, or delete metadata into the database (example, 108). A second value (for example, 1) may be defined in the consistent field of the database (example, 108) upon generation of a second journal event, which may occur upon processing of a request to insert, update, or delete metadata to a file in a file system (example, 106).
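The sequence described above (first journal event, in-file update, second journal event) may be illustrated with a minimal Python sketch. It is not the disclosed implementation: the table name `content_metadata`, its columns, the helper names, and the use of a plain dictionary in place of extended file attributes are all assumptions made for illustration, and delete requests are left out for brevity.

```python
import sqlite3


def open_db():
    """Create an in-memory stand-in for database 108 (schema is hypothetical)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE content_metadata ("
        " path TEXT, name TEXT, value TEXT, consistent INTEGER,"
        " PRIMARY KEY (path, name))"
    )
    return db


def apply_first_event(db, path, name, value):
    """First journal event: write the entry and set the consistent field to 0."""
    db.execute(
        "INSERT OR REPLACE INTO content_metadata VALUES (?, ?, ?, 0)",
        (path, name, value),
    )
    db.commit()


def apply_second_event(db, path, name):
    """Second journal event: change the consistent field from 0 to 1."""
    db.execute(
        "UPDATE content_metadata SET consistent = 1 WHERE path = ? AND name = ?",
        (path, name),
    )
    db.commit()


def set_metadata(db, file_attrs, path, name, value):
    """Run the full sequence for an insert or update request."""
    apply_first_event(db, path, name, value)       # database first, consistent = 0
    file_attrs.setdefault(path, {})[name] = value  # then the in-file metadata
    apply_second_event(db, path, name)             # finally consistent = 1
```

If a crash occurs between the first and second events, the row is left with the first value (0) in the consistent field, which is the condition a later scan can look for.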
[0020] In an example, database 108 may be a distributed database that provides high query rates and high-throughput updates using a batching process. Database 108 may use a pipelined architecture that provides access to update batches at various points through processing. In an instance, database 108 may be based on a batched update model, which decouples update processing from read-only queries (i.e. query processing task). In this model, the updates may be batched and processed in the background, and do not interfere with the foreground query workload. Database 108 may allow different stages of the updates in the pipeline to be queried independently. Queries that could use slightly out-of-date data may use only the final output of the pipeline, which may correspond to the completely ingested and indexed data. Queries that require even fresher results may access data at any stage in the pipeline. In an example, database 108 may be integrated into file system 106.
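The batched update model described above may be pictured with a toy Python sketch. The class `BatchedStore` and its method names are hypothetical and stand in for a far more elaborate pipelined database; the sketch only shows updates queuing apart from reads, and a query choosing between the fully ingested data and a fresher, earlier pipeline stage.

```python
class BatchedStore:
    """Toy store: updates queue in a pending batch and are merged later."""

    def __init__(self):
        self.indexed = {}  # final pipeline output: fully ingested data
        self.pending = []  # earlier pipeline stage: batched, not yet merged

    def update(self, key, value):
        """Queue an update without touching the data served to normal queries."""
        self.pending.append((key, value))

    def merge(self):
        """Background step: fold the pending batch into the indexed data."""
        for key, value in self.pending:
            self.indexed[key] = value
        self.pending = []

    def query(self, key, fresh=False):
        """Read the indexed data; with fresh=True, consult the pending batch first."""
        if fresh:
            for pending_key, pending_value in reversed(self.pending):
                if pending_key == key:
                    return pending_value
        return self.indexed.get(key)
```

A query that tolerates slightly stale data calls `query(key)`, while one that needs the most recent update passes `fresh=True` and pays for the extra lookup.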
[0021] Custom metadata module 110 may include instructions to receive a request to insert, update, or delete metadata both to a file in a file system (example, 106) and into a database (example, 108) that may store out-of-band metadata associated with the file of said file system. In an example, the aforesaid request may be in the form of a representational state transfer (REST) call. In other examples, the request may use another data access protocol such as, but not limited to, Network File System (NFS), Server Message Block (SMB), and the like. Custom metadata module 110 may include a suitable interface to handle the aforementioned request, which may be generated using any of these protocols. In an example, said REST call may be handled by a Hypertext Transfer Protocol (HTTP) service that may pass on the request to custom metadata module 110. In turn, custom metadata module 110 may issue a system call to the file for which the request to insert, update, or delete metadata is intended. The system call is then forwarded to the VFS layer of the file system. In an example, if the underlying operating system is Linux, custom metadata module 110 may issue a Linux call Setfattr to the file for which metadata needs to be set (insert, update, or delete). The call may then be handled by the Linux VFS layer and passed on to the file system specific VFS handler.
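On Linux, one way a caller could set an extended attribute and learn whether the operation was an insert or an update is shown in the following Python sketch. It is an illustration under stated assumptions, not the disclosed VFS handler logic: the function name `set_custom_metadata`, the `user.` attribute namespace, and the injectable `setxattr` parameter (useful for trying the logic on systems without extended attribute support) are all hypothetical. `os.setxattr`, `os.XATTR_CREATE`, and `os.XATTR_REPLACE` are standard Python names available on Linux.

```python
import os

# Flag values from <linux/xattr.h>; the os module exposes them only on Linux.
XATTR_CREATE = getattr(os, "XATTR_CREATE", 1)
XATTR_REPLACE = getattr(os, "XATTR_REPLACE", 2)


def set_custom_metadata(path, name, value, setxattr=None):
    """Set user.<name> on path and report whether it was an insert or an update.

    value must be bytes. setxattr defaults to os.setxattr and can be replaced
    by any callable with the same (path, attribute, value, flags) signature.
    """
    setxattr = setxattr or os.setxattr
    attr = "user." + name
    try:
        setxattr(path, attr, value, XATTR_CREATE)   # fails if the attribute exists
        return "insert"
    except FileExistsError:
        setxattr(path, attr, value, XATTR_REPLACE)  # fails if it does not exist
        return "update"
```

Trying the create flag first and falling back to replace avoids a separate existence check, at the cost of one failed call on every update.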
[0022] FIG. 2 is a block diagram of an example computing environment 200 using journal events in a file system and a database. Computing environment 200 may include client systems 202, 204, and 206, a file server 208, and a storage device 210. The number of client systems 202, 204, and 206, file server 208, and storage device 210 shown in FIG. 2 is for the purpose of illustration only and their number may vary in other implementations. In an example, computing environment 200 may represent a scale-out file system.
[0023] Client systems 202, 204, and 206 may each be a computing device such as a desktop computer, a notebook computer, a tablet computer, a mobile phone, a personal digital assistant (PDA), a server, and the like. Client systems 202, 204, and 206 may communicate with file server 208 via a computer network 212. Computer network 212 may be a wireless or wired network. Computer network 212 may include, for example, a Local Area Network (LAN), a Wireless Local Area Network (WLAN), a Metropolitan Area Network (MAN), a Storage Area Network (SAN), a Campus Area Network (CAN), or the like. Further, computer network 212 may be a public network (for example, the Internet) or a private network (for example, an intranet). In an example, client systems 202, 204, and 206 may be directly coupled to file server 208.
[0024] In an example, client systems 202, 204, and 206 may host one or more applications 224, 226, and 228 that may, in an example, send a request to file server (example, 208) to insert, update, or delete metadata both to a file in a file system and into a database that may store out-of-band metadata of said file. In an example, an application (for example, 224, 226, and 228) may use a data access protocol such as, but not limited to, Hypertext Transfer Protocol (HTTP), Network File System (NFS), Server Message Block (SMB), and the like, to read and/or write data such as files, metadata, custom metadata, and the like, from file server 208.
[0025] File server 208 may include a non-transitory machine-readable storage medium 214 that may store machine executable instructions. In an example, file server 208 may be similar to file server 100 described earlier. Accordingly, components of file server 208 that are similarly named and illustrated in file server 100 may be considered similar. For the sake of brevity, components or reference numerals of FIG. 2 having a same or similarly described function in FIG. 1 are not being described in connection with FIG. 2. Said components or reference numerals may be considered alike.
[0026] In an example, machine-readable storage medium 214 may store a file system 106, a database 108, a custom metadata module 110, and an archive journal scanner module 220.
[0027] Archive journal scanner module 220 may include instructions to process a journal generated by the journaling system. In an example, archive journal scanner module 220 may include instructions to identify, from a database, a consistent field that includes a first value, which may be defined for the consistent field upon generation of a first journal event to insert, update, or delete metadata into the database. Archive journal scanner module 220 may further include instructions to identify a metadata entry related to the consistent field with the first value in the database. Archive journal scanner module 220 may then determine that a metadata entry corresponding to the metadata entry related to the consistent field with the first value is present in metadata of the file in a file system and, in response to the determination, may update the first value to a second value in the consistent field of the database.
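The scanner behavior described above may be sketched in Python as follows. The row layout (dictionaries with `path`, `name`, `value`, and `consistent` keys), the function name `scan`, and the dictionary standing in for in-file metadata are assumptions for illustration only. The sketch confirms matching entries and simply returns the ones that have no match, leaving any repair of those to the caller.

```python
def scan(rows, file_attrs):
    """Confirm database entries whose consistent field still holds the first value.

    rows: list of dicts with keys path, name, value, consistent (0 or 1).
    file_attrs: dict mapping a file path to its in-file metadata (name -> value).
    Returns the rows that remain unconfirmed because no matching entry was
    found in the file's metadata.
    """
    unmatched = []
    for row in rows:
        if row["consistent"] != 0:
            continue  # already confirmed by a second journal event
        in_file = file_attrs.get(row["path"], {}).get(row["name"])
        if in_file == row["value"]:
            row["consistent"] = 1  # entry present on both sides
        else:
            unmatched.append(row)
    return unmatched
```

Because only rows with the first value are examined, the cost of a scan depends on the number of interrupted requests rather than on the size of the table, provided the consistent field can be queried efficiently.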
[0028] Storage device 210 may be used to store and retrieve data stored by file system 106. Some non-limiting examples of storage device 210 may include a Direct Attached Storage (DAS) device, a Network Attached Storage (NAS) device, a tape drive, a magnetic tape drive, or a combination of these devices. Storage device 210 may be directly coupled to file server 208 or may communicate with file server 208 via a computer network 222. Such a computer network 222 may be similar to the computer network 212 described above. In an example, computer network 222 may be a Storage Area Network (SAN).
[0029] FIG. 3 is a flowchart of an example method 300 for using journal events in a file system and a database. The method 300, which is described below, may at least partially be executed on a computing device 100 of FIG. 1 or file server 208 of FIG. 2. However, other computing devices may be used as well. At block 302, a request may be received to insert, update, or delete metadata both to a file in a file system (example, 106) and into a database (example, 108) that may store out-of-band metadata associated with the file. In an example, such a request may be received by a custom metadata module. At block 304, the file system (example, 106) may generate a first journal event to insert, update, or delete the metadata into the database (example, 108) and define a first value (for example, 0) in a consistent field in the database (example, 108). In an example, the consistent field may be present in a content metadata table of the database (example, 108). At block 306, a journaling system may process the first journal event to insert, update, or delete the metadata into the database (example, 108). In other words, the journaling system may insert, update, or delete the metadata into the database (example, 108). At block 308, the file system may process the request to insert, update, or delete the metadata to the file in the file system (example, 106). In other words, the file system (example, 106) may insert, update, or delete the metadata to the file. At block 310, the file system may generate a second journal event to define a second value (for example, 1) in the consistent field in the database (example, 108). At block 312, the journaling system may process the second journal event to update the second value in the consistent field in the database (example, 108). In other words, the journaling system may replace the first value (for example, 0) with a second value (for example, 1) in the consistent field.
[0030] FIG. 4 is a flowchart of an example method 400 for using journal events in a file system and a database. The method 400, which is described below, may at least partially be executed on a computing device 100 of FIG. 1 or file server 208 of FIG. 2. However, other computing devices may be used as well. In an example, method 400 may be used to ensure that there are no inconsistencies between metadata associated with a file in a file system and out-of-band metadata associated with the file in a database (example, 108) if a system crash takes place during processing of a request to insert, update, or delete metadata both to a file in a file system and into a database (example, 108) that stores out-of-band metadata associated with the file. At block 402, an archive journal scanner module (example, 220) may identify, from a database (example, 108) that may store out-of-band metadata associated with a file, a consistent field (or fields) that includes a given value (or first value). For instance, a given value may be zero (0). Upon said identification, at block 404, the archive journal scanner module (example, 220) may identify a metadata entry related to the identified consistent field with the first value in the database (example, 108). In other words, the archive journal scanner module may identify a metadata entry that may have been inserted, updated, or deleted against a consistent field with a given value. At block 406, the archive journal scanner module (example, 220) may determine that a metadata entry corresponding to the metadata entry related to the consistent field with the given value (i.e. first value) is present in metadata of a file in a file system (example, 106). In other words, a determination is made whether there is a metadata entry in the file system that matches a metadata entry against a given value (for example, 0) in the database (example, 108).
At block 408, in response to the determination, if there is a metadata entry in the file system that matches a metadata entry against a given value (for example, 0) in the database (example, 108), the first value (for example, 0) may be updated to a second value (for example, 1) in said consistent field of the database. Presence of a matching metadata entry both in a file system and a database indicates that a request to insert, update, or delete metadata both to a file in the file system (example, 106) and into the database (example, 108) is successful. In such a case, the first value in the consistent field may be updated to the second value to indicate a successful transaction. In the event there is no metadata entry in the file system that matches a metadata entry against a first value (for example, 0) in the database (example, 108), metadata of the file in the file system (for example, extended file attributes) may be updated with the first value to attain one-to-one mapping between metadata of the file in the file system and metadata of the file in the database.
[0031] FIG. 5 is a block diagram of an example system 500 using journal events in a file system and a database. System 500 includes a processor 502 and a machine-readable storage medium 504 communicatively coupled through a system bus. In an example, system 500 may be analogous to computing device 100 of FIG. 1 or file server 208 of FIG. 2. Processor 502 may be any type of Central Processing Unit (CPU), microprocessor, or processing logic that interprets and executes machine-readable instructions stored in machine-readable storage medium 504. Machine-readable storage medium 504 may be a random access memory (RAM) or another type of dynamic storage device that may store information and machine-readable instructions that may be executed by processor 502. For example, machine-readable storage medium 504 may be Synchronous DRAM (SDRAM), Double Data Rate (DDR), Rambus DRAM (RDRAM), Rambus RAM, etc. or storage memory media such as a floppy disk, a hard disk, a CD-ROM, a DVD, a pen drive, and the like. In an example, machine-readable storage medium 504 may be a non-transitory machine-readable medium. Machine-readable storage medium 504 may store instructions 506, 508, 510, 512, 514, and 516. In an example, instructions 506 may be executed by processor 502 to receive a request to insert, update, or delete metadata both to a file in a file system and into a database that stores out-of-band metadata associated with the file. Instructions 508 may be executed by processor 502 to generate a first journal event to insert, update, or delete the metadata into the database and define a first value in a consistent field in the database. Instructions 510 may be executed by processor 502 to process the first journal event to insert, update, or delete the metadata into the database. Instructions 512 may be executed by processor 502 to process the request to insert, update, or delete the metadata to the file in the file system. Instructions 514 may be executed by processor 502 to generate a second journal event to define a second value in the consistent field in the database. Instructions 516 may be executed by processor 502 to process the second journal event to update the second value in the consistent field in the database.
[0032] For the purpose of simplicity of explanation, the example methods of FIGS. 3 and 4 are shown as executing serially; however, it is to be understood and appreciated that the present and other examples are not limited by the illustrated order. The example systems of FIGS. 1, 2 and 5, and methods of FIGS. 3 and 4 may be implemented in the form of a computer program product including computer-executable instructions, such as program code, which may be run on any suitable computing device in conjunction with a suitable operating system (for example, Microsoft Windows, Linux, UNIX, and the like).
Embodiments within the scope of the present solution may also include program products comprising non-transitory computer-readable media for carrying or having computer-executable instructions or data structures stored thereon. Such computer-readable media can be any available media that can be accessed by a general purpose or special purpose computer. By way of example, such computer-readable media can comprise RAM, ROM, EPROM, EEPROM, CD-ROM, magnetic disk storage or other storage devices, or any other medium which can be used to carry or store desired program code in the form of computer-executable instructions and which can be accessed by a general purpose or special purpose computer. The computer readable instructions can also be accessed from memory and executed by a processor.
[0033] It may be noted that the above-described examples of the present solution are for the purpose of illustration only. Although the solution has been described in conjunction with a specific embodiment thereof, numerous modifications may be possible without materially departing from the teachings and advantages of the subject matter described herein. Other substitutions, modifications and changes may be made without departing from the spirit of the present solution. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and/or all of the steps of any method or process so disclosed, may be combined in any combination, except combinations where at least some of such features and/or steps are mutually exclusive.

Claims

1. A method, comprising:
receiving a request to insert, update, or delete metadata both to a file in a file system and into a database that stores out-of-band metadata associated with the file;
generating a first journal event to insert, update, or delete the metadata into the database and define a first value in a consistent field in the database;
processing the first journal event to insert, update, or delete the metadata into the database;
processing the request to insert, update, or delete the metadata to the file in the file system;
generating a second journal event to define a second value in the consistent field in the database; and
processing the second journal event to update the second value in the consistent field in the database.
2. The method of claim 1, wherein the metadata is custom metadata.
3. The method of claim 1, wherein the request is a representational state transfer (REST) call.
4. The method of claim 1, wherein the metadata associated with the file in the file system includes extended file attributes of the file.
5. The method of claim 1, further comprising:
identifying, from the database, the consistent field with the first value;
identifying a metadata entry related to the consistent field with the first value in the database;
determining that a metadata entry corresponding to the metadata entry related to the consistent field with the first value is present in metadata of the file in the file system; and
in response to the determination, updating the first value to the second value in the consistent field of the database.
6. The method of claim 1, wherein the processing of the request to insert, update, or delete the metadata to the file in the file system comprises using a Virtual File System (VFS) to insert, update, or delete the metadata to the file in the file system.
7. A system, comprising:
a file system comprising a journaling system;
a database; and
a custom metadata module to receive a request to insert, update, or delete a custom metadata entry both to a file in the file system and into the database, wherein:
the file system is to generate a first journal event to insert, update, or delete the custom metadata entry into a content metadata table in the database and define a first value in a consistent field of the content metadata table;
the journaling system is to process the first journal event to insert, update, or delete the custom metadata entry into the content metadata table in the database;
the file system is to process the request to insert, update, or delete the custom metadata entry to the file in the file system;
the file system is to generate a second journal event to define a second value in the consistent field of the content metadata table in the database; and
the journaling system is to process the second journal event to update the second value in the consistent field of the content metadata table.
8. The system of claim 7, wherein the request is received from an application present on a communicatively coupled client system.
9. The system of claim 7, further comprising a journal scanner module to:
identify, from the content metadata table in the database, the consistent field with the first value;
identify the custom metadata entry related to the consistent field with the first value in the content metadata table;
determine that a content metadata entry corresponding to the custom metadata entry related to the consistent field with the first value is present in metadata of the file in the file system; and
in response to the determination, update the first value to the second value in the consistent field of the content metadata table.
10. The system of claim 7, wherein the database is integrated into the file system.
11. The system of claim 7, wherein the database is to allow pipelining of updates and independent querying of the pipelined updates.
12. The system of claim 7, wherein the database stores out-of-band metadata associated with the file.
13. A non-transitory machine-readable storage medium comprising instructions executable by a processor to:
receive a request to insert, update, or delete custom metadata both to a file in a file system and into a database;
generate a first journal event to insert, update, or delete the custom metadata into the database and define a first value in a consistent field in the database;
process the first journal event to insert, update, or delete the custom metadata into the database;
process the request to insert, update, or delete the custom metadata to the file in the file system;
generate a second journal event to define a second value in the consistent field in the database; and
process the second journal event to update the second value in the consistent field in the database.
14. The storage medium of claim 13, further including instructions to:
identify, from the database, the consistent field with the first value;
identify the custom metadata related to the consistent field with the first value in the content metadata table;
determine that a custom metadata corresponding to the custom metadata related to the consistent field with the first value is present in metadata of the file in the file system; and
in response to the determination, replace the first value with the second value in the consistent field in the database.
15. The storage medium of claim 13, wherein the file system is a Network Attached Storage (NAS) file system.
PCT/US2014/048005 2014-06-02 2014-07-24 Journal events in a file system and a database WO2015187187A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
IN2704/CHE/2014 2014-06-02
IN2704CH2014 2014-06-02

Publications (1)

Publication Number Publication Date
WO2015187187A1 (en) 2015-12-10

Family

ID=54767125

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2014/048005 WO2015187187A1 (en) 2014-06-02 2014-07-24 Journal events in a file system and a database

Country Status (1)

Country Link
WO (1) WO2015187187A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6877109B2 (en) * 2001-11-19 2005-04-05 Lsi Logic Corporation Method for the acceleration and simplification of file system logging techniques using storage device snapshots
US20100031274A1 (en) * 2004-05-10 2010-02-04 Siew Yong Sim-Tang Method and system for real-time event journaling to provide enterprise data services
US7809778B2 (en) * 2006-03-08 2010-10-05 Omneon Video Networks Idempotent journal mechanism for file system
US8145686B2 (en) * 2005-05-06 2012-03-27 Microsoft Corporation Maintenance of link level consistency between database and file system
US8412685B2 (en) * 2004-07-26 2013-04-02 Riverbed Technology, Inc. Method and system for managing data

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10942910B1 (en) * 2018-11-26 2021-03-09 Amazon Technologies, Inc. Journal queries of a ledger-based database
US11036708B2 (en) 2018-11-26 2021-06-15 Amazon Technologies, Inc. Indexes on non-materialized views
US11119998B1 (en) 2018-11-26 2021-09-14 Amazon Technologies, Inc. Index and view updates in a ledger-based database
US11196567B2 (en) 2018-11-26 2021-12-07 Amazon Technologies, Inc. Cryptographic verification of database transactions
US11675770B1 (en) 2018-11-26 2023-06-13 Amazon Technologies, Inc. Journal queries of a ledger-based database

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14893943

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14893943

Country of ref document: EP

Kind code of ref document: A1