Publication number: CN102750354 A
Publication type: Application
Application number: CN 201210190832
Publication date: 24 Oct 2012
Filing date: 11 Jun 2012
Priority date: 11 Jun 2012
Also published as: CN102750354B
Publication number (other formats): 201210190832.5, CN 102750354 A, CN 102750354A, CN 201210190832, CN-A-102750354, CN102750354 A, CN102750354A, CN201210190832, CN201210190832.5
Inventors: 丁贵广, 卓安, 王建民, 黄向东
Applicant: Tsinghua University (清华大学)
Method for analyzing and processing non-structured data query operating language
CN 102750354 A
Abstract
The invention relates to a method for parsing and processing a query and manipulation language for unstructured data, and belongs to the technical field of computer data management. The method defines a structured query language for querying unstructured data. Its grammar resembles the query language of a conventional relational database, and the language is easy to extend and can incorporate user-defined query functions. The method comprises the following steps: starting a query module in a key-value store; receiving a user's query language request, parsing it, and converting it into an internal command; having the query module call the functional modules of the key-value store to execute the internal command; and returning the result to the user once the command has been executed. The method takes the query module as its core and accesses the underlying key-value store through an SQL-like (Structured Query Language) language, so that users can easily operate the key-value store to manage unstructured data.
Claims (1), translated from Chinese
1. A method for parsing and processing a query language for unstructured data management, characterized in that the method comprises the following steps:
(1) starting a query module in a key-value store, the query module listening for users' query language requests;
(2) the query module receiving a user's query language request and parsing the language, the parsing comprising:
(2-1) the client connecting to the query module through a query language driver, establishing a session between the client and the query module, saving the session information during the session, accessing the query module, and sending query language to the query module;
(2-2) the query module converting, by means of its parser, the query language request sent by the client into an internal command;
(3) examining the internal command: if the internal command designates the key-value store table for the current session, the query module saves the name of the designated table, and subsequent commands in the session are executed against that table by default; if a similarity keyword appears anywhere in the query language, the query module forwards the internal command to the index invocation module of the key-value store; if a function keyword appears anywhere in the query language, the query module forwards the internal command to the function invocation module of the key-value store;
(4) the query module in the key-value store calling, according to the internal command, the functional modules of the key-value store to execute the internal command, specifically:
(4-1) if the internal command is a structured query command, the command is executed by the server of the key-value store;
(4-2) if the internal command is a command to create a key-value store index, the command is executed by the server of the key-value store;
(4-3) if the internal command is a command to create a non-key-value-store index, an index implementation library is constructed and called to execute the command;
(4-4) if the internal command is a command to run a data analysis function, a data function analysis module is constructed and called to execute the command, and the query module obtains the execution status and execution result of the command;
(4-5) if the internal command is a large data transfer, a separate data transfer stream is used to wait for a connection from the client; once the connection is made, the file is transferred over the data transfer stream; after the transfer ends, the query module saves the transferred file and keeps the session between the client and the query module open;
(4-6) if the internal command is a user-defined index creation, index query, or function creation: for commands that create or query a user-defined index, a keyword indicates the index creation parameters and the index type, and the index is created or queried accordingly; for commands that create a user-defined function, the query module selects the corresponding function from the supported function types listed in the query module's configuration file, according to the function keyword and the function's variable-length arguments in the query language, and the function is created;
(4-7) if the internal command is a joint query over several types of index, the query module splits the query by index type to obtain a query clause for each index type, reads from the query module's configuration file the priorities of the different index queries, adjusts the order of the query clauses accordingly, and runs the query;
(5) the query module returning the query result to the client.
Description  translated from Chinese

Method for parsing and processing an unstructured data query and manipulation language

FIELD

[0001] The present invention relates to a method for parsing and processing an unstructured data query and manipulation language, and belongs to the technical field of computer data management.

BACKGROUND

[0002] With the growing richness of the Internet and other emerging applications and the continued development of enterprise informatization, large amounts of unstructured data have appeared. Unstructured data has rich data types and complex structure, with no explicit, uniformly defined data structure constraints. Together with its massive scale, highly dynamic characteristics, diverse application scenarios, and the need for unified joint access, this poses enormous challenges for unstructured data management.

[0003] Traditional relational databases struggle to offer an effective solution for handling massive unstructured data. The data models of traditional databases have a schema-first logical structure, whereas unstructured data has a schema-later logical structure, so data management methods built on relational algebra are no longer effective for unstructured data. The sheer volume of unstructured data also leaves traditional databases unable to cope in terms of performance and scalability.

[0004] Emerging key-value stores break the schema-first logic of traditional databases by being schema-free, while their key-value access guarantees high-speed reads and writes. Popular and fast-growing key-value stores include HBase, MongoDB, Dynamo, and Cassandra. They ensure storage and scalability for massive data through distributed clusters, and the present invention is based on such key-value stores.

[0005] However, emerging key-value stores lack mature query methods and query languages. For example, HBase provides API access, and Cassandra provides an API and an SQL-like language called CQL. Because of the limitations of their own databases, they support only simple queries and updates of unstructured data, provide no complex analysis functions, and do not consider how large-volume data should be described in the language. The founders of CouchDB and SQLite are jointly attempting to design UnQL, a unified query language for key-value stores, but it is still only a prototype and does not effectively address multi-feature queries over unstructured data.

[0006] From the perspective of end users and applications, an unstructured data query language should address the following issues:

[0007] (1) Support for querying unstructured data stored in key-value stores.

[0008] Existing unstructured data is mostly stored in key-value stores as a solution for massive scale and efficient reads and writes, yet key-value stores often do not provide an easy-to-use query language.

[0009] (2) Effective unified querying over the multiple features of different kinds of unstructured data.

[0010] Existing languages such as CQL provide only simple query functionality and cannot perform feature retrieval on unstructured data, for example retrieval by histogram or color features for image data, or by MFCC features for audio.

[0011] (3) Effective data querying and analysis.

[0012] Traditional data queries implement only indexes and simple statistical functions. For massive unstructured data, many results can only be obtained through data analysis, so the query language should support as many data analysis functions as possible.

SUMMARY OF THE INVENTION

[0013] The object of the present invention is to provide a method for parsing and processing an unstructured data query and manipulation language. Addressing the problems in the field of unstructured data management, it accesses the underlying key-value store through an SQL-like language, so that users can easily operate the key-value store to manage unstructured data.

[0014] The method proposed by the present invention for parsing and processing a query language for unstructured data management comprises the following steps:

[0015] (1) A query module in the key-value store is started, and the query module listens for users' query language requests.

[0016] (2) The query module receives a user's query language request and parses the language. The parsing steps are as follows:

[0017] (2-1) The client connects to the query module through a query language driver, establishes a session between the client and the query module, saves the session information during the session, accesses the query module, and sends query language to the query module.

[0018] (2-2) By means of its parser, the query module converts the query language request sent by the client into an internal command.

[0019] (3) The internal command is examined. If the internal command designates the key-value store table for the current session, the query module saves the name of the designated table, and subsequent commands in the session are executed against that table by default. If a similarity keyword appears anywhere in the query language, the query module forwards the internal command to the index invocation module of the key-value store. If a function keyword appears anywhere in the query language, the query module forwards the internal command to the function invocation module of the key-value store.

[0020] (4) According to the internal command, the query module in the key-value store calls the functional modules of the key-value store to execute the internal command. The specific process is as follows:

[0021] (4-1) If the internal command is a structured query command, the command is executed by the server of the key-value store.

[0022] (4-2) If the internal command is a command to create a key-value store index, the command is executed by the server of the key-value store.

[0023] (4-3) If the internal command is a command to create a non-key-value-store index, an index implementation library is constructed and called to execute the command.

[0024] (4-4) If the internal command is a command to run a data analysis function, a data function analysis module is constructed and called to execute the command, and the query module obtains the execution status and execution result of the command.

[0025] (4-5) If the internal command is a large data transfer, a separate data transfer stream is used to wait for a connection from the client. Once the connection is made, the file is transferred over the data transfer stream. After the transfer ends, the query module saves the transferred file and keeps the session between the client and the query module open.

[0026] (4-6) If the internal command is a user-defined index creation, index query, or function creation: for commands that create or query a user-defined index, a keyword indicates the index creation parameters and the index type, and the index is created or queried accordingly. For commands that create a user-defined function, the query module selects the corresponding function from the supported function types listed in the query module's configuration file, according to the function keyword and the function's variable-length arguments in the query language, and the function is created.

[0027] (4-7) If the internal command is a joint query over several types of index, the query module splits the query by index type to obtain a query clause for each index type, reads from the query module's configuration file the priorities of the different index queries, adjusts the order of the query clauses accordingly, and runs the query.

[0028] (5) The query module returns the query result to the client.
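
Steps (2) and (3) can be illustrated with a minimal Python sketch. The patent does not publish a grammar, so the `USE <table>` statement, the returned module names, and the `Session` class are hypothetical; only the `like` and `function` keywords are taken from the detailed description. The source also does not say which rule wins when both keywords appear, so checking `like` first is an assumption.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Session:
    """Per-client session state kept by the query module (step 2-1)."""
    current_table: Optional[str] = None
    history: List[str] = field(default_factory=list)


def route(session: Session, query: str) -> str:
    """Return the name of the module that should handle `query` (step 3).

    The `USE <table>` syntax and the module names are illustrative
    assumptions, not part of the patent text.
    """
    session.history.append(query)
    words = re.findall(r"\w+", query)
    keywords = [w.lower() for w in words]

    # A table-selection command only updates the session's default table.
    if len(words) == 2 and keywords[0] == "use":
        session.current_table = words[1]
        return "session"

    # A similarity keyword anywhere in the query goes to the index module.
    if "like" in keywords:
        return "index_module"

    # A function keyword anywhere in the query goes to the function module.
    if "function" in keywords:
        return "function_module"

    # Anything else is treated as an ordinary command for the store's server.
    return "server"
```

A real parser would build an internal command object rather than return a string; the string keeps the routing decision visible in a few lines.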

[0029] The method proposed by the present invention defines a structured query language for querying unstructured data. Its syntax is similar to that of the query languages of traditional relational databases, and the language is easy to extend and can incorporate user-defined query functions. The core of the method is the query module. An interface is designed so that the query module is loosely coupled to the key-value store, which makes it easy to port the query module of an existing key-value store to other key-value stores. The method provides multiple kinds of feature retrieval, including user-defined ones, so it can directly manage many kinds of unstructured data. The method supports reading and writing large data (such as files), supports executing distributed functions such as data analysis, and allows query priorities to be configured, which together ensure efficient management of unstructured data.
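
The loose coupling described in this paragraph might look like the following Python sketch, in which the query module depends only on an abstract interface. The patent does not specify the interface, so `KeyValueStore`, its `put` and `get` methods, and the in-memory backend are all hypothetical names chosen for illustration.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class KeyValueStore(ABC):
    """Hypothetical minimal interface the query module depends on."""

    @abstractmethod
    def put(self, table: str, key: str, value: bytes) -> None: ...

    @abstractmethod
    def get(self, table: str, key: str) -> Optional[bytes]: ...


class InMemoryStore(KeyValueStore):
    """Toy backend standing in for a real key-value store."""

    def __init__(self) -> None:
        self._tables: Dict[str, Dict[str, bytes]] = {}

    def put(self, table: str, key: str, value: bytes) -> None:
        self._tables.setdefault(table, {})[key] = value

    def get(self, table: str, key: str) -> Optional[bytes]:
        return self._tables.get(table, {}).get(key)


class QueryModule:
    """Talks to the backend only through the KeyValueStore interface."""

    def __init__(self, store: KeyValueStore) -> None:
        self._store = store

    def insert(self, table: str, key: str, value: bytes) -> None:
        self._store.put(table, key, value)

    def lookup(self, table: str, key: str) -> Optional[bytes]:
        return self._store.get(table, key)
```

Under this design, porting the query module to another store would mean writing one new `KeyValueStore` subclass and leaving `QueryModule` unchanged.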

DETAILED DESCRIPTION

[0030] The method proposed by the present invention for parsing and processing a query language for unstructured data management comprises the following steps:

[0031] (1) A query module in the key-value store is started, and the query module listens for users' query language requests.

[0032] (2) The query module receives a user's query language request and parses the language. The parsing steps are as follows:

[0033] (2-1) The client connects to the query module through a query language driver, establishes a session between the client and the query module, saves the session information during the session, accesses the query module, and sends query language to the query module.

[0034] (2-2) By means of its parser, the query module converts the query language request sent by the client into an internal command.

[0035] (3) The internal command is examined. If the internal command designates the key-value store table for the current session, the query module saves the name of the designated table, and subsequent commands in the session are executed against that table by default. If a similarity keyword (like) appears anywhere in the query language, the query module forwards the internal command to the index invocation module of the key-value store. If a function keyword (function) appears anywhere in the query language, the query module forwards the internal command to the function invocation module of the key-value store.

[0036] (4) According to the internal command, the query module in the key-value store calls the functional modules of the key-value store to execute the internal command. The specific process is as follows:

[0037] (4-1) If the internal command is a structured query command, such as creating a table, creating a column family, or adding or deleting data in a column family, the command is executed by the server of the key-value store.

[0038] (4-2) If the internal command is a command to create a key-value store index, the command is executed by the server of the key-value store.

[0039] (4-3) If the internal command is a command to create a non-key-value-store index, such as a high-dimensional index for images or a full-text index for text, an index implementation library is constructed and called to execute the command.

[0040] (4-4) If the internal command is a command to run a data analysis function, a data function analysis module is constructed and called to execute the command, and the query module obtains the execution status and execution result of the command.

[0041] (4-5) If the internal command is a large data transfer, a separate data transfer stream is used to wait for a connection from the client. Once the connection is made, the file is transferred over the data transfer stream. After the transfer ends, the query module saves the transferred file and keeps the session between the client and the query module open.

[0042] (4-6) If the internal command is a user-defined index creation, index query, or function creation, the query language proposed by the invention uses semi-open keywords to support the creation and querying of multiple kinds of index and multiple functions. For commands that create or query a user-defined index, a keyword (for example, with) indicates the index creation parameters and the index type, and the index is created or queried accordingly. For commands that create a user-defined function, the query module selects the corresponding function from the supported function types listed in the query module's configuration file, according to the function keyword and the function's variable-length arguments in the query language, and the function is created.

[0043] (4-7) If the internal command is a joint query over several types of index: in more complex query statements, a default key-value store index query (filtering on column values or key values) may be combined with several user-defined index queries. The query module splits the query by index type to obtain a query clause for each index type, reads from the query module's configuration file the priorities of the different index queries, adjusts the order of the query clauses accordingly, and runs the query.

[0044] (5) The query module returns the query result to the client.
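
The clause reordering in step (4-7) can be sketched in Python as a stable sort on configured priorities. The patent does not give the configuration format or the index type names, so the `INDEX_PRIORITY` table, its values, the convention that a lower number runs first, and the handling of unknown index types are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical priorities, as they might be read from the query module's
# configuration file; a lower number means the sub-query runs earlier.
INDEX_PRIORITY: Dict[str, int] = {
    "key": 0,
    "column": 1,
    "fulltext": 2,
    "image": 3,
}

Clause = Tuple[str, str]  # (index type, clause text)


def order_clauses(clauses: List[Clause],
                  priority: Dict[str, int] = INDEX_PRIORITY) -> List[Clause]:
    """Reorder sub-queries by the configured priority of their index type.

    Index types missing from the configuration are run last (an assumption).
    sorted() is stable, so clauses with equal priority keep their original
    relative order.
    """
    last = max(priority.values(), default=0) + 1
    return sorted(clauses, key=lambda c: priority.get(c[0], last))
```

Running cheap key or column filters before an expensive image-similarity search is one plausible reason to make the order configurable, although the source does not state the rationale.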

Patent Citations
Cited Patent | Filing date | Publication date | Applicant | Title
CN102129469A * | 23 Mar 2011 | 20 Jul 2011 | Huazhong University of Science and Technology (华中科技大学) | Virtual experiment-oriented unstructured data accessing method
CN102298641A * | 14 Sep 2011 | 28 Dec 2011 | Tsinghua University (清华大学) | Unified storage method for files and structured data based on a key-value store
US7194483 * | 19 Mar 2003 | 20 Mar 2007 | Intelligenxia, Inc. | Method, system, and computer program product for concept-based multi-dimensional analysis of unstructured information
US20080201290 * | 16 Feb 2007 | 21 Aug 2008 | International Business Machines Corporation | Computer-implemented methods, systems, and computer program products for enhanced batch mode processing of a relational database
Non-Patent Citations
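Reference
1 * 田万鹏 et al.: "A feature-based modeling framework for the evolution management of unstructured data", Journal of Computer Research and Development (《计算机研究与发展》), no. 47, 31 December 2010 (2010-12-31), pages 394-399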
Referenced by
Citing Patent | Filing date | Publication date | Applicant | Title
CN103425779A * | 19 Aug 2013 | 4 Dec 2013 | Dawning Information Industry Co., Ltd. (曙光信息产业股份有限公司) | Data processing method and data processing device
Classifications
International Classification: G06F17/30
Legal Events
Date | Code | Event | Description
24 Oct 2012 | C06 | Publication
19 Dec 2012 | C10 | Entry into substantive examination
20 Aug 2014 | C14 | Grant of patent or utility model