Publication number: CN102750354 B
Publication type: Grant
Application number: CN 201210190832
Publication date: 20 Aug 2014
Filing date: 11 Jun 2012
Priority date: 11 Jun 2012
Also published as: CN102750354A
Publication number (alternate forms): 201210190832.5, CN 102750354 B, CN 102750354B, CN 201210190832, CN-B-102750354, CN102750354 B, CN102750354B, CN201210190832, CN201210190832.5
Inventors: 王建民, 丁贵广, 卓安, 黄向东
Applicant: 清华大学 (Tsinghua University)
Method for analyzing and processing non-structured data query operating language
CN 102750354 B
Abstract
The present invention relates to a method for parsing and processing a query language for unstructured data management, and belongs to the technical field of computer data management. For queries over unstructured data, the method defines a structured query language whose syntax resembles that of traditional relational database query languages; the language is easy to extend and can incorporate user-defined query functions. The method first starts the query module in the key-value store, which receives the user's query language request, parses it, and converts it into an internal command. The query module then invokes the functional modules of the key-value store according to the internal command, and returns the result to the user once the command has been executed. The core of the method is the query module: by providing SQL-like access to the underlying key-value store, it lets users operate the key-value store easily and manage unstructured data.
Claims (1)
1. A method for parsing and processing a query language for unstructured data management, characterized in that the method comprises the following steps:
(1) starting the query module in the key-value store, the query module listening for users' query language requests;
(2) the query module receiving a user's query language request and parsing it, the parsing steps being as follows:
(2-1) the client connects to the query module through a query language driver, a session is established between the client and the query module, session information is saved for the duration of the session, and the client accesses the query module and sends query language statements to it;
(2-2) through its parser, the query module converts the query language request sent by the client into an internal command;
(3) examining the internal command: if the internal command specifies the key-value store table for the current session, the query module saves the name of the specified table, and subsequent commands in the session execute against that table by default; if a similarity keyword appears anywhere in the query language statement, the query module forwards the internal command to the index invocation module of the key-value store; if a function keyword appears anywhere in the query language statement, the query module forwards the internal command to the function invocation module of the key-value store;
(4) the query module in the key-value store invoking, according to the internal command, the functional modules of the key-value store to execute the internal command, specifically as follows:
(4-1) if the internal command is a structured query command, the server of the key-value store executes the command;
(4-2) if the internal command is a command to create a key-value store index, the server of the key-value store executes the command;
(4-3) if the internal command is a command to create a non-key-value-store index, an index implementation library is constructed and invoked to execute the command;
(4-4) if the internal command is a command to run a data analysis function, a data function analysis module is constructed and invoked to execute the command, and the query module obtains the execution status and execution result of the command;
(4-5) if the internal command is a large data transfer, an independent data transfer stream waits for the client to connect and, once connected, the file is transferred over that stream; after the transfer ends, the query module saves the transferred file and keeps the session between the client and the query module open;
(4-6) if the internal command is a custom create-index command, a query-index command, or a create-function command: for the custom create-index and query-index commands, a keyword marks the index creation parameters and the index type, and the index is created or queried accordingly; for the custom create-function command, the query module uses the function keyword and the function's variable-length arguments in the query language statement to select the corresponding function from the supported function types listed in the query module's configuration file, and the function is created;
(4-7) if the internal command is a combined query over multiple types of index, the query module splits the query by index type to obtain a query clause for each type, reads from the query module's configuration file the priorities of the different index queries for those clauses, adjusts the execution order of the query clauses accordingly, and runs the query;
(5) the query module returning the query result to the client.
Description

Method for parsing and processing an unstructured data query and manipulation language

Technical Field

[0001] The present invention relates to a method for parsing and processing an unstructured data query and manipulation language, and belongs to the technical field of computer data management.

Background

[0002] With the growing variety of emerging applications such as the Internet and the continued development of enterprise informatization, large volumes of unstructured data have appeared. Unstructured data comes in many data types, with complex structure and no explicit, uniformly defined structural constraints. Combined with its massive scale, highly dynamic nature, diverse application scenarios, and the need for unified joint access, this makes unstructured data management a major challenge.

[0003] Traditional relational databases struggle to offer an effective solution for massive unstructured data. Their data models have a schema-first logical structure, whereas unstructured data has a schema-later logical structure, so data management methods built on relational algebra are no longer effective for unstructured data. The sheer volume of unstructured data also leaves traditional databases unable to cope in performance and scalability.

[0004] Emerging key-value stores break the schema-first logic of traditional databases by being schemaless, and their key-value access pattern provides high-speed reads and writes. Popular, fast-growing key-value stores include HBase, MongoDB, Dynamo, and Cassandra. They rely on distributed clusters for the storage and scalability of massive data, and the present invention is based on key-value stores of this kind.

[0005] However, these emerging key-value stores lack mature query mechanisms and query languages. HBase, for example, provides API access, and Cassandra provides an API plus an SQL-like language called CQL. Because of the limitations of their own databases, they support only simple queries and updates on unstructured data; they provide no complex analysis functions and give no consideration to how large-volume data is described in the language. The founders of CouchDB and SQLite are jointly designing UnQL, a unified query language for key-value stores, but it is still only a prototype and does not effectively address multi-feature queries over unstructured data.

[0006] From the perspective of end users and applications, an unstructured data query language should address the following issues:

[0007] (1) Support queries over unstructured data stored in key-value stores.

[0008] Unstructured data is today mostly stored in key-value stores, which handle massive volumes and efficient reads and writes, but key-value stores often provide no easy-to-use query language.

[0009] (2) Provide unified queries over the multiple features of different kinds of unstructured data.

[0010] Existing languages such as CQL provide only simple query functions and cannot perform feature-based retrieval on unstructured data, for example retrieving images by features such as histogram or color, or retrieving audio by MFCC features.
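
To make "feature-based retrieval" concrete, the following is a minimal Python sketch of histogram retrieval over a tiny image library. It is an illustration only, not the method of the patent: the gray-value pixel lists, the four-bin quantization, the Euclidean distance, and the names `color_histogram`, `histogram_distance`, and `most_similar` are all assumptions.

```python
from collections import Counter
from math import sqrt


def color_histogram(pixels, bins=4):
    """Quantize 0-255 gray values into `bins` buckets and return normalized counts."""
    counts = Counter(p * bins // 256 for p in pixels)
    total = len(pixels)
    return [counts.get(b, 0) / total for b in range(bins)]


def histogram_distance(h1, h2):
    """Euclidean distance between two histograms of equal length."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(h1, h2)))


def most_similar(query_pixels, library):
    """Return the key of the stored image whose histogram is closest to the query."""
    q = color_histogram(query_pixels)
    return min(library,
               key=lambda k: histogram_distance(q, color_histogram(library[k])))


# Hypothetical library: key -> flat list of gray values.
library = {
    "dark.png": [10, 20, 30, 40],
    "bright.png": [200, 220, 240, 250],
}
print(most_similar([15, 25, 35, 60], library))  # prints "dark.png"
```

A plain key lookup cannot answer this kind of query, because the match depends on a feature computed from the value rather than on the key itself.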

[0011] (3) Support effective data querying and analysis.

[0012] Traditional data queries implement only indexes and simple statistical functions. For massive unstructured data, many results can be obtained only by analyzing the data, so the query language should support as many data analysis functions as possible.

Summary of the Invention

[0013] The object of the present invention is to provide a method for parsing and processing an unstructured data query and manipulation language. To address the problems in the field of unstructured data management, it provides SQL-like access to the underlying key-value store, so that users can easily operate the key-value store to manage unstructured data.

[0014] The method for parsing and processing a query language for unstructured data management proposed by the present invention comprises the following steps:

[0015] (1) Start the query module in the key-value store; the query module listens for users' query language requests.

[0016] (2) The query module receives a user's query language request and parses it. The parsing steps are as follows:

[0017] (2-1) The client connects to the query module through a query language driver. A session is established between the client and the query module, session information is saved for the duration of the session, and the client accesses the query module and sends query language statements to it.

[0018] (2-2) Through its parser, the query module converts the query language request sent by the client into an internal command.

[0019] (3) The internal command is examined. If it specifies the key-value store table for the current session, the query module saves the name of the specified table, and subsequent commands in the session execute against that table by default. If a similarity keyword appears anywhere in the query language statement, the query module forwards the internal command to the index invocation module of the key-value store. If a function keyword appears anywhere in the query language statement, the query module forwards the internal command to the function invocation module of the key-value store.

[0020] (4) According to the internal command, the query module in the key-value store invokes the functional modules of the key-value store to execute the internal command, specifically as follows:

[0021] (4-1) If the internal command is a structured query command, the server of the key-value store executes the command.

[0022] (4-2) If the internal command is a command to create a key-value store index, the server of the key-value store executes the command.

[0023] (4-3) If the internal command is a command to create a non-key-value-store index, an index implementation library is constructed and invoked to execute the command.

[0024] (4-4) If the internal command is a command to run a data analysis function, a data function analysis module is constructed and invoked to execute the command, and the query module obtains the execution status and execution result of the command.

[0025] (4-5) If the internal command is a large data transfer, an independent data transfer stream waits for the client to connect and, once connected, the file is transferred over that stream. After the transfer ends, the query module saves the transferred file and keeps the session between the client and the query module open.

[0026] (4-6) If the internal command is a custom create-index, query-index, or create-function command: for custom create-index and query-index commands, a keyword marks the index creation parameters and the index type, and the index is created or queried accordingly; for custom create-function commands, the query module uses the function keyword and the function's variable-length arguments in the query language statement to select the corresponding function from the supported function types listed in the query module's configuration file, and the function is created.

[0027] (4-7) If the internal command is a combined query over multiple types of index, the query module splits the query by index type to obtain a query clause for each type, reads from the query module's configuration file the priorities of the different index queries for those clauses, adjusts the execution order of the query clauses accordingly, and runs the query.

[0028] (5) The query module returns the query result to the client.
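
Steps (2) to (4) can be pictured with a short Python sketch. Everything in it is an assumption made for illustration, since the patent does not define a concrete grammar: the `use <table>` statement is hypothetical, the literal keywords `like` and `function` follow the examples given in the detailed description, and the names `Session`, `InternalCommand`, `parse`, and `dispatch` are invented here. Tokenizing on whitespace is a deliberate simplification; a real parser would need to handle string literals and quoting.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Session:
    """State the query module keeps for one client connection (step 2-1)."""
    current_table: Optional[str] = None
    statements: List[str] = field(default_factory=list)


@dataclass
class InternalCommand:
    kind: str              # "use_table", "index_call", "function_call" or "structured"
    text: str              # the original statement
    table: Optional[str]   # table the command runs against, if any


def parse(statement: str, session: Session) -> InternalCommand:
    """Convert one statement into an internal command (steps 2-2 and 3)."""
    session.statements.append(statement)
    tokens = statement.strip().rstrip(";").split()
    lowered = [t.lower() for t in tokens]

    # Hypothetical `use <table>` statement: remember the table for the session.
    if len(tokens) == 2 and lowered[0] == "use":
        session.current_table = tokens[1]
        return InternalCommand("use_table", statement, tokens[1])

    # A similarity keyword anywhere in the statement routes to the index module.
    if "like" in lowered:
        kind = "index_call"
    # A function keyword anywhere routes to the function module.
    elif "function" in lowered:
        kind = "function_call"
    else:
        kind = "structured"
    return InternalCommand(kind, statement, session.current_table)


def dispatch(cmd: InternalCommand) -> str:
    """Name the module that would execute the command (step 4), without running it."""
    targets = {
        "use_table": "query module (session state)",
        "index_call": "index invocation module",
        "function_call": "function invocation module",
        "structured": "key-value store server",
    }
    return targets[cmd.kind]


session = Session()
for stmt in ["use images;",
             "select key from images where feature like 'cat.jpg';",
             "select function avg(size);",
             "create table docs;"]:
    cmd = parse(stmt, session)
    print(cmd.kind, "->", dispatch(cmd))
```

Keeping the session's default table on the `Session` object rather than in the command text mirrors step (3): later statements inherit the table without naming it again.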

[0029] For queries over unstructured data, the method for parsing and processing a query language for unstructured data management proposed by the present invention defines a structured query language whose syntax resembles that of traditional relational database query languages; the language is easy to extend and can incorporate user-defined query functions. The core of the method is the query module. Because the query module is loosely coupled to the key-value store through a designed interface, a query module written for one key-value store can readily be ported to another. The method provides several kinds of feature retrieval, including user-defined ones, so it can manage many kinds of unstructured data directly. It also supports read and write operations on large data (such as files), supports the execution of distributed functions such as data analysis, and allows query priorities to be configured, all of which help manage unstructured data efficiently.
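
The loose coupling described above can be sketched as a narrow interface between the query module and the store. The patent does not specify this interface, so the sketch below is an assumption throughout: `KeyValueBackend`, `InMemoryBackend`, `QueryModule`, and their `put`/`get` methods are hypothetical names, and the in-memory dictionary merely stands in for a real key-value store.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional, Tuple


class KeyValueBackend(ABC):
    """Hypothetical minimal interface; the query module depends only on this."""

    @abstractmethod
    def put(self, table: str, key: str, value: bytes) -> None:
        ...

    @abstractmethod
    def get(self, table: str, key: str) -> Optional[bytes]:
        ...


class InMemoryBackend(KeyValueBackend):
    """Stand-in backend for illustration; a real adapter would wrap an actual store."""

    def __init__(self) -> None:
        self._data: Dict[Tuple[str, str], bytes] = {}

    def put(self, table: str, key: str, value: bytes) -> None:
        self._data[(table, key)] = value

    def get(self, table: str, key: str) -> Optional[bytes]:
        return self._data.get((table, key))


class QueryModule:
    """Talks to whichever backend it is given; no store-specific code lives here."""

    def __init__(self, backend: KeyValueBackend) -> None:
        self.backend = backend

    def insert(self, table: str, key: str, value: bytes) -> None:
        self.backend.put(table, key, value)

    def select(self, table: str, key: str) -> Optional[bytes]:
        return self.backend.get(table, key)


qm = QueryModule(InMemoryBackend())
qm.insert("docs", "k1", b"hello")
print(qm.select("docs", "k1"))
```

Under this arrangement, porting the query module to another store would mean writing one more `KeyValueBackend` subclass while `QueryModule` stays unchanged.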

Detailed Description

[0030] The method for parsing and processing a query language for unstructured data management proposed by the present invention comprises the following steps:

[0031] (1) Start the query module in the key-value store; the query module listens for users' query language requests.

[0032] (2) The query module receives a user's query language request and parses it. The parsing steps are as follows:

[0033] (2-1) The client connects to the query module through a query language driver. A session is established between the client and the query module, session information is saved for the duration of the session, and the client accesses the query module and sends query language statements to it.

[0034] (2-2) Through its parser, the query module converts the query language request sent by the client into an internal command.

[0035] (3) The internal command is examined. If it specifies the key-value store table for the current session, the query module saves the name of the specified table, and subsequent commands in the session execute against that table by default. If a similarity keyword (like) appears anywhere in the query language statement, the query module forwards the internal command to the index invocation module of the key-value store. If a function keyword (function) appears anywhere in the query language statement, the query module forwards the internal command to the function invocation module of the key-value store.

[0036] (4) According to the internal command, the query module in the key-value store invokes the functional modules of the key-value store to execute the internal command, specifically as follows:

[0037] (4-1) If the internal command is a structured query command, such as creating a table, creating a column family, or adding or deleting data in a column family, the server of the key-value store executes the command.

[0038] (4-2) If the internal command is a command to create a key-value store index, the server of the key-value store executes the command.

[0039] (4-3) If the internal command is a command to create a non-key-value-store index, such as a high-dimensional index for images or a full-text index for text, an index implementation library is constructed and invoked to execute the command.

[0040] (4-4) If the internal command is a command to run a data analysis function, a data function analysis module is constructed and invoked to execute the command, and the query module obtains the execution status and execution result of the command.

[0041] (4-5) If the internal command is a large data transfer, an independent data transfer stream waits for the client to connect and, once connected, the file is transferred over that stream. After the transfer ends, the query module saves the transferred file and keeps the session between the client and the query module open.

[0042] (4-6) If the internal command is a custom create-index, query-index, or create-function command: the query language proposed by the invention uses semi-open keyword settings to support the creation and querying of multiple kinds of index and multiple kinds of function. For custom create-index and query-index commands, a keyword (for example, with) marks the index creation parameters and the index type, and the index is created or queried accordingly. For custom create-function commands, the query module uses the function keyword and the function's variable-length arguments in the query language statement to select the corresponding function from the supported function types listed in the query module's configuration file, and the function is created.

[0043] (4-7) If the internal command is a combined query over multiple types of index: a more complex query statement may combine a query on the key-value store's default index (filtering on column values or keys) with several custom index queries. The query module splits the query by index type to obtain a query clause for each type, reads from the query module's configuration file the priorities of the different index queries for those clauses, adjusts the execution order of the query clauses accordingly, and runs the query.

[0044] (5) The query module returns the query result to the client.
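
The clause splitting and priority ordering of step (4-7) can be sketched in Python as follows. This is one possible reading, not the patented implementation: the `INDEX_PRIORITY` table stands in for the configuration file, the index type names (`key`, `column`, `fulltext`, `highdim`), the `AND`-separated predicate syntax, the default of `fulltext` for a `like` clause without `with`, and the functions `classify` and `plan` are all hypothetical. The `with` suffix naming the index type follows the keyword example in step (4-6).

```python
import re

# Hypothetical stand-in for the configuration file: lower number runs earlier.
INDEX_PRIORITY = {"key": 0, "column": 1, "fulltext": 2, "highdim": 3}


def classify(clause):
    """Guess which index type a single predicate needs (illustrative rules only)."""
    c = clause.strip().lower()
    if re.match(r"key\b", c):
        return "key"                      # filter on the row key
    if " like " in f" {c} ":
        # A `with <type>` suffix names the custom index type, if present.
        m = re.search(r"\bwith\s+(\w+)", c)
        return m.group(1) if m else "fulltext"
    return "column"                       # plain column-value filter


def plan(where, priority=INDEX_PRIORITY):
    """Split a WHERE expression on AND and order the clauses by configured priority."""
    clauses = [p.strip() for p in re.split(r"\s+and\s+", where, flags=re.IGNORECASE)]
    typed = [(classify(c), c) for c in clauses]
    # Unknown index types sort last; sorted() is stable, so ties keep source order.
    return sorted(typed, key=lambda tc: priority.get(tc[0], len(priority)))


where = "image like 'cat.jpg' with highdim AND size > 1024 AND key = 'img42'"
for index_type, clause in plan(where):
    print(index_type, "|", clause)
```

Running the cheap key and column filters before the high-dimensional index query is only one plausible priority setting; because the order comes from configuration, it can be changed without touching the parser. Splitting on the word `and` would break on string literals that contain it, which a real parser would have to handle.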

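Patent Citations
Cited Patent | Filing date | Publication date | Applicant | Title
CN102129469A | 23 Mar 2011 | 20 Jul 2011 | 华中科技大学 (Huazhong University of Science and Technology) | Virtual experiment-oriented unstructured data accessing method
CN102298641A | 14 Sep 2011 | 28 Dec 2011 | 清华大学 (Tsinghua University) | Unified storage method for files and structured data based on a key-value store
US7194483 | 19 Mar 2003 | 20 Mar 2007 | Intelligenxia, Inc. | Method, system, and computer program product for concept-based multi-dimensional analysis of unstructured information
US2008201290 | | | | Title not available
Non-Patent Citations
Reference
1. 田万鹏 et al. A feature-based modeling framework for evolution management of unstructured data. 《计算机研究与发展》 (Journal of Computer Research and Development). 2010, (No. 47), 394-399.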
Classifications
International Classification: G06F17/30
Legal Events
Date | Code | Event | Description
24 Oct 2012 | C06 | Publication |
19 Dec 2012 | C10 | Request of examination as to substance |
20 Aug 2014 | C14 | Granted |