CN102750354B - Method for analyzing and processing non-structured data query operating language - Google Patents

Publication number
CN102750354B
Authority
CN
China
Prior art keywords
query module
index
key-value store
internal command
query
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201210190832.5A
Other languages
Chinese (zh)
Other versions
CN102750354A (en)
Inventor
王建民
丁贵广
卓安
黄向东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University
Priority to CN201210190832.5A
Publication of CN102750354A
Application granted
Publication of CN102750354B
Legal status: Active
Anticipated expiration

Abstract

The invention relates to a method for parsing and processing an unstructured-data query operation language, and belongs to the technical field of computer data management. The method defines a structured query language for querying unstructured data. The grammar of this language resembles that of conventional relational database query languages, and the language is easy to extend and can incorporate user-defined query functions. The method comprises the following steps: starting a query module in a key-value store; receiving a user's query language request, parsing it, and converting it into an internal command; having the query module call the functional modules of the key-value store to execute the internal command; and returning the result to the user after the command has been executed. The method takes the query module as its core and accesses the underlying key-value store through an SQL-like (Structured Query Language) language, so that users can easily operate the key-value store and manage unstructured data.

Description

A method for parsing and processing an unstructured-data query operation language
Technical field
The present invention relates to a method for parsing and processing an unstructured-data query operation language, and belongs to the technical field of computer data management.
Background art
With the growing number of emerging applications such as the Internet and the development of enterprise informatization, large amounts of unstructured data have appeared. Unstructured data has rich data types and complex structure, and lacks a clear, uniformly defined data structure constraint. Combined with its massive scale, highly dynamic characteristics, diverse application scenarios, and the need for unified, federated access, this makes unstructured data management a major challenge.
Traditional relational databases struggle to offer an effective solution for processing massive unstructured data. The data model of a traditional database is a schema-first logical organization, whereas unstructured data is organized schema-later, so data management methods built on relational algebra no longer work for unstructured data. The sheer volume of unstructured data also leaves traditional databases inadequate in performance and scalability.
Emerging key-value stores break the schema-first logic of traditional databases by being schema-free, while the key-value approach guarantees high-speed reads and writes. Popular, rapidly developing key-value stores include HBase, MongoDB, Dynamo, and Cassandra. They provide storage and scalability for massive data through distributed clusters, and the present invention is based on key-value stores of this kind.
However, emerging key-value stores lack mature query methods and query languages. For example, HBase provides API access, and Cassandra provides API access along with an SQL-like language called CQL. Constrained by their underlying data stores, however, they support only simple queries and updates on unstructured data, provide no complex analysis functions, and do not consider how to describe large-volume data in the language. The founders of CouchDB and SQLite have jointly attempted to design UnQL, a unified query language for key-value stores, but it currently exists only in rudimentary form and does not effectively address the multi-feature queries that unstructured data requires.
From the perspective of end users and applications, an unstructured-data query language should solve the following problems:
(1) Support queries over unstructured data stored in key-value stores.
Massive unstructured data is now mainly stored in key-value stores, which offer efficient reads and writes, but key-value stores often provide no easy-to-use query language.
(2) Efficiently support unified queries over the different features of unstructured data.
Existing languages such as CQL provide only simple query functions and cannot perform feature retrieval on unstructured data, for example retrieval of image data by features such as histogram or color, or retrieval of audio by MFCC features.
(3) Perform data query and analysis effectively.
Traditional data queries implement only indexing and simple statistical functions. For massive unstructured data, many results can only be obtained by analyzing the data, so the query language should support as many data analysis functions as possible.
Summary of the invention
The object of the present invention is to propose a method for parsing and processing an unstructured-data query operation language. To address the problems in the field of unstructured data management, the underlying key-value store is accessed through an SQL-like language, so that users can easily operate the key-value store to manage unstructured data.
The method for parsing and processing an unstructured-data management query language proposed by the present invention comprises the following steps:
(1) Start the query module in the key-value store; the query module listens for users' query language requests.
(2) The query module receives a user's query language request and parses it. The parsing steps are as follows:
(2-1) The client connects to the query module through a query language driver and establishes a session between the client and the query module. The session information is saved for the duration of the session. The client then accesses the query module and sends query language statements to it.
(2-2) The parser in the query module converts the query language request sent by the client into an internal command.
(3) The internal command is examined. If it specifies the key-value store table for the session, the query module saves the name of the specified table, and subsequent commands in the session are executed against that table by default. If a "like" (similarity) keyword appears anywhere in the query statement, the query module hands the internal command to the index call module of the key-value store. If a "function" keyword appears anywhere in the query statement, the query module hands the internal command to the function call module of the key-value store.
(4) The query module in the key-value store calls the functional modules of the key-value store to execute the internal command. The detailed process is as follows:
(4-1) If the internal command is a structured query command, the key-value store server executes the command.
(4-2) If the internal command creates an index native to the key-value store, the key-value store server executes the command.
(4-3) If the internal command creates an index that is not native to the key-value store, an index implementation library is built and called to execute the command.
(4-4) If the internal command runs a data analysis function, a data analysis function module is built and called to execute the command, and the query module obtains the execution status and the execution result of the command.
(4-5) If the internal command is a large data transfer, an independent data transfer stream waits for the client to connect. Once the connection is made, the file is transferred over the data transfer stream. After the transfer ends, the query module saves the transferred file and keeps the session between the client and the query module open.
(4-6) If the internal command is a user-defined index creation, index query, or function definition command: for commands that create or query a user-defined index, a keyword indicates the index creation parameters and the index type, completing the creation and querying of the index. For commands that define a user-defined function, the query module selects the corresponding function from the supported function types listed in its configuration file, according to the "function" keyword in the query statement and the function's variable-length parameters, completing the definition of the function.
(4-7) If the internal command is a joint query over multiple types of index, the query module decomposes the multi-type index query to obtain the query clause for each index type, reads the priority of each type of index query from its configuration file according to the clauses, adjusts the execution order of the query clauses, and runs the query.
(5) The query module returns the query result to the client.
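Steps (2) and (3) above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the patented implementation: the InternalCommand and Session classes, the "use <table>" statement syntax, and the route names are hypothetical, and only the "like" and "function" keywords come from the method itself. The sketch checks "like" before "function" and does not handle keywords that appear inside quoted strings.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class InternalCommand:
    """Parsed form of one query-language request (hypothetical structure)."""
    kind: str                    # "use_table", "index_call", "function_call" or "structured"
    text: str                    # the statement without its trailing semicolon
    table: Optional[str] = None  # table the command runs against, if known


@dataclass
class Session:
    """Per-client session state kept by the query module (hypothetical)."""
    current_table: Optional[str] = None
    history: list = field(default_factory=list)


def parse(statement: str, session: Session) -> InternalCommand:
    """Convert a statement into an internal command and choose its route."""
    text = statement.strip().rstrip(";")
    tokens = [t.lower() for t in re.findall(r"[A-Za-z_]\w*", text)]
    session.history.append(text)

    # Assumed syntax: "use <table>" sets the default table for the session.
    match = re.fullmatch(r"use\s+(\w+)", text, flags=re.IGNORECASE)
    if match:
        session.current_table = match.group(1)
        return InternalCommand("use_table", text, session.current_table)

    # A "like" keyword anywhere routes to the index call module.
    if "like" in tokens:
        return InternalCommand("index_call", text, session.current_table)

    # A "function" keyword anywhere routes to the function call module.
    if "function" in tokens:
        return InternalCommand("function_call", text, session.current_table)

    # Everything else is treated as an ordinary structured command.
    return InternalCommand("structured", text, session.current_table)
```

A caller would create one Session per client connection and pass each incoming statement through parse; the returned kind decides which module of the key-value store executes the command.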
The method proposed by the present invention defines a structured query language for querying unstructured data. Its syntax resembles the query language of traditional relational databases, and the language is easy to extend and can incorporate user-defined query functions. The core of the method is the query module. Through a designed interface, the query module is loosely coupled to the key-value store, so the query module of an existing key-value store can easily be ported to other key-value stores. The method provides multiple kinds of retrieval, including user-defined feature retrieval, and can therefore manage many kinds of unstructured data directly. The method supports reading and writing large data (such as files), supports distributed execution of data analysis functions, and allows query priorities to be configured, which together ensure efficient management of unstructured data.
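The loose coupling between the query module and the key-value store might look like the following Python sketch. The source does not specify the interface, so the KeyValueBackend protocol, its put and get methods, and the InMemoryBackend stand-in are all assumptions made for illustration.

```python
from typing import Dict, Optional, Protocol


class KeyValueBackend(Protocol):
    """Hypothetical minimal interface the query module relies on."""

    def put(self, table: str, key: str, value: bytes) -> None: ...

    def get(self, table: str, key: str) -> Optional[bytes]: ...


class InMemoryBackend:
    """Toy stand-in for a real key-value store; satisfies KeyValueBackend."""

    def __init__(self) -> None:
        self._tables: Dict[str, Dict[str, bytes]] = {}

    def put(self, table: str, key: str, value: bytes) -> None:
        self._tables.setdefault(table, {})[key] = value

    def get(self, table: str, key: str) -> Optional[bytes]:
        return self._tables.get(table, {}).get(key)


class QueryModule:
    """Depends only on the interface, not on any concrete store."""

    def __init__(self, backend: KeyValueBackend) -> None:
        self._backend = backend

    def insert(self, table: str, key: str, value: bytes) -> None:
        self._backend.put(table, key, value)

    def select(self, table: str, key: str) -> Optional[bytes]:
        return self._backend.get(table, key)
```

Because QueryModule only calls methods declared on the protocol, porting it to another store would, under these assumptions, mean writing one more adapter class with the same two methods.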
Embodiment
The method for parsing and processing an unstructured-data management query language proposed by the present invention comprises the following steps:
(1) Start the query module in the key-value store; the query module listens for users' query language requests.
(2) The query module receives a user's query language request and parses it. The parsing steps are as follows:
(2-1) The client connects to the query module through a query language driver and establishes a session between the client and the query module. The session information is saved for the duration of the session. The client then accesses the query module and sends query language statements to it.
(2-2) The parser in the query module converts the query language request sent by the client into an internal command.
(3) The internal command is examined. If it specifies the key-value store table for the session, the query module saves the name of the specified table, and subsequent commands in the session are executed against that table by default. If the similarity keyword ("like") appears anywhere in the query statement, the query module hands the internal command to the index call module of the key-value store. If the function keyword ("function") appears anywhere in the query statement, the query module hands the internal command to the function call module of the key-value store.
(4) The query module in the key-value store calls the functional modules of the key-value store to execute the internal command. The detailed process is as follows:
(4-1) If the internal command is a structured query command, such as creating a table, creating a column group or column family, or inserting and deleting data, the key-value store server executes the command.
(4-2) If the internal command creates an index native to the key-value store, the key-value store server executes the command.
(4-3) If the internal command creates an index that is not native to the key-value store, such as a high-dimensional index for images or a full-text index for text, an index implementation library is built and called to execute the command.
(4-4) If the internal command runs a data analysis function, a data analysis function module is built and called to execute the command, and the query module obtains the execution status and the execution result of the command.
(4-5) If the internal command is a large data transfer, an independent data transfer stream waits for the client to connect. Once the connection is made, the file is transferred over the data transfer stream. After the transfer ends, the query module saves the transferred file and keeps the session between the client and the query module open.
(4-6) If the internal command is a user-defined index creation, index query, or function definition command: the query language proposed by the present invention uses semi-open keywords to support the creation and querying of multiple kinds of index and multiple kinds of function. For commands that create or query a user-defined index, a keyword (for example, "with") indicates the index creation parameters and the index type, completing the creation and querying of the index. For commands that define a user-defined function, the query module selects the corresponding function from the supported function types listed in its configuration file, according to the "function" keyword in the query statement and the function's variable-length parameters, completing the definition of the function.
(4-7) If the internal command is a joint query over multiple types of index: a relatively complex query statement may combine the key-value store's default index queries (filters on column values or keys) with several user-defined index queries at the same time. The query module decomposes the multi-type index query to obtain the query clause for each index type, reads the priority of each type of index query from its configuration file according to the clauses, adjusts the execution order of the query clauses, and runs the query.
(5) The query module returns the query result to the client.
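The clause reordering in step (4-7) can be illustrated with a short Python sketch. The source gives neither the clause syntax nor the configuration format, so everything here is assumed for the example: clauses are joined by "and", a custom index clause has the form "<column> like <value> with <index type>", the priority table stands in for the configuration file, and a lower number means the clause runs earlier. Nested expressions and quoted strings containing "and" are not handled.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical contents of the query module's configuration file.
# Assumption: lower number = executed earlier.
INDEX_PRIORITY: Dict[str, int] = {
    "key": 0,       # key filter served by the key-value store itself
    "column": 1,    # column-value filter served by the key-value store
    "fulltext": 2,  # user-defined full-text index
    "image": 3,     # user-defined high-dimensional image index
}


def split_clauses(where: str) -> List[str]:
    """Split a filter expression on 'and' into individual clauses."""
    parts = re.split(r"\s+and\s+", where, flags=re.IGNORECASE)
    return [p.strip() for p in parts if p.strip()]


def classify(clause: str) -> str:
    """Return the index type a clause needs (assumed clause syntax)."""
    tokens = clause.lower().split()
    if "like" in tokens and "with" in tokens:
        position = tokens.index("with")
        if position + 1 < len(tokens):
            return tokens[position + 1]  # index type named after "with"
    if tokens and tokens[0] == "key":
        return "key"
    return "column"


def order_clauses(where: str, priority: Dict[str, int]) -> List[Tuple[str, str]]:
    """Return (index type, clause) pairs sorted by configured priority."""
    classified = [(classify(c), c) for c in split_clauses(where)]
    # Unknown index types go last; sorted() is stable, so ties keep source order.
    return sorted(classified, key=lambda item: priority.get(item[0], len(priority)))
```

Running cheap key and column filters before the user-defined indexes is only one plausible policy; the method itself leaves the ordering to whatever priorities the configuration file lists.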

Claims (1)

1. A method for parsing and processing an unstructured-data management query language, characterized in that the method comprises the following steps:
(1) starting the query module in the key-value store, the query module listening for users' query language requests;
(2) the query module receiving a user's query language request and parsing it, the parsing steps being as follows:
(2-1) the client connects to the query module through a query language driver, establishes a session between the client and the query module, saves the session information for the duration of the session, accesses the query module, and sends query language statements to the query module;
(2-2) the parser in the query module converts the query language request sent by the client into an internal command;
(3) examining the internal command: if the internal command specifies the key-value store table for the session, the query module saves the name of the specified table, and subsequent commands in the session are executed against that table by default; if a similarity keyword appears anywhere in the query statement, the query module hands the internal command to the index call module of the key-value store; if a function keyword appears anywhere in the query statement, the query module hands the internal command to the function call module of the key-value store;
(4) the query module in the key-value store calling the functional modules of the key-value store to execute the internal command, the detailed process being as follows:
(4-1) if the internal command is a structured query command, the key-value store server executes the command;
(4-2) if the internal command creates an index native to the key-value store, the key-value store server executes the command;
(4-3) if the internal command creates an index that is not native to the key-value store, an index implementation library is built and called to execute the command;
(4-4) if the internal command runs a data analysis function, a data analysis function module is built and called to execute the command, and the query module obtains the execution status and the execution result of the command;
(4-5) if the internal command is a large data transfer, an independent data transfer stream waits for the client to connect; once the connection is made, the file is transferred over the data transfer stream; after the transfer ends, the query module saves the transferred file and keeps the session between the client and the query module open;
(4-6) if the internal command is a user-defined index creation command, index query command, or function definition command: for the user-defined index creation and index query commands, a keyword indicates the index creation parameters and the index type, completing the creation and querying of the index; for the user-defined function definition command, the query module selects the corresponding function from the supported function types listed in its configuration file, according to the function keyword in the query statement and the function's variable-length parameters, completing the definition of the function;
(4-7) if the internal command is a joint query over multiple types of index, the query module decomposes the multi-type index query to obtain the query clause for each index type, reads the priority of each type of index query from its configuration file according to the clauses, adjusts the execution order of the query clauses, and runs the query;
(5) the query module returning the query result to the client.
CN201210190832.5A 2012-06-11 2012-06-11 Method for analyzing and processing non-structured data query operating language Active CN102750354B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210190832.5A CN102750354B (en) 2012-06-11 2012-06-11 Method for analyzing and processing non-structured data query operating language

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210190832.5A CN102750354B (en) 2012-06-11 2012-06-11 Method for analyzing and processing non-structured data query operating language

Publications (2)

Publication Number Publication Date
CN102750354A CN102750354A (en) 2012-10-24
CN102750354B CN102750354B (en) 2014-08-20

Family

ID=47030539

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210190832.5A Active CN102750354B (en) 2012-06-11 2012-06-11 Method for analyzing and processing non-structured data query operating language

Country Status (1)

Country Link
CN (1) CN102750354B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103425779A (en) * 2013-08-19 2013-12-04 曙光信息产业股份有限公司 Data processing method and data processing device
CN104516964B (en) * 2014-12-24 2018-06-08 北京奇虎科技有限公司 The generation method of database function interface, the processing method and processing device of onboard data
CN107122418A (en) * 2017-03-31 2017-09-01 北京奇艺世纪科技有限公司 A kind of querying method and device
CN108090139B (en) * 2017-11-30 2021-10-01 北京邮电大学 File retrieval method and device
CN108846003A (en) * 2018-04-20 2018-11-20 广东电网有限责任公司 A kind of unstructured machine data processing method and processing device
CN113326033B (en) * 2021-06-09 2023-08-11 北京八分量信息科技有限公司 Key-value storage system with multi-language API
CN113468209A (en) * 2021-07-27 2021-10-01 广西电网有限责任公司 High-speed memory database access method for power grid monitoring system
CN116303581B (en) * 2023-05-19 2023-08-04 山东浪潮数字商业科技有限公司 Method, system, equipment and medium for adapting split-flow query load among heterogeneous databases

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7194483B1 (en) * 2001-05-07 2007-03-20 Intelligenxia, Inc. Method, system, and computer program product for concept-based multi-dimensional analysis of unstructured information
CN102129469A (en) * 2011-03-23 2011-07-20 华中科技大学 Virtual experiment-oriented unstructured data accessing method
CN102298641A (en) * 2011-09-14 2011-12-28 清华大学 Method for uniformly storing files and structured data based on key value bank

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080201290A1 (en) * 2007-02-16 2008-08-21 International Business Machines Corporation Computer-implemented methods, systems, and computer program products for enhanced batch mode processing of a relational database

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
田万鹏 et al. 一种基于特征的非结构化数据演化管理建模框架 [A feature-based modeling framework for unstructured data evolution management]. 《计算机研究与发展》 [Journal of Computer Research and Development], 2010 (No. 47), 394-399. *

Also Published As

Publication number Publication date
CN102750354A (en) 2012-10-24

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant