CN104239530A - Method and device for parallel query of HBase tables - Google Patents

Publication number
CN104239530A
Authority
CN
China
Legal status
Pending
Application number
CN201410483073.0A
Other languages
Chinese (zh)
Inventor
刘璧怡
郭美思
吴楠
Current Assignee
Inspur Beijing Electronic Information Industry Co Ltd
Original Assignee
Inspur Beijing Electronic Information Industry Co Ltd
Priority date
2014-09-19
Filing date
2014-09-19
Publication date
2014-12-24

Classifications

    • G06F16/1734: Details of monitoring file system events, e.g. by the use of hooks, filter drivers, logs
    • G06F16/182: Distributed file systems
    • G06F16/3331: Query processing
    • G06F16/334: Query execution

Abstract

The invention provides a method and device for parallel query of HBase tables. The method comprises the following steps: setting up an HBase query server-side program; setting up a parallel query program; and using the parallel query program to call the HBase query server-side program to query the HBase tables in parallel. By having the parallel query program call the HBase query server-side program, the method and device achieve parallel querying of HBase tables, reduce the overhead of repeatedly connecting to the HBase server, improve the system's efficiency in responding to parallel access, and cache data, so that responses are faster and parallel query efficiency is improved.

Description

Method and apparatus for concurrently querying HBase tables
Technical field
The present invention relates to the technical field of data processing, and in particular to a method and apparatus for concurrently querying HBase tables.
Background
With the rapid development of the Internet, the volume of data generated by network access, network logs, video, and other logs is growing sharply, and data of every kind is increasing on a massive scale. Traditional technologies cannot cope with this situation; distributed computing technology alleviates the problem.
Among distributed computing technologies, the Hadoop platform provides a set of components for processing big data. Hadoop is a distributed system architecture made up of the Hadoop Distributed File System (HDFS) and the MapReduce distributed computing framework; HDFS is highly fault tolerant and can be deployed on inexpensive machines. HBase, one of the Hadoop components, is a sparse, column-oriented distributed database built on Apache Hadoop. HBase uses Hadoop HDFS as its file storage system, uses Hadoop MapReduce to process the massive data it holds, and uses ZooKeeper as its coordination service. Unlike traditional databases, HBase is suited to storing unstructured data and is highly scalable, column-oriented, and able to read and write large volumes of data in real time.
There are two ways to query an HBase table: Scan and Get. A Scan query retrieves a range of records specified by a key range; a Get query retrieves a single record by its RowKey.
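The difference between the two query modes can be illustrated with a toy in-memory model. This is a sketch under stated assumptions, not the HBase client API: the hypothetical class ToyTable keeps rows in a TreeMap sorted by row key (HBase compares row keys as raw bytes, which matches String ordering for ASCII keys), `get` returns one row, and `scan` returns the half-open key range [start, stop), mirroring the inclusive start row and exclusive stop row of an HBase Scan.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy stand-in for an HBase table (hypothetical, not the HBase client API).
// Rows are kept sorted by row key, as HBase keeps them.
class ToyTable {
    private final NavigableMap<String, String> rows = new TreeMap<>();

    void put(String rowKey, String value) {
        rows.put(rowKey, value);
    }

    // Get mode: fetch exactly one record by its RowKey; null if the row is absent.
    String get(String rowKey) {
        return rows.get(rowKey);
    }

    // Scan mode: fetch the contiguous run of records with startRow <= key < stopRow.
    NavigableMap<String, String> scan(String startRow, String stopRow) {
        return rows.subMap(startRow, true, stopRow, false);
    }
}
```

Because rows are sorted, a Scan touches one contiguous key range, while a Get is a point lookup.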
Users need to query HBase tables concurrently and obtain the query results within a specified time. With the existing Scan approach, a batch of data can be fetched according to attributes such as Caching and Batch set on the Scan, and the query is then performed on the fetched data. Concurrent querying of HBase tables can be done with a MapReduce program, but MapReduce execution incurs time overheads such as task startup, which makes query efficiency poor.
In view of this, an optimized scheme for concurrently querying HBase tables is needed to solve the problems of the prior art.
Summary of the invention
To solve the above technical problems, the present invention provides a method and apparatus for concurrently querying HBase tables that can greatly improve the efficiency of concurrent HBase table queries.
To achieve the object of the invention, the invention provides a method for concurrently querying HBase tables, comprising: setting up an HBase query server-side program; setting up a concurrent query program; and using the concurrent query program to call the HBase query server-side program so as to query the HBase tables concurrently.
Further, setting up the HBase query server-side program comprises: instantiating an HTablePool object and obtaining an HTable instance from the instantiated HTablePool object; setting query attributes on the obtained HTable instance according to the requirements of the HBase client; and querying records in the HBase table through getScanner.
Further, before instantiating the HTablePool object and obtaining the HTable instance from it, the method also comprises: building an HTablePool object pool, in which HTablePool maintains a fixed number of HTable instances and stores them in the PoolMap of HTablePool.
Further, setting query attributes on the obtained HTable instance according to the requirements of the HBase client comprises: setting a start key and a stop key on the obtained HTable instance according to those requirements.
Further, setting query attributes on the obtained HTable instance according to the requirements of the HBase client also comprises: setting the Caching, Batch, and CacheBlocks parameters of the HTable instance according to those requirements.
Further, after querying records in the HBase table through getScanner, the method also comprises: if the query results returned by getScanner differ from what the HBase client requires, arranging the output into the format required by the HBase client based on the RowKey and value.
Further, setting up the concurrent query program comprises: setting up the concurrent query program with the time lock of the Java concurrency package (java.util.concurrent).
Further, using the concurrent query program to call the HBase query server-side program to query the HBase tables concurrently comprises: the concurrent query program calls the HBase query server-side program and distributes the concurrent access tasks to the nodes of the cluster, and each node executes its queries on the HBase table concurrently.
Further, the method also comprises: recording the result state of the concurrent queries in a query log using the states error, failed, and success, where error means the query encountered an error, failed means the query failed, and success means the query succeeded. The concurrent query of the HBase table succeeds when every query in it is in the success state, and on success the average time of the concurrent query of the HBase table is recorded.
An apparatus for concurrently querying HBase tables comprises: a first setting unit, configured to set up the HBase query server-side program; a second setting unit, configured to set up the concurrent query program; and a calling unit, configured to use the concurrent query program to call the HBase query server-side program so as to query the HBase tables concurrently.
Further, the apparatus also comprises a recording unit, configured to record the result state of the concurrent queries in a query log and, on success, to record the average time of the concurrent query of the HBase table.
Compared with the prior art, the present invention comprises: setting up an HBase query server-side program; setting up a concurrent query program; and using the concurrent query program to call the HBase query server-side program to query HBase tables concurrently. By having the concurrent query program call the HBase query server-side program, the invention achieves concurrent querying of HBase tables, reduces the overhead of repeatedly connecting to the HBase server, improves the system's efficiency under concurrent access, and caches data, giving faster responses and more efficient concurrent queries.
Brief description of the drawings
Fig. 1 is a schematic block diagram of building the HTablePool object pool according to the present invention.
Fig. 2 is a schematic flowchart of the method for concurrently querying HBase tables according to the present invention.
Fig. 3 is a schematic flowchart of the method for setting up the HBase query server-side program according to the present invention.
Fig. 4 is a schematic structural diagram of the apparatus for concurrently querying HBase tables according to the present invention.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the accompanying drawings. These exemplary embodiments are described in enough detail to enable those skilled in the art to practice the invention. Logical, implementation, and other changes may be made without departing from the spirit and scope of the invention.
HTable and HTablePool are part of the HBase client application programming interface (API) and can be used to operate on HBase tables. HTable is the Java API object through which an HBase client communicates with the HBase server; through an HTable object the client can create, read, update, and delete data on the server. HTablePool obtains HTable objects through its getTable method for such create, read, update, and delete operations on HBase tables. HTablePool solves the thread-safety problem of HTable (HTable itself is not thread-safe) and, by maintaining a fixed number of HTable objects, allows these HTable resources to be reused while the program runs.
In the present invention, to improve query efficiency when HBase tables are queried concurrently, the HTablePool object pool in HBase is used; the principle of building the HTablePool object pool may be as shown in Fig. 1.
An HTablePool instance must be created before HTable is used. The HTablePool instance also creates an internal PoolMap object, which stores HTable instances keyed by table name. HTablePool supports three pool types: Reusable, ThreadLocal, and RoundRobin. Reusable is a reuse pool backed internally by a queue: when a table is closed it is put at the tail of the queue, and get takes from the head. ThreadLocal is a thread-local pool in which each thread obtains only one HTable. RoundRobin uses a counter on each get to return the corresponding HTable.
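The three pool types can be sketched with plain Java collections. The sketch below is a simplified model written for illustration: all class names are hypothetical, it does not reproduce HBase's own PoolMap code (as far as we know, the HBase 0.94 client expresses this choice through a PoolMap.PoolType value), and a table handle is represented by a generic resource R.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical pool abstraction; R stands in for an HTable-like handle.
abstract class SketchPool<R> {
    abstract R get(Supplier<R> factory);
    abstract void release(R resource);
}

// Reusable: a queue; released objects go to the tail, get() takes from the head.
class ReusableSketchPool<R> extends SketchPool<R> {
    private final Deque<R> queue = new ArrayDeque<>();

    @Override synchronized R get(Supplier<R> factory) {
        R r = queue.pollFirst();
        return r != null ? r : factory.get();
    }

    @Override synchronized void release(R resource) {
        queue.addLast(resource);
    }
}

// ThreadLocal: each thread is bound to at most one object.
class ThreadLocalSketchPool<R> extends SketchPool<R> {
    private final ThreadLocal<R> local = new ThreadLocal<>();

    @Override R get(Supplier<R> factory) {
        R r = local.get();
        if (r == null) {
            r = factory.get();
            local.set(r);
        }
        return r;
    }

    @Override void release(R resource) { /* the object stays bound to its thread */ }
}

// RoundRobin: creates up to maxSize objects, then hands them out in turn via a counter.
class RoundRobinSketchPool<R> extends SketchPool<R> {
    private final List<R> items = new ArrayList<>();
    private final int maxSize;
    private int counter = 0;

    RoundRobinSketchPool(int maxSize) {
        this.maxSize = maxSize;
    }

    @Override synchronized R get(Supplier<R> factory) {
        if (items.size() < maxSize) {
            R r = factory.get();
            items.add(r);
            return r;
        }
        R r = items.get(counter % items.size());
        counter++;
        return r;
    }

    @Override void release(R resource) { /* objects are shared, nothing to return */ }
}

// PoolMap-style container: one pool per table name.
class SketchPoolMap<R> {
    private final Map<String, SketchPool<R>> pools = new HashMap<>();
    private final Supplier<SketchPool<R>> poolFactory;

    SketchPoolMap(Supplier<SketchPool<R>> poolFactory) {
        this.poolFactory = poolFactory;
    }

    synchronized SketchPool<R> poolFor(String tableName) {
        return pools.computeIfAbsent(tableName, name -> poolFactory.get());
    }
}
```

Keeping one pool per table name is what lets repeated requests for the same table reuse existing objects instead of creating new ones.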
HTablePool creates HTable objects automatically and transparently to the client, which avoids problems with concurrent data modification between threads. The HTable objects in HTablePool connect using a shared configuration, which reduces network overhead and thus improves query efficiency.
Fig. 2 is a schematic flowchart of the method for concurrently querying HBase tables according to the present invention. As shown in Fig. 2, the method may specifically comprise the following steps.
Step 21: set up the HBase query server-side program.
As shown in Fig. 3, this step may specifically comprise:
Step 211: instantiate an HTablePool object, and obtain an HTable instance from the instantiated HTablePool object.
Specifically, before the HTablePool object is instantiated, an HTablePool object pool is built: HTablePool maintains a fixed number of HTable instances and stores them in the PoolMap of HTablePool, so that HTable instances can be reused while the program runs.
HTable objects can share a Configuration object. One benefit is a shared ZooKeeper connection: each client must connect to ZooKeeper to look up the location of the user's table regions, and this information can be cached and shared once the connection is established. Another benefit is shared common resources: the client must locate the -ROOT- and .META. tables through ZooKeeper, which incurs network transfer overhead; once the client has cached these common resources, subsequent network overhead is reduced and lookups are faster.
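The benefit of sharing one Configuration, and hence one connection, can be modelled as a lookup cache shared by all table handles. In this hypothetical sketch, `remoteLookup` stands in for the ZooKeeper and catalog-table (-ROOT-/.META.) round trips; table handles created on the same SharedConnection pay for the lookup once, after which the cached location is reused. Real HBase caches region locations per key range rather than per table; the per-table cache here is a deliberate simplification.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical model of a shared connection: one location cache for all handles.
class SharedConnection {
    private final Map<String, String> locationCache = new HashMap<>();
    // Stands in for the network lookup through ZooKeeper and the catalog tables.
    private final Function<String, String> remoteLookup;
    private int remoteLookups = 0;

    SharedConnection(Function<String, String> remoteLookup) {
        this.remoteLookup = remoteLookup;
    }

    // Returns the cached location, performing the expensive lookup only on a cache miss.
    synchronized String locate(String tableName) {
        String location = locationCache.get(tableName);
        if (location == null) {
            remoteLookups++;
            location = remoteLookup.apply(tableName);
            locationCache.put(tableName, location);
        }
        return location;
    }

    synchronized int remoteLookups() {
        return remoteLookups;
    }
}

// A lightweight table handle; many handles can share one connection.
class TableHandle {
    private final String tableName;
    private final SharedConnection connection;

    TableHandle(String tableName, SharedConnection connection) {
        this.tableName = tableName;
        this.connection = connection;
    }

    String regionLocation() {
        return connection.locate(tableName);
    }
}
```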
Step 222: set query attributes on the obtained HTable instance according to the requirements of the HBase client.
Specifically, a start key and a stop key are set on the obtained HTable instance according to the requirements of the HBase client.
The Caching, Batch, and CacheBlocks parameters of the HTable instance may also be set.
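Setting the start and stop keys and the Caching parameter can be illustrated as follows. In the real HBase client these attributes are normally set on a Scan object (the Scan(startRow, stopRow) constructor and the setCaching, setBatch and setCacheBlocks methods) before calling getScanner; the sketch below does not use HBase at all. It assumes a hypothetical row-key layout `<goodsId>_<yyyyMMdd>`, suggested by the goods-ID and date-range query in the example later in this description, and gives only a rough estimate of how Caching (rows returned per client-server round trip) affects the number of round trips. Batch (columns per result) and CacheBlocks (whether scanned blocks enter the server-side block cache) are not modelled.

```java
import java.util.NavigableMap;

// Hypothetical helper; the row-key layout "<goodsId>_<yyyyMMdd>" is an assumption.
class GoodsScanSketch {
    // Start key is inclusive: the first row for the start date.
    static String startRow(String goodsId, String startDate) {
        return goodsId + "_" + startDate;
    }

    // Stop key is exclusive, so append a NUL character to get the smallest key
    // strictly greater than the end-date key; the end date is then included.
    static String stopRow(String goodsId, String endDate) {
        return goodsId + "_" + endDate + '\0';
    }

    // Range scan over a sorted map standing in for the table.
    static NavigableMap<String, String> scan(NavigableMap<String, String> table,
                                             String startRow, String stopRow) {
        return table.subMap(startRow, true, stopRow, false);
    }

    // Rough estimate of round trips when `caching` rows come back per call
    // (at least one call is always made).
    static int roundTrips(int rows, int caching) {
        if (caching <= 0) {
            throw new IllegalArgumentException("caching must be positive");
        }
        return Math.max(1, (rows + caching - 1) / caching);
    }
}
```

A larger Caching value means fewer round trips for the same number of rows, at the cost of more memory per call on both client and server.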
Step 223: query the records in the HBase table through getScanner.
Specifically, if the query results returned by getScanner differ from what the HBase client requires, the output is arranged into the format required by the HBase client based on the RowKey and the value.
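Re-arranging scan output into a client-specific format might look like the following. The target format (CSV lines `goodsId,yyyy-MM-dd,value`) and the `<goodsId>_<yyyyMMdd>` row-key layout are both assumptions made for illustration; the description only states that the output is arranged from the RowKey and value.

```java
// Hypothetical formatter: turns one (RowKey, value) pair returned by a scan into
// a CSV line "goodsId,yyyy-MM-dd,value", assuming row keys "<goodsId>_<yyyyMMdd>".
class ResultFormatter {
    static String toCsvLine(String rowKey, String value) {
        int sep = rowKey.lastIndexOf('_');
        // Reject keys that do not end in "_" followed by an 8-character date.
        if (sep < 0 || rowKey.length() - sep - 1 != 8) {
            throw new IllegalArgumentException("unexpected row key: " + rowKey);
        }
        String goodsId = rowKey.substring(0, sep);
        String d = rowKey.substring(sep + 1); // yyyyMMdd
        String isoDate = d.substring(0, 4) + "-" + d.substring(4, 6) + "-" + d.substring(6, 8);
        return goodsId + "," + isoDate + "," + value;
    }
}
```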
Step 22: set up the concurrent query program.
In this step, the concurrent query program is set up with the time lock of the Java concurrency package (java.util.concurrent).
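One plausible reading of the "time lock" is java.util.concurrent.CountDownLatch, which is commonly used to release many worker threads at the same instant; this is our assumption, not something the description states. The sketch below (ConcurrentQueryDriver is a hypothetical name) parks `concurrency` workers on a start latch, releases them together, and returns each query's latency. The query is passed in as a Callable, so a real HBase scan could be plugged in.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical concurrent query driver built on CountDownLatch.
class ConcurrentQueryDriver {
    // Runs `concurrency` copies of `query` simultaneously; returns latencies in nanoseconds.
    static List<Long> run(int concurrency, Callable<?> query) {
        // One thread per worker, so every worker can be parked on the start latch at once.
        ExecutorService pool = Executors.newFixedThreadPool(concurrency);
        CountDownLatch ready = new CountDownLatch(concurrency); // counts workers that are ready
        CountDownLatch start = new CountDownLatch(1);           // released once to start all
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (int i = 0; i < concurrency; i++) {
                futures.add(pool.submit(() -> {
                    ready.countDown();
                    start.await(); // wait for the common start signal
                    long t0 = System.nanoTime();
                    query.call();
                    return System.nanoTime() - t0;
                }));
            }
            ready.await();     // every worker is now waiting on `start`
            start.countDown(); // release all workers at the same moment
            List<Long> latencies = new ArrayList<>();
            for (Future<Long> f : futures) {
                latencies.add(f.get());
            }
            return latencies;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while running queries", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a query failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

The thread pool is sized to the concurrency level on purpose: with fewer threads than workers, some tasks could never start and the ready latch would never reach zero.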
Step 23: use the concurrent query program to call the HBase query server-side program to query the HBase table concurrently.
In this step, the concurrent query program calls the HBase query server-side program and distributes the concurrent access tasks across the nodes of the cluster; each node executes its queries on the HBase table concurrently.
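The description does not spell out the load-balancing scheme. A simple round-robin assignment, shown here purely as an assumed example with a hypothetical TaskDistributor class, spreads tasks over nodes so that no node receives more than one task more than any other.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical round-robin load balancing: task i goes to node (i mod nodeCount),
// so node loads differ by at most one task.
class TaskDistributor {
    static List<List<Integer>> assign(int taskCount, int nodeCount) {
        if (nodeCount <= 0) {
            throw new IllegalArgumentException("nodeCount must be positive");
        }
        List<List<Integer>> perNode = new ArrayList<>();
        for (int node = 0; node < nodeCount; node++) {
            perNode.add(new ArrayList<>());
        }
        for (int task = 0; task < taskCount; task++) {
            perNode.get(task % nodeCount).add(task);
        }
        return perNode;
    }
}
```

For instance, 500 tasks over 8 nodes gives 63 tasks to each of nodes 0 to 3 and 62 to each of nodes 4 to 7 (4 × 63 + 4 × 62 = 500).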
Step 24: record the result state of the concurrent queries in a query log and, when the queries succeed, record the average time of the concurrent query of the HBase table.
In this step, results are recorded with the states error, failed, and success: error means the query encountered an error, failed means the query failed, and success means the query succeeded.
The concurrent query of the HBase table is successful only when every query in it is in the success state. On success, the average time of the concurrent query of the HBase table is recorded.
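The logging rule of Step 24 can be expressed as a small tally. In this sketch (QueryStatus and QueryLog are hypothetical names), each query records one of the three states plus its latency; the run counts as successful only when every entry is SUCCESS, and an average time is reported only in that case, as an empty OptionalDouble otherwise.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.OptionalDouble;

// The three result states recorded in the query log.
enum QueryStatus { ERROR, FAILED, SUCCESS }

// Hypothetical in-memory query log; synchronized so concurrent workers can record into it.
class QueryLog {
    private final List<QueryStatus> statuses = new ArrayList<>();
    private final List<Long> latenciesMillis = new ArrayList<>();

    synchronized void record(QueryStatus status, long latencyMillis) {
        statuses.add(status);
        latenciesMillis.add(latencyMillis);
    }

    // The concurrent query succeeds only if every individual query succeeded.
    synchronized boolean allSucceeded() {
        if (statuses.isEmpty()) return false;
        for (QueryStatus s : statuses) {
            if (s != QueryStatus.SUCCESS) return false;
        }
        return true;
    }

    // Average latency, reported only for a fully successful run.
    synchronized OptionalDouble averageMillis() {
        if (!allSucceeded()) return OptionalDouble.empty();
        return latenciesMillis.stream().mapToLong(Long::longValue).average();
    }

    // How many queries ended in each state.
    synchronized Map<QueryStatus, Integer> counts() {
        Map<QueryStatus, Integer> result = new EnumMap<>(QueryStatus.class);
        for (QueryStatus s : statuses) result.merge(s, 1, Integer::sum);
        return result;
    }
}
```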
The invention is further illustrated below with a concrete example. First, a distributed cluster environment is deployed. For example, the cluster hardware consists of 8 servers, each with 12 CPU (central processing unit) cores, 189 GB of memory, and 12 × 2 TB hard disks. The cluster software environment is CentOS 6.1 as the operating system, with Hadoop 1.1.0, HBase 0.94, and ZooKeeper 3.4.5.
Suppose the concurrent query requirement is to query, across 10 billion records, the location information of goods in transit: each query is made by goods ID together with a start date and an end date, 500 such queries run concurrently against the 10 billion records, and the average time of the 500 goods-information queries is recorded.
The HBase query server-side program and the concurrent query program are set up, and the concurrent query program calls the HBase query server-side program to query the HBase table concurrently. Using a load-balancing scheme for the 500 concurrent accesses, the 500 concurrent access tasks are distributed over the 8 nodes, so that the 8 nodes execute the concurrent HBase queries simultaneously. The average response time for concurrent queries of the HBase table holding 10 billion records stays at about 50 ms, with an average throughput of about 10,000 transactions per second, whereas the unoptimized concurrent query has an average response time of about 13 min; the two differ by several orders of magnitude. This result shows that the method meets the users' requirements and improves concurrent query performance.
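As a rough consistency check on these figures, and assuming the system is in a steady state with about 500 queries in flight at all times, Little's law ($L = \lambda W$: the mean number of requests in the system equals throughput times mean response time) links the concurrency $L$, the response time $W$, and the throughput $\lambda$ reported above. The same arithmetic gives the ratio between the unoptimized and optimized response times.

```latex
\begin{aligned}
\lambda &= \frac{L}{W} = \frac{500}{0.05\ \text{s}} = 10\,000\ \text{queries/s},\\[4pt]
\frac{W_{\text{unoptimized}}}{W_{\text{optimized}}} &= \frac{13 \times 60\ \text{s}}{0.05\ \text{s}} = \frac{780\ \text{s}}{0.05\ \text{s}} = 15\,600 \approx 10^{4.2}.
\end{aligned}
```

The reported throughput of about 10,000 per second is therefore consistent with 500 concurrent queries at about 50 ms each.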
By having the concurrent query program call the HBase query server-side program, the present invention achieves concurrent querying of HBase tables, reduces the overhead of repeatedly connecting to the HBase server, improves the system's efficiency under concurrent access, and caches data, giving faster responses and more efficient concurrent queries.
Fig. 4 is a schematic structural diagram of the apparatus for concurrently querying HBase tables according to the present invention. As shown in Fig. 4, the apparatus may specifically comprise:
A first setting unit, configured to set up the HBase query server-side program.
A second setting unit, configured to set up the concurrent query program.
A calling unit, configured to use the concurrent query program to call the HBase query server-side program to query the HBase table concurrently.
A recording unit, configured to record the result state of the concurrent queries in a query log and, when the queries succeed, to record the average time of the concurrent query of the HBase table.
The apparatus for concurrently querying HBase tables corresponds to the method for concurrently querying HBase tables; for the specific implementation details of the apparatus, refer to the method, which are not repeated here.
It should be understood that although this specification is described in terms of embodiments, not every embodiment contains only one independent technical solution; this manner of description is used only for clarity. Those skilled in the art should treat the specification as a whole, and the technical solutions in the embodiments may be suitably combined to form other embodiments that those skilled in the art can understand.
The detailed descriptions listed above are only specific illustrations of feasible embodiments of the present invention and are not intended to limit its scope of protection; all equivalent embodiments or changes made without departing from the technical spirit of the invention shall fall within the scope of protection of the invention.

Claims (11)

1. A method for concurrently querying HBase tables, characterized by comprising:
Setting up an HBase query server-side program;
Setting up a concurrent query program;
Using the concurrent query program to call the HBase query server-side program to query the HBase tables concurrently.
2. The method for concurrently querying HBase tables according to claim 1, characterized in that setting up the HBase query server-side program comprises:
Instantiating an HTablePool object, and obtaining an HTable instance from the instantiated HTablePool object;
Setting query attributes on the obtained HTable instance according to the requirements of the HBase client;
Querying records in the HBase table through getScanner.
3. The method for concurrently querying HBase tables according to claim 2, characterized by further comprising, before instantiating the HTablePool object and obtaining the HTable instance from the instantiated HTablePool object:
Building an HTablePool object pool, wherein HTablePool maintains a fixed number of HTable instances and stores the HTable instances in the PoolMap of HTablePool.
4. the method for the concurrent inquiry HBase table according to Claims 2 or 3, is characterized in that, the described demand according to Hbase client, arranges querying attributes, comprising in the HTable example obtained:
According to the demand of Hbase client, arrange in the HTable example obtained and start Key and cut-off Key.
5. the method for concurrent inquiry HBase table according to claim 4, is characterized in that, the described demand according to Hbase client, arranges querying attributes, also comprise in the HTable example obtained:
According to the demand of Hbase client, the Caching parameter in HTable example, Batch parameter and CacheBlocks parameter are set.
6. the method for concurrent inquiry HBase table according to claim 5, is characterized in that, described in be arranged through the record that getScanner inquires about in HBase table after, also comprise:
If the demand of the Query Result inquired about by getScanner and Hbase client has difference, then according to the inquiry output format being arranged Hbase client demand by RowKey and value value.
7. the method for concurrent inquiry HBase table according to claim 1, is characterized in that, describedly arranges concurrent polling routine, comprising:
By the time lock in java concurrent, concurrent polling routine is set.
8. the method for concurrent inquiry HBase table according to claim 1, is characterized in that, the concurrent polling routine of described employing calls inquiry HBase serve end program, and concurrent inquiry HBase table, comprising:
Concurrent polling routine calls inquiry HBase serve end program, and Concurrency Access task be distributed in the node in cluster, each node concurrence performance is to the inquiry of HBase table.
9. The method for concurrently querying HBase tables according to claim 1, characterized by further comprising:
Recording the result state of the concurrent queries in a query log, wherein the states recorded are error, failed, and success: error indicates that the query encountered an error, failed indicates that the query failed, and success indicates that the query succeeded;
When every query in the concurrent query is in the success state, the concurrent query of the HBase table succeeds, and on success the average time of the concurrent query of the HBase table is recorded.
10. An apparatus for concurrently querying HBase tables, characterized by comprising:
A first setting unit, configured to set up an HBase query server-side program;
A second setting unit, configured to set up a concurrent query program;
A calling unit, configured to use the concurrent query program to call the HBase query server-side program to query the HBase tables concurrently.
11. The apparatus for concurrently querying HBase tables according to claim 10, characterized by further comprising:
A recording unit, configured to record the result state of the concurrent queries in a query log and, on success, to record the average time of the concurrent query of the HBase table.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410483073.0A CN104239530A (en) 2014-09-19 2014-09-19 Method and device for parallel query of HBase tables

Publications (1)

Publication Number Publication Date
CN104239530A true CN104239530A (en) 2014-12-24

Family

ID=52227589

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410483073.0A Pending CN104239530A (en) 2014-09-19 2014-09-19 Method and device for parallel query of HBase tables

Country Status (1)

Country Link
CN (1) CN104239530A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104573022A (en) * 2015-01-12 2015-04-29 Inspur Software Co., Ltd. (浪潮软件股份有限公司) Data query method and device for HBase
CN107291881A (en) * 2017-06-19 2017-10-24 Beijing Institute of Computer Technology and Application (北京计算机技术及应用研究所) Massive logs storage and querying method based on HBase

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102663117A (en) * 2012-04-18 2012-09-12 中国人民大学 OLAP (On Line Analytical Processing) inquiry processing method facing database and Hadoop mixing platform
WO2014085624A2 (en) * 2012-11-30 2014-06-05 Orbis Technologies, Inc. Ontology harmonization and mediation systems and methods

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
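乔治: "HBase权威指南" (HBase: The Definitive Guide), 31 October 2013 *
杜晓东: "大数据环境下基于Hbase的分布式查询优化研究" (Research on HBase-based distributed query optimization in a big-data environment), 计算机光盘软件与应用 (Computer CD Software and Application) *
王静蕾: "Hadoop云计算框架中的分布式数据库HBase研究" (Research on the distributed database HBase in the Hadoop cloud computing framework), 商丘职业技术学院学报 (Journal of Shangqiu Polytechnic) *
行者无疆_路过: "Hbase访问方式之Java API" (HBase access methods: the Java API), HTTP://BLOG.CSDN.NET/WOSHIWANXIN102213/ARTICLE/DETAILS/17676961 *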

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20141224