CN102411569A - Database conversion and cleaning information processing method - Google Patents

Database conversion and cleaning information processing method

Info

Publication number
CN102411569A
Authority
CN
China
Prior art keywords
update
temp
target matrix
target
database
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2010102879710A
Other languages
Chinese (zh)
Inventor
雷发晶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SHANGHAI POPULAR FINANCE INFORMATION TECHNOLOGY Co Ltd
Original Assignee
SHANGHAI POPULAR FINANCE INFORMATION TECHNOLOGY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2010-09-20
Filing date
2010-09-20
Publication date
2012-04-11
Application filed by SHANGHAI POPULAR FINANCE INFORMATION TECHNOLOGY Co Ltd
Priority to CN2010102879710A
Publication of CN102411569A
Pending (current legal status)

Abstract

The invention relates to a database conversion and cleaning information processing method comprising the following steps: 1) connecting a target database to a data source; 2) selecting, in the target database, a target data table to be cleaned; 3) selecting an update mode, proceeding to step 4 for an incremental update and to step 10 for a full update; 4) obtaining the maximum update time last_update in the target data table, with last_update defaulting to a preset time if the target data table is empty; 5) filtering all records in the data source whose update time is greater than last_update into a temporary table temp_table; 6) removing duplicate records from temp_table using the constraint fields of the target data table; 7) comparing the target data table with temp_table to identify the records in temp_table that already exist in the target data table; and so on. Compared with the prior art, the method effectively avoids duplicated and omitted data during data cleaning and ensures data consistency and completeness.

Description

Database conversion and cleaning information processing method
Technical field
The present invention relates to database technology, and in particular to a database conversion and cleaning information processing method.
Background art
Data cleaning and conversion, also called ETL (Extract, Transform, Load), is a problem that frequently has to be solved in the database field, especially in data warehousing. ETL extracts data from distributed, heterogeneous data sources, such as relational data and flat data files, into a temporary staging layer, where the data is cleaned, transformed, and integrated; the data is finally loaded into a target store (a data warehouse, data mart, etc.), where it becomes the basis for online analytical processing and data mining.
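The three ETL stages described above can be illustrated with a minimal Python sketch. The two sources (a flat CSV text and a list of rows standing in for a relational query), the cleaning rules, and the table name `target` are hypothetical; an in-memory SQLite database stands in for the data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical heterogeneous sources: a flat file and rows from a relational query.
CSV_SOURCE = "code,name\na001, alpha \nA002,beta\n"
RELATIONAL_ROWS = [("b003", " gamma")]


def extract() -> list[tuple[str, str]]:
    """Extract: pull raw rows from every source into one staging list."""
    staged = [(r["code"], r["name"]) for r in csv.DictReader(io.StringIO(CSV_SOURCE))]
    staged.extend(RELATIONAL_ROWS)
    return staged


def transform(rows: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Transform: apply simple cleaning rules (trim whitespace, upper-case codes)."""
    return [(code.strip().upper(), name.strip()) for code, name in rows]


def load(conn: sqlite3.Connection, rows: list[tuple[str, str]]) -> None:
    """Load: write the cleaned rows into the target store."""
    conn.execute("CREATE TABLE IF NOT EXISTS target (code TEXT PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO target (code, name) VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract()))
print(conn.execute("SELECT code, name FROM target ORDER BY code").fetchall())
# [('A001', 'alpha'), ('A002', 'beta'), ('B003', 'gamma')]
```

Real ETL tools add scheduling, error handling, and many more transformation types; the sketch only shows where extraction, transformation, and loading sit relative to one another.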
Many professional data-cleansing tools are available on the market, such as DataStage from Ascential, PowerCenter from Informatica, and ETL Automation from NCR Teradata. These tools are mostly powerful, but they are also fairly complex to use. For typical small and medium-sized applications, the cost of such professional tools is too high, so lighter-weight alternatives are generally sought instead, such as SSIS, or the logic is programmed directly as stored procedures.
Summary of the invention
The object of the invention is to overcome the above shortcomings of the prior art by providing a database conversion and cleaning information processing method.
The object of the invention is achieved through the following technical solution:
A database conversion and cleaning information processing method, characterized by comprising the following steps:
1) connect the target database to the data source;
2) select, in the target database, the target data table to be cleaned;
3) select an update mode: for an incremental update, go to step 4); for a full update, go to step 10);
4) obtain the maximum update time last_update in the target data table; if the target data table is empty, last_update defaults to a preset time;
5) filter all records in the data source whose update time is greater than last_update into a temporary table temp_table;
6) use the constraint field(s) of the target data table to remove duplicate records from temp_table;
7) compare the target data table with temp_table to find the records in temp_table that already exist in the target data table;
8) remove from temp_table the records that already exist in the target data table;
9) insert all remaining records in temp_table into the target data table, then go to step 14);
10) organize the data on the data-source side into the structure of the target data table, and store all of the records in temp_table;
11) use the constraint field(s) of the target data table to remove duplicate records from temp_table;
12) empty the target data table;
13) insert all records in temp_table into the target data table;
14) record the update in the update log.
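As a minimal sketch of the incremental branch (steps 4) to 9) plus step 14)), the following Python uses the standard `sqlite3` module. The table names `source_data`, `target`, and `update_log`, the columns `id`, `value`, and `update_time`, and the use of `id` as the single constraint field are assumptions for illustration. Update times are assumed to be stored as ISO 8601 text (`YYYY-MM-DD HH:MM:SS`) so that string comparison matches chronological order. The method does not say which of several duplicates survives step 6); this sketch keeps the first staged row.

```python
import sqlite3

PRESET_TIME = "1900-01-01 00:00:00"  # step 4 default when the target table is empty


def incremental_update(conn: sqlite3.Connection) -> int:
    """Run steps 4-9 and 14 on the hypothetical tables source_data, target, update_log."""
    cur = conn.cursor()
    # Step 4: newest update time already in the target, or the preset time if it is empty.
    (last_update,) = cur.execute(
        "SELECT COALESCE(MAX(update_time), ?) FROM target", (PRESET_TIME,)
    ).fetchone()
    # Step 5: stage every source record updated after last_update.
    cur.execute("DROP TABLE IF EXISTS temp_table")
    cur.execute("CREATE TEMP TABLE temp_table (id TEXT, value TEXT, update_time TEXT)")
    cur.execute(
        "INSERT INTO temp_table SELECT id, value, update_time "
        "FROM source_data WHERE update_time > ?",
        (last_update,),
    )
    # Step 6: keep one row per constraint field (id); the first staged row survives.
    cur.execute(
        "DELETE FROM temp_table WHERE rowid NOT IN "
        "(SELECT MIN(rowid) FROM temp_table GROUP BY id)"
    )
    # Steps 7-8: drop staged rows whose key already exists in the target.
    cur.execute(
        "DELETE FROM temp_table WHERE EXISTS "
        "(SELECT 1 FROM target WHERE target.id = temp_table.id)"
    )
    # Step 9: insert every remaining staged row into the target.
    (inserted,) = cur.execute("SELECT COUNT(*) FROM temp_table").fetchone()
    cur.execute(
        "INSERT INTO target (id, value, update_time) "
        "SELECT id, value, update_time FROM temp_table"
    )
    # Step 14: record the update in the log.
    cur.execute(
        "INSERT INTO update_log (mode, rows_inserted) VALUES ('incremental', ?)",
        (inserted,),
    )
    conn.commit()
    return inserted
```

As the steps are written, a staged record whose key already exists in the target is discarded in steps 7) and 8) rather than used to overwrite the existing row, so this branch only adds records with new keys.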
The preset time in step 4) may be 1 January 1900.
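A sketch of step 4) under similar assumptions: if `MAX(update_time)` returns NULL because the target data table is empty, the preset time of 1 January 1900 is used instead. Because that date precedes any plausible record, the first incremental run stages the entire data source. The `get_last_update` helper and its default table and column names are hypothetical; the identifiers are interpolated into the SQL because SQLite cannot bind table or column names as parameters, so they must come from trusted configuration.

```python
import sqlite3
from datetime import datetime

PRESET_TIME = datetime(1900, 1, 1)  # preset time named in step 4


def get_last_update(
    conn: sqlite3.Connection, table: str = "target", column: str = "update_time"
) -> datetime:
    """Return the newest update time in the table, or PRESET_TIME if it has no rows."""
    (latest,) = conn.execute(f"SELECT MAX({column}) FROM {table}").fetchone()
    if latest is None:  # MAX over an empty table yields NULL
        return PRESET_TIME
    return datetime.fromisoformat(latest)  # assumes ISO 8601 text timestamps
```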
The constraint field used in step 6) may be a single field or several fields.
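Where the constraint consists of several fields (for example, a hypothetical security code plus trade date), step 6) groups on all of them together. The sketch below assumes a SQLite `rowid` table and keeps the first-loaded row of each group; the helper name `remove_duplicates` and the sample columns are illustrative only.

```python
import sqlite3
from typing import Sequence


def remove_duplicates(
    conn: sqlite3.Connection, key_fields: Sequence[str], table: str = "temp_table"
) -> int:
    """Keep one row per combination of key_fields; return how many rows were deleted."""
    keys = ", ".join(key_fields)  # identifiers must come from trusted configuration
    (before,) = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
    conn.execute(
        f"DELETE FROM {table} WHERE rowid NOT IN "
        f"(SELECT MIN(rowid) FROM {table} GROUP BY {keys})"
    )
    (after,) = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
    return before - after
```

Grouping on `("code", "trade_date")` treats two rows as duplicates only when both fields match, whereas grouping on `("code",)` alone would collapse all dates for the same code into one row.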
The constraint field used in step 11) may be a single field or several fields.
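The full-update branch (steps 10) to 13), followed by step 14)) might look like the sketch below. The source columns `src_id`, `src_value`, and `src_time` and the `TRIM` call stand in for whatever reorganization step 10) actually requires; they, like the other table names, are assumptions. SQLite has no `TRUNCATE`, so step 12) uses `DELETE FROM target`. With Python's default (legacy) transaction handling, the staging, emptying, and reloading all happen before a single `commit()`, so other connections keep seeing the old target contents until the reload is committed.

```python
import sqlite3


def full_update(conn: sqlite3.Connection, key_fields: tuple[str, ...] = ("id",)) -> int:
    """Run steps 10-14 on the hypothetical tables source_data, target, update_log."""
    keys = ", ".join(key_fields)  # trusted identifiers only; they cannot be bound
    cur = conn.cursor()
    # Step 10: reshape source rows into the target's layout and stage all of them.
    cur.execute("DROP TABLE IF EXISTS temp_table")
    cur.execute("CREATE TEMP TABLE temp_table (id TEXT, value TEXT, update_time TEXT)")
    cur.execute(
        "INSERT INTO temp_table (id, value, update_time) "
        "SELECT TRIM(src_id), src_value, src_time FROM source_data"
    )
    # Step 11: keep one row per constraint-field combination.
    cur.execute(
        f"DELETE FROM temp_table WHERE rowid NOT IN "
        f"(SELECT MIN(rowid) FROM temp_table GROUP BY {keys})"
    )
    # Step 12: empty the target table.
    cur.execute("DELETE FROM target")
    # Step 13: load every staged row.
    cur.execute(
        "INSERT INTO target (id, value, update_time) "
        "SELECT id, value, update_time FROM temp_table"
    )
    # Step 14: log the update, then commit the whole reload at once.
    (loaded,) = cur.execute("SELECT COUNT(*) FROM target").fetchone()
    cur.execute(
        "INSERT INTO update_log (mode, rows_inserted) VALUES ('full', ?)", (loaded,)
    )
    conn.commit()
    return loaded
```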
Compared with the prior art, the present invention has the following advantages:
1. It makes the data conversion and cleaning workflow concrete, and can carry out data updates effectively in both the full and the incremental update modes;
2. Its clear business logic effectively avoids duplicated and omitted data during data cleansing, ensuring the consistency and integrity of the data.
Description of drawings
Fig. 1 is a flow chart of the present invention;
Fig. 2 is a schematic diagram of the hardware configuration of the present invention.
Detailed description of the embodiments
The present invention is described in detail below with reference to the drawings and a specific embodiment.
Embodiment
As shown in Fig. 1 and Fig. 2, a database conversion and cleaning information processing method comprises the following steps:
1) connect target database 1 to data source 2;
2) select, in target database 1, the target data table to be cleaned;
3) select an update mode: for an incremental update, go to step 4); for a full update, go to step 10);
4) obtain the maximum update time last_update in the target data table; if the target data table is empty, last_update defaults to a preset time;
5) filter all records in data source 2 whose update time is greater than last_update into a temporary table temp_table;
6) use the constraint field(s) of the target data table to remove duplicate records from temp_table;
7) compare the target data table with temp_table to find the records in temp_table that already exist in the target data table;
8) remove from temp_table the records that already exist in the target data table;
9) insert all remaining records in temp_table into the target data table, then go to step 14);
10) organize the data on the data-source side into the structure of the target data table, and store all of the records in temp_table;
11) use the constraint field(s) of the target data table to remove duplicate records from temp_table;
12) empty the target data table;
13) insert all records in temp_table into the target data table;
14) record the update in the update log.
The preset time in step 4) may be 1 January 1900.
The constraint field used in step 6) may be a single field or several fields.
The constraint field used in step 11) may be a single field or several fields.
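In the embodiment, target database 1 is connected to data source 2 (step 1) before a table is chosen (step 2). A minimal stand-in, assuming both sides are SQLite databases, is to `ATTACH` the source under its own schema name so that later steps can read `src.<table>` in the same SQL statement. Real heterogeneous sources would need a database driver or linked-server mechanism instead; the function names and the `src` alias are hypothetical.

```python
import sqlite3


def connect_target_to_source(target_path: str, source_path: str) -> sqlite3.Connection:
    """Step 1: open target database 1 and attach data source 2 as schema 'src'."""
    conn = sqlite3.connect(target_path)
    conn.execute("ATTACH DATABASE ? AS src", (source_path,))
    return conn


def candidate_tables(conn: sqlite3.Connection) -> list[str]:
    """Step 2 helper: list the tables in the target database that could be cleaned."""
    rows = conn.execute(
        "SELECT name FROM main.sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]
```

With both databases reachable from one connection, the staging in step 5) can be a single `INSERT ... SELECT` that reads from `src` and writes to `temp_table`.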

Claims (4)

1. A database conversion and cleaning information processing method, characterized by comprising the following steps:
1) connect the target database to the data source;
2) select, in the target database, the target data table to be cleaned;
3) select an update mode: for an incremental update, go to step 4); for a full update, go to step 10);
4) obtain the maximum update time last_update in the target data table; if the target data table is empty, last_update defaults to a preset time;
5) filter all records in the data source whose update time is greater than last_update into a temporary table temp_table;
6) use the constraint field(s) of the target data table to remove duplicate records from temp_table;
7) compare the target data table with temp_table to find the records in temp_table that already exist in the target data table;
8) remove from temp_table the records that already exist in the target data table;
9) insert all remaining records in temp_table into the target data table, then go to step 14);
10) organize the data on the data-source side into the structure of the target data table, and store all of the records in temp_table;
11) use the constraint field(s) of the target data table to remove duplicate records from temp_table;
12) empty the target data table;
13) insert all records in temp_table into the target data table;
14) record the update in the update log.
2. The database conversion and cleaning information processing method according to claim 1, characterized in that the preset time in step 4) may be 1 January 1900.
3. The database conversion and cleaning information processing method according to claim 1, characterized in that the constraint field used in step 6) comprises one or more fields.
4. The database conversion and cleaning information processing method according to claim 1, characterized in that the constraint field used in step 11) comprises one or more fields.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2010102879710A CN102411569A (en) 2010-09-20 2010-09-20 Database conversion and cleaning information processing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2010102879710A CN102411569A (en) 2010-09-20 2010-09-20 Database conversion and cleaning information processing method

Publications (1)

Publication Number Publication Date
CN102411569A true CN102411569A (en) 2012-04-11

Family

ID=45913646

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2010102879710A Pending CN102411569A (en) 2010-09-20 2010-09-20 Database conversion and cleaning information processing method

Country Status (1)

Country Link
CN (1) CN102411569A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103473375A (en) * 2013-09-29 2013-12-25 方正国际软件有限公司 Data cleaning method and data cleaning system
CN103530375A (en) * 2013-10-15 2014-01-22 北京国双科技有限公司 Method and device for data source matching
CN103593447A (en) * 2013-11-18 2014-02-19 北京国双科技有限公司 Data processing method and device applied to database table
CN107729222A (en) * 2017-07-26 2018-02-23 上海壹账通金融科技有限公司 User behavior statistical method, system, computer equipment and storage medium
WO2018127116A1 (en) * 2017-01-09 2018-07-12 腾讯科技(深圳)有限公司 Data cleaning method and apparatus, and computer-readable storage medium
CN109634971A (en) * 2018-11-07 2019-04-16 平安科技(深圳)有限公司 Data-updating method, device, equipment and computer readable storage medium
CN110147362A (en) * 2019-04-04 2019-08-20 中电科大数据研究院有限公司 Event-driven DOC data acquisition and processing system and method

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6208990B1 (en) * 1998-07-15 2001-03-27 Informatica Corporation Method and architecture for automated optimization of ETL throughput in data warehousing applications
CN101075304A (en) * 2006-05-18 2007-11-21 河北全通通信有限公司 Method for constructing decision supporting system of telecommunication industry based on database
CN101183387A (en) * 2007-12-14 2008-05-21 沈阳东软软件股份有限公司 Increment data capturing method and system
CN101504664A (en) * 2009-03-18 2009-08-12 中国工商银行股份有限公司 Apparatus and method for extracting, converting and loading total source data
CN101621529A (en) * 2008-06-30 2010-01-06 上海全成通信技术有限公司 High-efficient and low-cost loading method for heterogeneous mass data
CN101697126A (en) * 2009-10-28 2010-04-21 山东中创软件商用中间件股份有限公司 ETL realization method for incremental data of Excel file

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103473375A (en) * 2013-09-29 2013-12-25 方正国际软件有限公司 Data cleaning method and data cleaning system
CN103530375A (en) * 2013-10-15 2014-01-22 北京国双科技有限公司 Method and device for data source matching
CN103593447A (en) * 2013-11-18 2014-02-19 北京国双科技有限公司 Data processing method and device applied to database table
CN103593447B (en) * 2013-11-18 2017-02-08 北京国双科技有限公司 Data processing method and device applied to database table
WO2018127116A1 (en) * 2017-01-09 2018-07-12 腾讯科技(深圳)有限公司 Data cleaning method and apparatus, and computer-readable storage medium
CN108287835A (en) * 2017-01-09 2018-07-17 腾讯科技(深圳)有限公司 Data cleaning method and device
US11023448B2 (en) 2017-01-09 2021-06-01 Tencent Technology (Shenzhen) Company Limited Data scrubbing method and apparatus, and computer readable storage medium
CN108287835B (en) * 2017-01-09 2022-06-21 腾讯科技(深圳)有限公司 Data cleaning method and device
CN107729222A (en) * 2017-07-26 2018-02-23 上海壹账通金融科技有限公司 User behavior statistical method, system, computer equipment and storage medium
CN109634971A (en) * 2018-11-07 2019-04-16 平安科技(深圳)有限公司 Data-updating method, device, equipment and computer readable storage medium
CN109634971B (en) * 2018-11-07 2024-01-23 平安科技(深圳)有限公司 Data updating method, device, equipment and computer readable storage medium
CN110147362A (en) * 2019-04-04 2019-08-20 中电科大数据研究院有限公司 Event-driven DOC data acquisition and processing system and method

Similar Documents

Publication Publication Date Title
CN102411569A (en) Database conversion and cleaning information processing method
JP6400010B2 (en) Aggregation / grouping operation: Hardware implementation of filtering method
CN102298607A (en) Schema contracts for data integration
Prekopcsak et al. Radoop: Analyzing big data with rapidminer and hadoop
CN102004744A (en) Data extraction system and method from one source table to table of at least one object database
JP6305406B2 (en) Hardware implementation of filtering / projection operation
CN102171695A (en) Efficient large-scale joining for querying of column based data encoded structures
CN102112986A (en) Efficient large-scale processing of column based data encoded structures
CN102135995A (en) Extract transform and load (ETL) data cleaning design method
CN103544323A (en) Data updating method and device
US20090037386A1 (en) Computer file processing
CA3022050A1 (en) Managing data queries
US20130138730A1 (en) Automated client/server operation partitioning
CN107301214A (en) Data migration method, device and terminal device in HIVE
WO2011126995A1 (en) Columnar storage representations of records
CN104407991A (en) Data storage method and device
CN104298736A (en) Method and device for aggregating and connecting data as well as database system
US10679230B2 (en) Associative memory-based project management system
US8543600B2 (en) Redistribute native XML index key shipping
CN112214453B (en) Large-scale industrial data compression storage method, system and medium
US20220058052A1 (en) Data processing management methods for imaging applications
CN104239580A (en) General single-field split data extraction method and device based on value-column mapping
EP2620901A1 (en) Associative memory-based project management system
CN102411632A (en) Chain table-based memory database page type storage method
CN109992469A (en) A kind of method and device merging log

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 2012-04-11)