CN100386761C

CN100386761C - Data file merging method

Info

Publication number: CN100386761C
Application number: CNB2005101145884A
Authority: CN
Inventors: 张亚栋; 赵云飞
Original assignee: Beijing Hollysys Co Ltd
Current assignee: Beijing Helishi System Integration Co Ltd
Priority date: 2005-10-26
Filing date: 2005-10-26
Publication date: 2008-05-07
Anticipated expiration: 2025-10-26
Also published as: CN1746894A

Abstract

The present invention discloses a data file merging method. Starting from the current records of two data files, the method first searches for the first record that is identical in both files. If the current record of either file is not that identical record, the current record of that file and all records between it and the first identical record are copied to a target file, and the identical record is then set as the new current record of both data files. Starting from the current records, the long run of identical records in the remaining data is merged using in-memory comparison, and this procedure is repeated in a loop until the files are merged. The method merges data files quickly, efficiently and reliably, reduces the amount of computation required for merging, and improves the stability of system operation. The invention can also be used to merge the data files of redundant dual-machine systems.

Description

A data file merging method

Technical field

The present invention relates to data file processing technology and, specifically, to a method for merging data files.

Background technology

At present, many supervisory systems, such as integrated supervisory control systems for rail transit, are generally required to use a dual-machine redundant configuration and to retain data such as events, alarms and operation logs for a long time. As a result, two copies of each log file exist on the two redundant servers. Abnormal factors such as server maintenance or network failures may cause the records on the two servers to become inconsistent. When an operator runs a query, however, a single complete data set must be provided, so the two files need to be merged.

In practice, the data files of a redundant dual-machine system have the following characteristics:

1. Long runs of identical records exist. Most of the time, the record contents of the two files are exactly the same.

2. One of the files may be missing a small number of records. Occasionally, because of system or network problems, a file may omit one or two records.

3. In rare cases, one of the files may be missing a large segment of records. A hardware fault or human intervention may take one server out of service for a time, so that this server lacks a large segment of records.

4. The number of records in the files is large, possibly tens of thousands.

At present, such pairs of data files are mostly merged by comparing the records in the files one by one. This approach involves a large amount of computation and a long run time, which increases the system load during merging and affects the stability of the whole system. Therefore, for data files that are large, mostly identical, and occasionally contain a few inconsistent records or a large missing segment, providing a fast and reliable merging method has become a problem in urgent need of a solution.

Summary of the invention

The technical problem to be solved by the present invention is to provide a data file merging method that merges data files quickly, efficiently and reliably and reduces the amount of computation required for merging.

To solve the above technical problem, the present invention provides a data file merging method for merging a first data file and a second data file that contain a large amount of identical data, characterized by comprising the following steps:

(a) Set the first record of the first data file and the first record of the second data file as the current record of each file;

(b) Starting from the current records, find the first record that is identical in the two files; if the current record of either file is not this identical record, copy the current record of that file and all records between it and the first identical record to the target file;

(c) Update the current records of both files to this first identical record, calculate the number of remaining records in whichever of the first and second data files has fewer remaining records, and set a current comparison quantity less than or equal to that value;

(d) Starting from the current records, take the current comparison quantity of records from each of the first and second data files and compare them as a whole; if they are identical, execute step (e); otherwise, take a fraction of the current comparison quantity as the new current comparison quantity and continue comparing in the same way until the records taken from the two data files compare as identical, then proceed to the next step;

(e) Copy all the records taken from one of the files to the target file, then determine whether the records of the two data files have all been copied: if both files have been completely copied, end; if neither has been completely copied, execute step (f); if only one of the data files has been completely copied, execute step (g);

(f) Update the current record of each data file to the record following the last record taken from it, and return to step (b);

(g) Copy the remaining records of the other data file to the target file, and end.
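Steps (a) to (g) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: each file is assumed to have been read into a list of records comparable with `==`, the comparison quantity is halved on a mismatch (one of the permitted fractions), and the names `merge_records` and `find_from` are hypothetical. The sketch also stops searching as soon as either list is exhausted, a boundary case the text does not spell out.

```python
from typing import List, Optional, TypeVar

T = TypeVar("T")

def find_from(records: List[T], start: int, wanted: T) -> Optional[int]:
    """Index of the first record equal to `wanted` at or after `start`, else None."""
    for k in range(start, len(records)):
        if records[k] == wanted:
            return k
    return None

def merge_records(a: List[T], b: List[T]) -> List[T]:
    target: List[T] = []
    i, j = 0, 0                                  # step (a): current records
    while i < len(a) and j < len(b):
        # Step (b): locate the first identical record, copying what precedes it.
        pos = find_from(a, i, b[j])
        if pos is not None:
            target.extend(a[i:pos])
            i = pos
        else:
            pos = find_from(b, j, a[i])
            if pos is not None:
                target.extend(b[j:pos])
                j = pos
            else:
                # Neither current record occurs in the other file: keep both.
                target.extend([a[i], b[j]])
                i, j = i + 1, j + 1
                continue
        # Step (c): the comparison quantity starts at the smaller remaining count.
        n = min(len(a) - i, len(b) - j)
        # Step (d): shrink the block until both blocks are identical.
        # Terminates at n == 1 at the latest because a[i] == b[j] here.
        while a[i:i + n] != b[j:j + n]:
            n //= 2
        # Steps (e) and (f): copy the identical block once and advance both files.
        target.extend(a[i:i + n])
        i, j = i + n, j + n
    # Step (g): at most one of these slices is non-empty.
    target.extend(a[i:])
    target.extend(b[j:])
    return target
```

For instance, calling `merge_records` on a list holding records 1 to 10 and a second list that lacks records 4 and 5 yields the complete sequence 1 to 10, with the shared runs copied block by block rather than record by record.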

Further, the above method may also have the following feature: step (b) further comprises the following steps:

(b1) Starting from the current record of the first data file, search for a record identical to the current record of the second data file; if one is found, execute step (b2); otherwise, execute step (b3);

(b2) If the current record of the first data file is not this identical record, copy the current record of the first data file and all records between it and the record identical to the current record of the second data file to the target file, then execute step (c);

(b3) Starting from the current record of the second data file, search for a record identical to the current record of the first data file; if one is found, execute step (b4); otherwise, execute step (b5);

(b4) If the current record of the second data file is not this identical record, copy the current record of the second data file and all records between it and the record identical to the current record of the first data file to the target file, then execute step (c);

(b5) Copy the current records of the first data file and the second data file to the target file, then take the record following the current record of each data file as its new current record, and return to step (b1).
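The staggered search of steps (b1) to (b5) might look like the sketch below. It assumes records are held in Python lists, and the function names `index_from` and `staggered_search` are illustrative only. The function returns the records that should be copied to the target file together with the updated current positions of both files.

```python
from typing import List, Optional, Tuple, TypeVar

T = TypeVar("T")

def index_from(records: List[T], start: int, wanted: T) -> Optional[int]:
    """Index of the first record equal to `wanted` at or after `start`, else None."""
    for k in range(start, len(records)):
        if records[k] == wanted:
            return k
    return None

def staggered_search(a: List[T], i: int, b: List[T], j: int) -> Tuple[List[T], int, int]:
    """Return (records to copy, new position in a, new position in b)."""
    copied: List[T] = []
    while i < len(a) and j < len(b):
        pos = index_from(a, i, b[j])        # (b1) look in file 1 for file 2's current record
        if pos is not None:
            copied.extend(a[i:pos])         # (b2) records only file 1 has
            return copied, pos, j
        pos = index_from(b, j, a[i])        # (b3) look in file 2 for file 1's current record
        if pos is not None:
            copied.extend(b[j:pos])         # (b4) records only file 2 has
            return copied, i, pos
        copied.extend([a[i], b[j]])         # (b5) both current records are unique
        i, j = i + 1, j + 1
    return copied, i, j                     # one file ran out (assumed handling)
```

When the loop exits because one list is exhausted, the caller is expected to copy the remainder of the other list; that handling is an assumption of this sketch rather than something stated in steps (b1) to (b5).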

Further, the above method may also have the following feature: in step (c), the smaller of the remaining record counts of the first and second data files is set as the current comparison quantity.

Further, the above method may also have the following feature: the whole comparison in step (d) is performed by loading the multiple records of the first data file and of the second data file into memory and carrying out an in-memory comparison.
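As an illustration of such an in-memory comparison, the following sketch reads a block of records from each file into a byte string and compares the two blocks in a single operation. The source does not describe the record layout, so the fixed record length `RECORD_SIZE` and the function name `blocks_equal` are assumptions made purely for the example.

```python
import io
from typing import BinaryIO

RECORD_SIZE = 16  # assumed fixed record length in bytes (not taken from the source)

def blocks_equal(f1: BinaryIO, pos1: int, f2: BinaryIO, pos2: int, count: int) -> bool:
    """Load `count` records from each file into memory and compare them as a whole."""
    f1.seek(pos1 * RECORD_SIZE)
    block1 = f1.read(count * RECORD_SIZE)
    f2.seek(pos2 * RECORD_SIZE)
    block2 = f2.read(count * RECORD_SIZE)
    # A short read means fewer than `count` records were available.
    return len(block1) == count * RECORD_SIZE and block1 == block2

# Usage with in-memory streams standing in for the two data files.
records = [bytes([n]) * RECORD_SIZE for n in range(5)]
file_a = io.BytesIO(b"".join(records))
file_b = io.BytesIO(b"".join(records[2:]))    # file B lacks the first two records
same = blocks_equal(file_a, 2, file_b, 0, 3)  # compare A[2:5] with B[0:3]
```

Comparing two byte strings this way replaces a per-record loop with one bulk comparison, which is the kind of saving the whole comparison is meant to provide.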

Further, the above method may also have the following feature: in step (d), when the result of the whole comparison is not completely identical, 1/3 to 2/3 of the current comparison quantity is taken as the new current comparison quantity and the comparison continues.

Further, the above method may also have the following feature: in step (d), when the result of the whole comparison is not completely identical, 1/2 of the current comparison quantity is taken as the new current comparison quantity and the comparison continues.
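The shrinking of the comparison quantity in step (d) can be sketched as below, with the reduction factor exposed as a parameter restricted to the range 1/3 to 2/3. The function name `matching_block_length`, the list-based records and the rounding rule (truncate, but never below 1) are assumptions of this sketch.

```python
from typing import List, TypeVar

T = TypeVar("T")

def matching_block_length(a: List[T], i: int, b: List[T], j: int,
                          quantity: int, fraction: float = 0.5) -> int:
    """Shrink `quantity` until a[i:i+n] and b[j:j+n] are identical; return n."""
    if not (1 / 3 <= fraction <= 2 / 3):
        raise ValueError("fraction must lie between 1/3 and 2/3")
    if quantity < 1 or a[i] != b[j]:
        raise ValueError("the current records must be identical and quantity >= 1")
    n = quantity
    while a[i:i + n] != b[j:j + n]:
        # Assumed rounding: truncate, but keep at least one record so that
        # the loop ends on the identical current records.
        n = max(1, int(n * fraction))
    return n
```

The factor trades the number of comparison rounds against the size of the block accepted. In a file pair where the common run at the start is three records long and the initial quantity is 8, a factor of 1/2 tries 8, 4 and then accepts 2, whereas a factor of 2/3 tries 8, 5 and then accepts 3.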

Further, the above method may also have the following feature: in step (e), when it is determined that neither the first data file nor the second data file has been completely copied, the following step is performed first: determine whether the number of records in the target file is less than the sum of the record counts of the first and second data files; if so, execute step (f); otherwise, report an error and end.
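This check acts as a loop guard: while both files still have records left, the target file can only have received records already consumed from the inputs, so its record count must be strictly less than the combined total. A minimal sketch, with the hypothetical name `ensure_progress` and an exception standing in for the unspecified error handling:

```python
def ensure_progress(target_count: int, count_a: int, count_b: int) -> None:
    """Raise if the target already holds as many records as both inputs combined."""
    if target_count >= count_a + count_b:
        # Assumed error handling; the source only says "report an error and end".
        raise RuntimeError("merge error: target file holds too many records")
```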

As can be seen from the above, the present invention locates the start and end positions of long runs of identical records and merges the identical data in one pass, which greatly improves the efficiency of comparison and merging. Furthermore, by using a staggered search for identical records to merge missing data and by using in-memory comparison, the amount of computation involved in merging the data is further reduced.

Description of drawings

Figure 1A and Figure 1B together form the flowchart of the method of an embodiment of the present invention, each showing part of the steps.

Embodiment

The present invention is described in detail below, taking as an example the merging of the data files of the redundant dual machines in an integrated supervisory control system for rail transit. As shown in Figures 1A and 1B, the method of this embodiment comprises the following steps:

Step 101: Set the first record of the first data file and the first record of the second data file as the current record of each file;

Step 102: Calculate the number of remaining records in the first data file and in the second data file, and determine whether the smaller of the two is greater than zero; if so, proceed to the next step; otherwise, perform exception handling (under normal conditions this branch is never taken) and end;

Step 103: In the first data file, starting from its current record, search for a record identical to the current record of the second data file; if one is found, execute step 104; otherwise, execute step 105;

Step 104: If the current record of the first data file is not this identical record, copy the current record of the first data file and all records between it and the record identical to the current record of the second data file to the target file; execute step 108;

Step 105: In the second data file, starting from its current record, search for a record identical to the current record of the first data file; if one is found, execute step 106; otherwise, execute step 107;

Step 106: If the current record of the second data file is not this identical record, copy the current record of the second data file and all records between it and the record identical to the current record of the first data file to the target file; execute step 108;

Step 107: Copy the current records of the first and second data files to the target file, then take the record following the current record of each data file as its new current record, and return to step 103;

Step 108: Update the current record of each data file to the identical record in that file, recalculate the number of remaining records in the first and second data files, and set the smaller of the two as the current comparison quantity;

Step 109: Starting from the current records of the two data files, take the number of records specified by the current comparison quantity from each file, load them into memory and compare them as a whole; if the contents are identical, execute step 110; otherwise, execute step 111;

Step 110: Copy the records taken from either the first or the second data file to the target file, and update the current record of each data file to the record following the last record taken from it, that is, add the current comparison quantity to the sequence number of the current record; execute step 112;

Step 111: Take half of the current comparison quantity as the new current comparison quantity and return to step 109 to continue comparing;

Step 112: Determine whether both the first data file and the second data file have been completely copied; if so, end; otherwise, proceed to the next step;

Step 113: Determine whether one of the first and second data files has been completely copied; if so, execute step 115; otherwise, proceed to the next step;

Step 114: Neither data file has been completely copied; determine whether the number of records in the target file is less than the sum of the record counts of the first and second data files; if so, execute step 102; if not, report an error and end;

Step 115: Copy all the remaining records of the data file that has not been completely copied to the target file, and end.

An example was tested using the above method. The data file on machine A contained 50,000 records and the data file on machine B contained 47,800 records (two interruptions during the period had stopped machine B from receiving data). Merging by one-by-one comparison took nearly 10 minutes, whereas the present algorithm took less than 10 seconds, a very significant improvement. Moreover, repeated tests found no data loss.

It should be noted that the data file merging method of the present invention is not limited to merging the data files of the redundant dual machines in the integrated supervisory control system for rail transit described in the embodiment. In fact, the invention is applicable to merging any data files that are large and mostly identical, and is particularly suited to merging data files that occasionally contain a few inconsistent records or a large missing segment.

The inventive method can be done various possible conversion on the basis of the foregoing description method.

For example, in another embodiment, the search for the first identical record of the two data files starting from the current records is not necessarily limited to the staggered search used in the embodiment, and may also be carried out as follows:

Step A: Starting from the current record of the first data file, search for a record identical to the current record of the second data file; if one exists, the first identical record has been found; otherwise, execute step B;

Step B: Take the record following the current record of the second data file as its new current record, and return to step A to continue searching.

This approach also achieves the basic function of the present invention. However, with the staggered search used in the embodiment to find the first identical record, when one file is missing a large segment of records, the first identical record from the current records of the two files can be found with only one or two searches. For example, suppose the first data file contains records 6, 7, ... and is missing records 1 to 5, so that its current record is 6, while the second data file contains records 1, 2, 3, 4, 5, 6, 7, ... and its current record is 1. With the staggered search, when the current record of the second data file (sequence number 1) cannot be found in the first data file, the method goes on to search the second data file for the current record of the first data file, record 6. The first identical record is thus found after 2 searches. With the method that searches in only one data file, 6 searches of the first data file are needed before the first identical record from the current records is found. The staggered search is therefore more efficient.
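The search counts in this example can be reproduced with a small sketch that counts one "search" per scan of a file. The function names and the list representation are assumptions; the record values follow the example above, with the trailing "..." cut off at record 7.

```python
from typing import List

def staggered_search_count(a: List[int], b: List[int]) -> int:
    """Count file scans until the staggered method finds a common record."""
    i, j, searches = 0, 0, 0
    while i < len(a) and j < len(b):
        searches += 1
        if b[j] in a[i:]:          # scan file 1 for file 2's current record
            return searches
        searches += 1
        if a[i] in b[j:]:          # scan file 2 for file 1's current record
            return searches
        i, j = i + 1, j + 1        # both current records are unique
    return searches

def single_file_search_count(a: List[int], b: List[int]) -> int:
    """Count scans of file 1 when only file 2's current record is advanced (steps A and B)."""
    j, searches = 0, 0
    while j < len(b):
        searches += 1
        if b[j] in a:              # step A: scan file 1
            return searches
        j += 1                     # step B: advance file 2's current record
    return searches

first_file = [6, 7]                    # records 1 to 5 are missing
second_file = [1, 2, 3, 4, 5, 6, 7]
staggered = staggered_search_count(first_file, second_file)
single = single_file_search_count(first_file, second_file)
```

With these inputs the staggered method needs 2 scans and the single-file method needs 6, matching the figures given in the text. The gap grows with the length of the missing segment, since the single-file method performs one scan per missing record.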

As another example, in another embodiment, when the first identical record of the two data files is found, this identical record may also be copied to the target file and the record following it taken as the new current record of the two data files. This is equivalent to the method of the embodiment.

As a further example, in another embodiment, the current comparison quantity need not be set to the number of remaining records in the file with fewer remaining records; it may instead be set to some value smaller than that number, such as 1000 to 5000.
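A capped comparison quantity of this kind could be chosen as in the sketch below. The function name `initial_compare_quantity` and the optional `cap` parameter are hypothetical. The source does not state why a smaller value would be preferred; one plausible motivation is to bound the amount of data loaded into memory for each comparison.

```python
from typing import Optional

def initial_compare_quantity(remaining_a: int, remaining_b: int,
                             cap: Optional[int] = None) -> int:
    """Smaller remaining count, optionally limited to `cap` records per comparison."""
    smaller = min(remaining_a, remaining_b)
    return smaller if cap is None else min(smaller, cap)
```

For files with 50,000 and 47,800 remaining records, the uncapped quantity is 47,800, while a cap of 5000 limits each whole comparison to 5000 records.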

Claims

1. A data file merging method for merging a first data file and a second data file that contain a large amount of identical data, characterized by comprising the following steps:

2. The method of claim 1, characterized in that step (b) further comprises the following steps:

3. The method of claim 1, characterized in that, in step (c), the smaller of the remaining record counts of the first and second data files is set as the current comparison quantity.

4. The method of claim 1, characterized in that the whole comparison in step (d) is performed by loading the multiple records of the first data file and of the second data file into memory and carrying out an in-memory comparison.

5. The method of claim 1, characterized in that, in step (d), when the result of the whole comparison is not completely identical, 1/3 to 2/3 of the current comparison quantity is taken as the new current comparison quantity and the comparison continues.

6. The method of claim 5, characterized in that, in step (d), when the result of the whole comparison is not completely identical, 1/2 of the current comparison quantity is taken as the new current comparison quantity and the comparison continues.

7. The method of claim 1, characterized in that, in step (e), when it is determined that neither the first data file nor the second data file has been completely copied, the following step is performed first: determine whether the number of records in the target file is less than the sum of the record counts of the first and second data files; if so, execute step (f); otherwise, report an error and end.