CN102043766A - Method and system for modifying scanning document - Google Patents

Info

Publication number
CN102043766A
Authority
CN
China
Prior art keywords
document
collation
character
identification
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN 201010616821
Other languages
Chinese (zh)
Other versions
CN102043766B (en)
Inventor
赵海涛
周长岭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Founder International Co Ltd
Founder International Beijing Co Ltd
Original Assignee
Founder International Co Ltd
Founder International Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Founder International Co Ltd, Founder International Beijing Co Ltd
Priority to CN201010616821XA
Publication of CN102043766A
Application granted
Publication of CN102043766B
Expired - Fee Related
Anticipated expiration

Abstract

The invention discloses a method and system for modifying a scanning document, which solves the problem of relatively low accuracy of a method for modifying a scanning document in the prior art. The method provided by the invention comprises the following steps: receiving the identification document of an initial document after optical character recognition (OCR); modifying the identification document, and recording the modification; receiving an emendation document obtained when an emendation user emendates the modified identification document; obtaining the emendation accuracy rate of the emendation user according to the emendation result of the modified content in the emendation document; and judging whether the emendation accuracy rate is larger than the preset value or not, and if the emendation accuracy rate is larger than the preset value, outputting the emendation document. The technical scheme provided by the invention is beneficial to the enhancement of the accuracy for modifying the scanning document.

Description

Method and system for correcting scanned documents
Technical field
The present invention relates to a method and system for correcting scanned documents.
Background art
Optical character recognition (OCR) refers to the process of scanning text material and then analyzing and processing the resulting image file to obtain the text and layout information.
Because of the limitations of OCR algorithms themselves and the quality of the original text material, OCR cannot extract text from a scanned document with complete accuracy. Therefore, in scanned-document correction work, the document is usually first recognized by OCR and then manually collated by a collation user: a person compares the identification document produced by OCR with the scanned document, finds the characters in the identification document that are inconsistent with the scanned document, and corrects them. This working method is shown in Fig. 1, which is a schematic diagram of the main steps of a scanned-document correction method according to the prior art.
In the flow shown in Fig. 1, if the collation user's collation accuracy is low, that is, if the ratio of the number of OCR-misrecognized characters found during collation to the total number of OCR-misrecognized characters is low, then the collation document produced by this collation user may still contain many erroneous characters, which reduces the accuracy of the scanned-document correction work.
Existing scanned-document correction methods have relatively low accuracy, and no effective solution to this problem has yet been proposed.
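The collation accuracy described in the background can be written as a ratio. The symbols below are introduced here only for illustration and do not appear in the original text: $N_{\text{found}}$ is the number of OCR-misrecognized characters the collation user finds, and $N_{\text{err}}$ is the total number of OCR-misrecognized characters in the identification document.

```latex
\[
\text{collation accuracy} = \frac{N_{\text{found}}}{N_{\text{err}}}, \qquad 0 \le N_{\text{found}} \le N_{\text{err}}, \quad N_{\text{err}} > 0
\]
```

In ordinary work $N_{\text{err}}$ is unknown, because nobody knows in advance which characters OCR got wrong; this is why the ratio cannot simply be measured on the OCR errors themselves.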
Summary of the invention
The main purpose of the present invention is to provide a method and system for correcting scanned documents, so as to solve the problem that scanned-document correction methods in the prior art have relatively low accuracy.
To solve the above problem, according to one aspect of the present invention, a method for correcting scanned documents is provided.
The scanned-document correction method of the present invention comprises: receiving the identification document obtained by performing optical character recognition (OCR) on an original document; modifying the identification document and recording the modification; receiving the collation document obtained when a collation user collates the modified identification document; deriving the collation user's collation accuracy from the collation result for the modified content in the collation document; and judging whether the collation accuracy is greater than a preset value and, if so, outputting the collation document.
Further, modifying the identification document comprises: changing a correctly recognized character at a preset position in the identification document into another character.
Further, modifying the identification document comprises: changing a misrecognized character at a preset position in the identification document into a character other than the correct character for that preset position.
Further, before the identification document is modified, the method also comprises: compiling, character by character, statistics on the collation user's collation accuracy for each character. Modifying the identification document then comprises: determining one or more characters from among the characters for which the collation user's collation accuracy is lower than a preset value, and changing all or some occurrences of those characters in the identification document into the characters that result when each of them is misrecognized.
Further, when the collation accuracy is not greater than the preset value, prompt information is output to prompt the collation user to collate the collation document again, and the collation document obtained from that further collation is received.
Further, after the collation document is output, the method also comprises: restoring the modified content in the collation document to the content before the modification.
To solve the above problem, according to another aspect of the present invention, a system for correcting scanned documents is provided.
The scanned-document correction system of the present invention comprises: a first receiver module, used to receive the identification document obtained by performing optical character recognition (OCR) on an original document; an amendment record module, used to modify the identification document and record the modification; a second receiver module, used to receive the collation document obtained when a collation user collates the modified identification document; a first statistical module, used to derive the collation user's collation accuracy from the collation result for the modified content in the collation document; and an analysis module, used to judge whether the collation accuracy is greater than a preset value and, if so, to output the collation document.
Further, the amendment record module is also used to change a correctly recognized character at a preset position in the identification document into another character.
Further, the amendment record module is also used to change a misrecognized character at a preset position in the identification document into a character other than the correct character for that preset position.
Further, the system also comprises a second statistical module, used to compile, character by character, statistics on the collation user's collation accuracy for each character. The amendment record module is also used to determine one or more characters from among the characters for which the collation user's collation accuracy is lower than a preset value, and to change all or some occurrences of those characters in the identification document into the characters that result when each of them is misrecognized.
Further, the system also comprises an output module, used to output prompt information that prompts the collation user to collate the collation document again. The second receiver module is also used to receive the collation document obtained from that further collation.
Further, the system also comprises a recovery module, used to restore the modified content in the collation document to the content before the modification.
According to the technical scheme of the present invention, whether a collation document is acceptable is assessed by obtaining the collation user's collation accuracy, and the collation result is accepted only when that accuracy is greater than the preset value, which improves the accuracy of scanned-document correction.
Brief description of the drawings
The accompanying drawings described here provide a further understanding of the present invention and form part of this application. The illustrative embodiments of the invention and their description serve to explain the invention and do not unduly limit it. In the drawings:
Fig. 1 is a schematic diagram of the main steps of a scanned-document correction method according to the prior art;
Fig. 2 is a schematic diagram of the main steps of a scanned-document correction method according to an embodiment of the present invention; and
Fig. 3 is a schematic diagram of the modules of a scanned-document correction system according to an embodiment of the present invention.
Detailed description of the embodiments
The present invention is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
Fig. 2 is a schematic diagram of the main steps of a scanned-document correction method according to an embodiment of the present invention. As shown in Fig. 2, the method comprises the following steps:
Step S21: receive the identification document obtained by performing optical character recognition (OCR) on the original document;
Step S22: modify the identification document and record the modification;
Step S23: receive the collation document obtained when the collation user collates the modified identification document;
Step S24: derive the collation user's collation accuracy from the collation result for the modified content in the collation document;
Step S25: judge whether the collation accuracy is greater than the preset value; if so, go to step S26, otherwise go to step S27;
Step S26: output the collation document;
Step S27: output prompt information prompting the collation user to collate the collation document again. The flow can then return to step S24.
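The flow of steps S21 to S27 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the `Record` layout (position mapped to the character before and after modification), the rule that modifications are same-length character substitutions, and the `max_rounds` retry limit are all hypothetical and do not come from the original text.

```python
from typing import Callable, Dict, Optional, Tuple

# Assumed record layout:
# position -> (character before modification, character injected in S22)
Record = Dict[int, Tuple[str, str]]


def collation_accuracy(collated: str, record: Record) -> float:
    """S24: fraction of injected characters that the collation user changed."""
    if not record:
        return 0.0
    found = sum(1 for pos, (_, injected) in record.items()
                if collated[pos] != injected)
    return found / len(record)


def correct_scanned_document(
    identification_doc: str,                      # S21: OCR output
    modify: Callable[[str], Tuple[str, Record]],  # S22: inject and record changes
    collate: Callable[[str], str],                # S23: one human collation pass
    preset_value: float,
    max_rounds: int = 3,                          # assumption: bounded retries
) -> Optional[str]:
    document, record = modify(identification_doc)                # S22
    for _ in range(max_rounds):
        document = collate(document)                             # S23, or re-collation
        if collation_accuracy(document, record) > preset_value:  # S24, S25
            return document                                      # S26
        print("Collation accuracy too low; please collate again.")  # S27
    return None  # never exceeded the preset value within max_rounds
```

Passing the human collation pass in as a callable keeps the sketch testable; in a real system that step would be an interactive editing session rather than a function call.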
To obtain the collation user's collation accuracy, step S22 may specifically adopt a two-way scrambling method.
One form of two-way scrambling changes correctly recognized characters at preset positions in the identification document into other characters. Then, in step S24, the number of modified characters that the collation user detected is counted, and the ratio of the number of detected characters to the total number of modified characters is taken as this collation user's collation accuracy.
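A minimal Python sketch of this first form of scrambling and of the resulting accuracy count is given here. The function names, the `alphabet` parameter, and the fixed random seed are hypothetical choices made for illustration; the sketch also assumes that the characters at the given positions were recognized correctly and that the alphabet contains at least two distinct characters.

```python
import random
from typing import Dict, List, Tuple

# Assumed record layout: position -> (original character, injected character)
Record = Dict[int, Tuple[str, str]]


def scramble_correct_characters(
    doc: str, positions: List[int], alphabet: str, seed: int = 0
) -> Tuple[str, Record]:
    """Replace the character at each preset position with a different one."""
    rng = random.Random(seed)
    chars = list(doc)
    record: Record = {}
    for pos in positions:
        original = chars[pos]
        # Never inject the original character itself.
        candidates = [c for c in alphabet if c != original]
        injected = rng.choice(candidates)
        chars[pos] = injected
        record[pos] = (original, injected)
    return "".join(chars), record


def collation_accuracy(collated: str, record: Record) -> float:
    """Detected modified characters divided by all modified characters."""
    detected = sum(1 for pos, (_, injected) in record.items()
                   if collated[pos] != injected)
    return detected / len(record) if record else 0.0
```

Because every injected error is recorded, the denominator of the ratio is known exactly, which is not the case for genuine OCR errors.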
The other form of two-way scrambling changes a misrecognized character at a preset position in the identification document into a character other than the correct character for that position. Because a certain character may frequently be misrecognized as one particular other character in OCR results, a proofreader might simply search for that other character and neglect to proofread the remaining characters. That other character can therefore be changed into yet another character, which should not be the correct character for the current position. This prompts the proofreader to proofread every character rather than only searching for the error-prone results.
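The second form of scrambling can be sketched as below. The text does not say how the system knows which positions hold misrecognized characters, so this sketch simply assumes a hypothetical `known_errors` mapping from position to the correct character (for example, from a previously verified sample). The function name and the `alphabet` parameter are likewise illustrative assumptions.

```python
import random
from typing import Dict, Tuple

# Assumed record layout: position -> (misrecognized character, injected character)
Record = Dict[int, Tuple[str, str]]


def scramble_misrecognized(
    doc: str, known_errors: Dict[int, str], alphabet: str, seed: int = 0
) -> Tuple[str, Record]:
    """Swap a typical misrecognition for a different, still incorrect character.

    known_errors maps a position to the correct character for that position
    (an assumption; the source does not specify where this knowledge comes from).
    """
    rng = random.Random(seed)
    chars = list(doc)
    record: Record = {}
    for pos, correct in known_errors.items():
        wrong = chars[pos]
        # The injected character must be neither the correct character
        # nor the familiar misrecognition the proofreader may be searching for.
        candidates = [c for c in alphabet if c not in (correct, wrong)]
        injected = rng.choice(candidates)
        chars[pos] = injected
        record[pos] = (wrong, injected)
    return "".join(chars), record
```

For instance, if "l" in "hello" was misread as "1", calling `scramble_misrecognized("he1lo", {2: "l"}, "abcl1xyz")` replaces the "1" with some character that is neither "l" nor "1", so a proofreader who only searches for "1" will not find it.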
When scrambling, different scrambling strategies can be adopted for different collation users. For example, if collation user A often fails to detect certain errors present in OCR results, scrambling can target this characteristic of collation user A. Specifically, before step S22, statistics on the collation user's collation accuracy for each character can be compiled character by character. One or more characters are then determined from among the characters for which this collation user's collation accuracy is lower than the preset value, and all or some occurrences of those characters in the identification document are changed into the characters that result when each of them is misrecognized. For example, if the character for "not" is often misrecognized as the character for "end", and collation user A usually overlooks this kind of error, then occurrences of "not" that were correctly recognized in the identification document can be changed into "end" to see whether collation user A detects them.
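The per-user strategy can be sketched in Python as follows. All names are hypothetical: `history` stands for earlier collation results of one user, `confusions` for a table of typical OCR misrecognitions, and `limit` for the "all or some occurrences" choice. None of these data structures is specified in the original text.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

Record = Dict[int, Tuple[str, str]]


def per_character_accuracy(history: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """history holds (correct character, was the error detected) pairs."""
    totals: Dict[str, int] = defaultdict(int)
    hits: Dict[str, int] = defaultdict(int)
    for ch, detected in history:
        totals[ch] += 1
        if detected:
            hits[ch] += 1
    return {ch: hits[ch] / totals[ch] for ch in totals}


def targeted_scramble(
    doc: str,
    accuracy: Dict[str, float],
    confusions: Dict[str, str],   # character -> its typical misrecognition
    preset_value: float,
    limit: Optional[int] = None,  # None scrambles every occurrence
) -> Tuple[str, Record]:
    """Turn occurrences of the user's weak characters into typical OCR errors."""
    weak = {ch for ch, acc in accuracy.items()
            if acc < preset_value and ch in confusions}
    chars = list(doc)
    record: Record = {}
    for pos, ch in enumerate(chars):
        if ch in weak and (limit is None or len(record) < limit):
            chars[pos] = confusions[ch]
            record[pos] = (ch, confusions[ch])
    return "".join(chars), record
```

With a history in which the user caught only one of three "l" errors, and a confusion table mapping "l" to "1", the document "hello" becomes "he11o" at a preset value of 0.5, while characters the user handles well are left alone.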
After step S25, the collation document may still contain individual characters that were modified in step S22 but not detected by the collation user. The content modified in step S22 can therefore be restored to the content before the modification, according to the record made in step S22.
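The restoration step can be sketched as below, assuming the same hypothetical record layout as a mapping from position to the character before modification and the character injected, and assuming that collation replaces characters in place so that positions stay aligned.

```python
from typing import Dict, Tuple

# Assumed record layout: position -> (character before modification, injected character)
Record = Dict[int, Tuple[str, str]]


def restore_undetected(collated: str, record: Record) -> str:
    """Put back the pre-modification character wherever the injection survived."""
    chars = list(collated)
    for pos, (original, injected) in record.items():
        if chars[pos] == injected:  # the collation user missed this one
            chars[pos] = original
    return "".join(chars)
```

Positions the collation user already changed are left as the user wrote them; only injected characters that survived collation are rolled back, so the deliberately introduced errors do not leak into the output.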
Fig. 3 is a schematic diagram of the modules of a scanned-document correction system according to an embodiment of the present invention. As shown in Fig. 3, the scanned-document correction system 30 comprises the following modules:
a first receiver module, used to receive the identification document obtained by performing optical character recognition (OCR) on the original document;
an amendment record module, used to modify the identification document and record the modification;
a second receiver module, used to receive the collation document obtained when the collation user collates the modified identification document;
a first statistical module, used to derive the collation user's collation accuracy from the collation result for the modified content in the collation document;
an analysis module, used to judge whether the collation accuracy is greater than the preset value and, if so, to output the collation document.
The amendment record module can also be used to change a correctly recognized character at a preset position in the identification document into another character.
The amendment record module can also be used to change a misrecognized character at a preset position in the identification document into a character other than the correct character for that preset position.
The scanned-document correction system 30 can also comprise a second statistical module, used to compile, character by character, statistics on the collation user's collation accuracy for each character. In that case the amendment record module can also be used to determine one or more characters from among the characters for which the collation user's collation accuracy is lower than the preset value, and to change all or some occurrences of those characters in the identification document into the characters that result when each of them is misrecognized.
The scanned-document correction system 30 can also comprise an output module, used to output prompt information that prompts the collation user to collate the collation document again. In that case the second receiver module is also used to receive the collation document obtained from that further collation.
The scanned-document correction system 30 can also comprise a recovery module, used to restore the modified content in the collation document to the content before the modification.
As can be seen from the above description, in this embodiment whether a collation document is acceptable is assessed by obtaining the collation user's collation accuracy, and the collation result is accepted only when that accuracy is greater than the preset value, which improves the accuracy of scanned-document correction.
Obviously, those skilled in the art should understand that the modules or steps of the present invention described above can be implemented with general-purpose computing devices. They can be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they can be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; alternatively, they can each be made into a separate integrated circuit module, or several of the modules or steps can be made into a single integrated circuit module. The present invention is therefore not restricted to any specific combination of hardware and software.
The above are only preferred embodiments of the present invention and are not intended to limit it; for those skilled in the art, the present invention may have various changes and variations. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall be included within its scope of protection.

Claims (12)

1. A method for correcting a scanned document, characterized by comprising:
receiving the identification document obtained by performing optical character recognition (OCR) on an original document;
modifying the identification document and recording the modification;
receiving the collation document obtained when a collation user collates the modified identification document;
deriving the collation user's collation accuracy from the collation result for the modified content in the collation document;
judging whether the collation accuracy is greater than a preset value and, if so, outputting the collation document.
2. The method according to claim 1, characterized in that modifying the identification document comprises: changing a correctly recognized character at a preset position in the identification document into another character.
3. The method according to claim 1, characterized in that modifying the identification document comprises: changing a misrecognized character at a preset position in the identification document into a character other than the correct character for that preset position.
4. The method according to claim 1, characterized in that
before the identification document is modified, the method also comprises: compiling, character by character, statistics on the collation user's collation accuracy for each character;
modifying the identification document comprises: determining one or more characters from among the characters for which the collation user's collation accuracy is lower than a preset value, and changing all or some occurrences of the one or more characters in the identification document into the characters that result when each of them is misrecognized.
5. The method according to any one of claims 1 to 4, characterized in that, when the collation accuracy is not greater than the preset value, prompt information is output to prompt the collation user to collate the collation document again, and the collation document obtained from that further collation is received.
6. The method according to any one of claims 1 to 4, characterized in that, after the collation document is output, the method also comprises: restoring the modified content in the collation document to the content before the modification.
7. A system for correcting a scanned document, characterized by comprising:
a first receiver module, used to receive the identification document obtained by performing optical character recognition (OCR) on an original document;
an amendment record module, used to modify the identification document and record the modification;
a second receiver module, used to receive the collation document obtained when a collation user collates the modified identification document;
a first statistical module, used to derive the collation user's collation accuracy from the collation result for the modified content in the collation document;
an analysis module, used to judge whether the collation accuracy is greater than a preset value and, if so, to output the collation document.
8. The system according to claim 7, characterized in that the amendment record module is also used to change a correctly recognized character at a preset position in the identification document into another character.
9. The system according to claim 7, characterized in that the amendment record module is also used to change a misrecognized character at a preset position in the identification document into a character other than the correct character for that preset position.
10. The system according to claim 7, characterized in that
the system also comprises a second statistical module, used to compile, character by character, statistics on the collation user's collation accuracy for each character;
the amendment record module is also used to determine one or more characters from among the characters for which the collation user's collation accuracy is lower than a preset value, and to change all or some occurrences of the one or more characters in the identification document into the characters that result when each of them is misrecognized.
11. The system according to any one of claims 7 to 10, characterized in that
the system also comprises an output module, used to output prompt information that prompts the collation user to collate the collation document again;
the second receiver module is also used to receive the collation document obtained from that further collation.
12. The system according to any one of claims 7 to 10, characterized in that it also comprises a recovery module, used to restore the modified content in the collation document to the content before the modification.
CN201010616821XA 2010-12-30 2010-12-30 Method and system for modifying scanning document Expired - Fee Related CN102043766B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201010616821XA CN102043766B (en) 2010-12-30 2010-12-30 Method and system for modifying scanning document

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201010616821XA CN102043766B (en) 2010-12-30 2010-12-30 Method and system for modifying scanning document

Publications (2)

Publication Number Publication Date
CN102043766A true CN102043766A (en) 2011-05-04
CN102043766B CN102043766B (en) 2012-05-30

Family

ID=43909910

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201010616821XA Expired - Fee Related CN102043766B (en) 2010-12-30 2010-12-30 Method and system for modifying scanning document

Country Status (1)

Country Link
CN (1) CN102043766B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106980604A (en) * 2017-03-30 2017-07-25 理光图像技术(上海)有限公司 Treaty content collates device
CN113420741A (en) * 2021-08-24 2021-09-21 深圳市中科鼎创科技股份有限公司 Method and system for intelligently detecting file modification

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1186287A (en) * 1996-11-20 1998-07-01 松下电器产业株式会社 Method and apparatus for character recognition
US5889897A (en) * 1997-04-08 1999-03-30 International Patent Holdings Ltd. Methodology for OCR error checking through text image regeneration
US20060288279A1 (en) * 2005-06-15 2006-12-21 Sherif Yacoub Computer assisted document modification
CN101196792A (en) * 2007-12-28 2008-06-11 宇龙计算机通信科技(深圳)有限公司 Automatic correction method and device for document file

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106980604A (en) * 2017-03-30 2017-07-25 理光图像技术(上海)有限公司 Treaty content collates device
CN106980604B (en) * 2017-03-30 2019-12-31 理光图像技术(上海)有限公司 Contract content checking device
CN113420741A (en) * 2021-08-24 2021-09-21 深圳市中科鼎创科技股份有限公司 Method and system for intelligently detecting file modification
CN113420741B (en) * 2021-08-24 2021-11-30 深圳市中科鼎创科技股份有限公司 Method and system for intelligently detecting file modification

Also Published As

Publication number Publication date
CN102043766B (en) 2012-05-30

Similar Documents

Publication Publication Date Title
CN1609846B (en) Digital ink annotation process for recognizing, anchoring and reflowing digital ink annotations
JP4661921B2 (en) Document processing apparatus and program
US7539326B2 (en) Method for verifying an intended address by OCR percentage address matching
CN103514238A (en) Sensitive word recognition processing method based on classification searching
CN106101662A (en) A kind of system and method utilizing bar code transmission data
CN102566768A (en) Method and system for automatic character judgment and correction
CN104468107A (en) Method and device for verification data processing
CN111539414B (en) Method and system for character recognition and character correction of OCR (optical character recognition) image
CN102194117A (en) Method and device for detecting page direction of document
CN104536998A (en) Data import method and device
CN102043766B (en) Method and system for modifying scanning document
CN110347709A (en) A kind of construction method and system of regulation engine
CN111126370A (en) OCR recognition result-based longest common substring automatic error correction method and system
US8170290B2 (en) Method for checking an imprint and imprint checking device
CN101980156A (en) Method for automatically extracting email address and creating new email
CN112860957B (en) Method, medium and system for checking fixed value list
CN101833645B (en) Bar code decoding method based on code word combination
CN101272222A (en) Restriction calibration method and device
CN114676229B (en) Technical improvement major repair project file management system and management method
CN102833713A (en) Method and device for distinguishing spam message
CN111783066A (en) Character recognition method, system, computer device and storage medium
US8380690B2 (en) Automating form transcription
JP2019215747A (en) Information processing device and program
CN102968758A (en) Method and system for processing digital watermarking
CN111488327A (en) Data standard management method and system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20120530

Termination date: 20141230

EXPY Termination of patent right or utility model