CN102043766A - Method and system for modifying scanning document - Google Patents
Method and system for modifying scanning document
- Publication number
- CN102043766A
- Authority
- CN
- China
- Prior art keywords
- document
- collation
- character
- identification
- user
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Abstract
The invention discloses a method and system for correcting scanned documents, which address the relatively low accuracy of prior-art methods for correcting scanned documents. The method comprises the following steps: receiving the recognized document produced by optical character recognition (OCR) of an original document; modifying the recognized document and recording the modification; receiving the collated document that a collation user produces by collating the modified recognized document; deriving the collation user's collation accuracy from the collation result for the modified content in the collated document; and judging whether the collation accuracy is greater than a preset value and, if so, outputting the collated document. The technical scheme provided by the invention helps improve the accuracy of scanned-document correction.
Description
Technical field
The present invention relates to a method and system for correcting scanned documents.
Background technology
Optical character recognition (OCR) is the process of scanning text material and then analyzing and processing the resulting image file to obtain the text and layout information.
Because of the limitations of OCR algorithms and the quality of the original text material, OCR cannot extract text from a scanned document with complete accuracy. In scanned-document correction work, the document is therefore usually recognized by OCR first and then collated manually by a collation user: a person compares the recognized document produced by OCR with the scanned document, finds the characters in the recognized document that differ from the scanned document, and corrects them. This way of working is shown in Fig. 1, which is a schematic diagram of the main steps of a method for correcting scanned documents according to the prior art.
In the flow shown in Fig. 1, if the collation user's collation accuracy is low, that is, if the ratio of the number of OCR-misrecognized characters found during collation to the total number of OCR-misrecognized characters is low, the collated document produced by that user may still contain many wrong characters, which reduces the accuracy of the correction work.
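The ratio described above can be written compactly. The symbols are introduced here only for illustration and do not appear in the original text: $N_{\text{found}}$ is the number of OCR-misrecognized characters that the collation user finds, and $N_{\text{err}}$ is the total number of OCR-misrecognized characters in the recognized document.

```latex
A_{\text{collation}} = \frac{N_{\text{found}}}{N_{\text{err}}}, \qquad 0 \le A_{\text{collation}} \le 1
```

In the prior-art flow $N_{\text{err}}$ is generally not known in advance, which is presumably why this ratio cannot be measured directly there.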
Existing methods for correcting scanned documents have low accuracy, and no effective solution to this problem has yet been proposed.
Summary of the invention
The main object of the present invention is to provide a method and system for correcting scanned documents, so as to solve the problem that prior-art methods for correcting scanned documents have low accuracy.
To solve the above problem, according to one aspect of the present invention, a method for correcting scanned documents is provided.
The method for correcting scanned documents of the present invention comprises: receiving the recognized document produced by optical character recognition (OCR) of an original document; modifying the recognized document and recording the modification; receiving the collated document that a collation user produces by collating the modified recognized document; deriving the collation user's collation accuracy from the collation result for the modified content in the collated document; and judging whether the collation accuracy is greater than a preset value and, if so, outputting the collated document.
Further, modifying the recognized document comprises: changing correctly recognized characters at preset positions in the recognized document into other characters.
Further, modifying the recognized document comprises: changing misrecognized characters at preset positions in the recognized document into characters other than the correct character for that position.
Further, before the recognized document is modified, the method also comprises: compiling, character by character, the collation user's collation accuracy for each character. Modifying the recognized document then comprises: determining one or more characters from among the characters for which the collation user's collation accuracy is lower than a preset value, and changing all or some occurrences of those characters in the recognized document into the character that each of them is respectively turned into when it is misrecognized.
Further, when the collation accuracy is not greater than the preset value, prompt information is output that prompts the collation user to collate the collated document again, and the collated document obtained from that further collation is received.
Further, after the collated document is output, the method comprises: restoring the modified content in the collated document to the content before the modification.
To solve the above problem, according to an aspect of the present invention, a system for correcting scanned documents is provided.
The system for correcting scanned documents of the present invention comprises: a first receiving module, used to receive the recognized document produced by optical character recognition (OCR) of an original document; a modification recording module, used to modify the recognized document and record the modification; a second receiving module, used to receive the collated document that a collation user produces by collating the modified recognized document; a first statistics module, used to derive the collation user's collation accuracy from the collation result for the modified content in the collated document; and an analysis module, used to judge whether the collation accuracy is greater than a preset value and, if so, to output the collated document.
Further, the modification recording module is also used to change correctly recognized characters at preset positions in the recognized document into other characters.
Further, the modification recording module is also used to change misrecognized characters at preset positions in the recognized document into characters other than the correct character for that position.
Further, the system also comprises a second statistics module, used to compile, character by character, the collation user's collation accuracy for each character. The modification recording module is also used to determine one or more characters from among the characters for which the collation user's collation accuracy is lower than a preset value, and to change all or some occurrences of those characters in the recognized document into the character that each of them is respectively turned into when it is misrecognized.
Further, the system also comprises an output module, used to output prompt information that prompts the collation user to collate the collated document again. The second receiving module is also used to receive the collated document obtained from that further collation.
Further, the system also comprises a restoration module, used to restore the modified content in the collated document to the content before the modification.
According to the technical scheme of the present invention, whether a collated document is acceptable is determined by obtaining the user's collation accuracy, and the collation result is accepted only when that accuracy is greater than the preset value, which improves the accuracy of scanned-document correction.
Description of drawings
The accompanying drawings described here provide a further understanding of the present invention and form part of this application. The illustrative embodiments of the invention and their description serve to explain the invention and do not unduly limit it. In the drawings:
Fig. 1 is a schematic diagram of the main steps of a method for correcting scanned documents according to the prior art;
Fig. 2 is a schematic diagram of the main steps of a method for correcting scanned documents according to an embodiment of the present invention; and
Fig. 3 is a schematic diagram of the modules of a system for correcting scanned documents according to an embodiment of the present invention.
Embodiment
The present invention is described in detail below with reference to the drawings and in conjunction with embodiments.
Fig. 2 is a schematic diagram of the main steps of a method for correcting scanned documents according to an embodiment of the present invention. As shown in Fig. 2, the method comprises the following steps:
Step S21: receive the recognized document produced by optical character recognition (OCR) of an original document;
Step S22: modify the recognized document and record the modification;
Step S23: receive the collated document that a collation user produces by collating the modified recognized document;
Step S24: derive the collation user's collation accuracy from the collation result for the modified content in the collated document;
Step S25: judge whether the collation accuracy is greater than a preset value; if so, go to step S26, otherwise go to step S27;
Step S26: output the collated document;
Step S27: output prompt information prompting the collation user to collate the collated document again. The flow can then return to step S24.
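The flow of steps S21 to S27 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `run_collation`, the `collate` callback standing in for the human collation user, the `max_rounds` limit, and the restriction to same-length character substitutions (so that recorded positions stay valid) are all hypothetical and are not taken from the original text.

```python
from typing import Callable, Dict, Optional


def run_collation(recognized: str,
                  injected: Dict[int, str],
                  collate: Callable[[str], str],
                  threshold: float,
                  max_rounds: int = 3) -> Optional[str]:
    """Return the accepted collated text, or None if no round passes.

    `injected` maps a position to the character deliberately written there;
    it serves both as the modification and as its record (step S22).
    Assumption: collation only substitutes characters, so indices stay aligned.
    """
    if not injected:
        raise ValueError("at least one deliberate modification is required")

    # S22: apply the deliberate changes to the recognized document (S21 input).
    chars = list(recognized)
    for pos, ch in injected.items():
        chars[pos] = ch
    document = "".join(chars)

    for _ in range(max_rounds):
        # S23 on the first round; the repeated collation of S27 afterwards.
        document = collate(document)
        # S24: a deliberate change counts as detected if the user altered it.
        caught = sum(1 for pos, ch in injected.items() if document[pos] != ch)
        accuracy = caught / len(injected)
        # S25: accept only if the accuracy exceeds the preset value.
        if accuracy > threshold:
            return document  # S26
    return None  # hypothetical cut-off; the original flow sets no round limit
```

A caller would pass a function that shows the text to the collation user and returns the edited text; the sketch returns `None` once `max_rounds` is exhausted, a cut-off added here only so that the loop terminates.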
To obtain the user's collation accuracy, step S22 can specifically use a two-way scrambling method.
One two-way scrambling method changes correctly recognized characters at preset positions in the recognized document into other characters. In step S24, the system then counts how many of the modified characters the collation user detected, and takes the ratio of the number of detected characters to the total number of modified characters as that user's collation accuracy.
The other two-way scrambling method changes misrecognized characters at preset positions in the recognized document into characters other than the correct character for that position. In OCR results a certain character is often misrecognized as one particular other character, so a proofreader may simply search for that other character and neglect to check the remaining characters. That other character can therefore be changed into yet another character, which should not be the correct character for the current position. This pushes the proofreader to check every character rather than search directly for the known error-prone results.
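A sketch of this second method follows. The original text does not explain how the system knows which positions hold misrecognized characters or what the correct character is, so the `suspects` argument (a mapping from position to the presumed correct character) is purely hypothetical, as are the function name and the `alphabet` of candidates.

```python
import random
from typing import Dict, Tuple


def disguise_misrecognitions(text: str, suspects: Dict[int, str], alphabet: str,
                             rng: random.Random) -> Tuple[str, Dict[int, Tuple[str, str]]]:
    """Swap each suspected OCR error for a third, unrelated character.

    Returns the modified text and a record mapping each position to
    (misrecognized character, injected character).
    """
    chars = list(text)
    record: Dict[int, Tuple[str, str]] = {}
    for pos, correct in suspects.items():
        wrong = chars[pos]
        # The replacement must be neither the familiar OCR error nor the
        # correct character, so a search for the usual error finds nothing.
        candidates = [c for c in alphabet if c not in (wrong, correct)]
        chars[pos] = rng.choice(candidates)
        record[pos] = (wrong, chars[pos])
    return "".join(chars), record
```

With the text `"c1ean"`, where OCR is assumed to have read `l` as `1`, the character at index 1 becomes something other than `1` or `l`, so the collation user has to read the word to notice the error.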
Different scrambling strategies can be applied to different collation users. For example, a certain collation user, here called collation user A, may often fail to detect some of the errors present in OCR results, and scrambling can target this characteristic of collation user A. Specifically, before step S22, the collation user's collation accuracy for each character can be compiled character by character. One or more characters are then determined from among the characters for which this user's collation accuracy is lower than a preset value, and all or some occurrences of those characters in the recognized document are changed into the character that each of them is respectively turned into when it is misrecognized. For example, if the character meaning "not" is often misrecognized as the visually similar character meaning "end" (presumably 未 and 末 in the original Chinese text), and collation user A usually overlooks this error, then correctly recognized instances of "not" in the recognized document can be changed into "end" to see whether collation user A detects them.
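The per-user strategy above can be sketched in two parts: compiling per-character accuracy, then turning the user's weak characters into their typical misrecognitions. The shape of `history` (pairs of a character and whether the user caught an error involving it), the confusion table `misread_as`, and the `limit` parameter that models "all or some" occurrences are assumptions for illustration only.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Set, Tuple


def weak_characters(history: Iterable[Tuple[str, bool]], threshold: float) -> Set[str]:
    """Characters whose per-character collation accuracy is below threshold."""
    shown: Counter = Counter()
    caught: Counter = Counter()
    for ch, was_caught in history:
        shown[ch] += 1
        if was_caught:
            caught[ch] += 1
    return {ch for ch in shown if caught[ch] / shown[ch] < threshold}


def targeted_scramble(text: str, weak: Set[str], misread_as: Dict[str, str],
                      limit: Optional[int] = None) -> Tuple[str, Dict[int, str]]:
    """Change weak characters into the character OCR typically confuses them with.

    Returns the modified text and a record mapping position -> original character.
    """
    chars = list(text)
    record: Dict[int, str] = {}
    for pos, ch in enumerate(chars):
        if ch in weak and ch in misread_as:
            if limit is not None and len(record) >= limit:
                break  # only some of the occurrences are modified
            chars[pos] = misread_as[ch]
            record[pos] = ch
    return "".join(chars), record
```

As a usage example, a user who caught only one of three past errors involving `l` has that character flagged at a threshold of 0.5, and `targeted_scramble("hello world", {"l"}, {"l": "1"}, limit=2)` changes only the first two occurrences.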
After step S25, the collated document may still contain individual characters that were modified in step S22 but not detected by the collation user. The content modified in step S22 can therefore be restored to the content before the modification, according to the record made in step S22.
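The restoration step can be sketched as below. The record format (position mapped to the original and injected characters) and the function name are hypothetical. The sketch makes one design choice that the original text does not spell out: it restores only positions that still hold the injected character, so that anything the collation user already changed is left as the user wrote it.

```python
from typing import Dict, Tuple


def restore_uncaught(collated: str, record: Dict[int, Tuple[str, str]]) -> str:
    """Undo deliberate changes that the collation user did not detect.

    `record` maps position -> (original character, injected character).
    Assumes collation only substituted characters, so positions still align.
    """
    chars = list(collated)
    for pos, (original, injected) in record.items():
        if chars[pos] == injected:
            # The deliberate change survived collation, so put the original back.
            chars[pos] = original
    return "".join(chars)
```

If the recognized text `"hello"` was scrambled to `"he11o"` and the user corrected only the first injected character, the collated text `"hel1o"` is restored to `"hello"`.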
Fig. 3 is a schematic diagram of the modules of a system for correcting scanned documents according to an embodiment of the present invention. As shown in Fig. 3, the system 30 for correcting scanned documents comprises the following modules:
a first receiving module, used to receive the recognized document produced by optical character recognition (OCR) of an original document;
a modification recording module, used to modify the recognized document and record the modification;
a second receiving module, used to receive the collated document that a collation user produces by collating the modified recognized document;
a first statistics module, used to derive the collation user's collation accuracy from the collation result for the modified content in the collated document;
an analysis module, used to judge whether the collation accuracy is greater than a preset value and, if so, to output the collated document.
The modification recording module can also be used to change correctly recognized characters at preset positions in the recognized document into other characters.
The modification recording module can also be used to change misrecognized characters at preset positions in the recognized document into characters other than the correct character for that position.
The system 30 for correcting scanned documents can also comprise a second statistics module, used to compile, character by character, the collation user's collation accuracy for each character. In that case the modification recording module can also be used to determine one or more characters from among the characters for which the collation user's collation accuracy is lower than a preset value, and to change all or some occurrences of those characters in the recognized document into the character that each of them is respectively turned into when it is misrecognized.
The system 30 for correcting scanned documents can also comprise an output module, used to output prompt information that prompts the collation user to collate the collated document again. In that case the second receiving module is also used to receive the collated document obtained from that further collation.
The system 30 for correcting scanned documents can also comprise a restoration module, used to restore the modified content in the collated document to the content before the modification.
As the above description shows, in this embodiment whether a collated document is acceptable is determined by obtaining the user's collation accuracy, and the collation result is accepted only when that accuracy is greater than the preset value, which improves the accuracy of scanned-document correction.
Obviously, those skilled in the art should understand that the modules or steps of the present invention described above can be implemented with general-purpose computing devices. They can be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they can be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by a computing device; alternatively, they can each be made into a separate integrated circuit module, or several of the modules or steps can be made into a single integrated circuit module. The present invention is thus not restricted to any specific combination of hardware and software.
The above are only preferred embodiments of the present invention and do not limit it; for those skilled in the art, the invention may have various changes and variations. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.
Claims (12)
1. A method for correcting scanned documents, characterized by comprising:
receiving the recognized document produced by optical character recognition (OCR) of an original document;
modifying the recognized document and recording the modification;
receiving the collated document that a collation user produces by collating the modified recognized document;
deriving the collation user's collation accuracy from the collation result for the modified content in the collated document;
judging whether the collation accuracy is greater than a preset value and, if so, outputting the collated document.
2. The method according to claim 1, characterized in that modifying the recognized document comprises: changing correctly recognized characters at preset positions in the recognized document into other characters.
3. The method according to claim 1, characterized in that modifying the recognized document comprises: changing misrecognized characters at preset positions in the recognized document into characters other than the correct character for that position.
4. The method according to claim 1, characterized in that,
before the recognized document is modified, the method also comprises: compiling, character by character, the collation user's collation accuracy for each character;
modifying the recognized document comprises: determining one or more characters from among the characters for which the collation user's collation accuracy is lower than a preset value, and changing all or some occurrences of those characters in the recognized document into the character that each of them is respectively turned into when it is misrecognized.
5. The method according to any one of claims 1 to 4, characterized in that, when the collation accuracy is not greater than the preset value, prompt information is output that prompts the collation user to collate the collated document again, and the collated document obtained from that further collation is received.
6. The method according to any one of claims 1 to 4, characterized in that, after the collated document is output, the method comprises: restoring the modified content in the collated document to the content before the modification.
7. A system for correcting scanned documents, characterized by comprising:
a first receiving module, used to receive the recognized document produced by optical character recognition (OCR) of an original document;
a modification recording module, used to modify the recognized document and record the modification;
a second receiving module, used to receive the collated document that a collation user produces by collating the modified recognized document;
a first statistics module, used to derive the collation user's collation accuracy from the collation result for the modified content in the collated document;
an analysis module, used to judge whether the collation accuracy is greater than a preset value and, if so, to output the collated document.
8. The system according to claim 7, characterized in that the modification recording module is also used to change correctly recognized characters at preset positions in the recognized document into other characters.
9. The system according to claim 7, characterized in that the modification recording module is also used to change misrecognized characters at preset positions in the recognized document into characters other than the correct character for that position.
10. The system according to claim 7, characterized in that,
the system also comprises a second statistics module, used to compile, character by character, the collation user's collation accuracy for each character;
the modification recording module is also used to determine one or more characters from among the characters for which the collation user's collation accuracy is lower than a preset value, and to change all or some occurrences of those characters in the recognized document into the character that each of them is respectively turned into when it is misrecognized.
11. The system according to any one of claims 7 to 10, characterized in that,
the system also comprises an output module, used to output prompt information that prompts the collation user to collate the collated document again;
the second receiving module is also used to receive the collated document obtained from that further collation.
12. The system according to any one of claims 7 to 10, characterized in that it also comprises a restoration module, used to restore the modified content in the collated document to the content before the modification.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201010616821XA CN102043766B (en) | 2010-12-30 | 2010-12-30 | Method and system for modifying scanning document |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201010616821XA CN102043766B (en) | 2010-12-30 | 2010-12-30 | Method and system for modifying scanning document |
Publications (2)
Publication Number | Publication Date |
---|---|
CN102043766A (en) | 2011-05-04 |
CN102043766B (en) | 2012-05-30 |
Family
ID=43909910
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN201010616821XA | Expired - Fee Related | CN102043766B (en) | 2010-12-30 | 2010-12-30 | Method and system for modifying scanning document |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102043766B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1186287A (en) * | 1996-11-20 | 1998-07-01 | 松下电器产业株式会社 | Method and apparatus for character recognition |
US5889897A (en) * | 1997-04-08 | 1999-03-30 | International Patent Holdings Ltd. | Methodology for OCR error checking through text image regeneration |
US20060288279A1 (en) * | 2005-06-15 | 2006-12-21 | Sherif Yacoub | Computer assisted document modification |
CN101196792A (en) * | 2007-12-28 | 2008-06-11 | 宇龙计算机通信科技(深圳)有限公司 | Automatic correction method and device for document file |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106980604A (en) * | 2017-03-30 | 2017-07-25 | 理光图像技术(上海)有限公司 | Treaty content collates device |
CN106980604B (en) * | 2017-03-30 | 2019-12-31 | 理光图像技术(上海)有限公司 | Contract content checking device |
CN113420741A (en) * | 2021-08-24 | 2021-09-21 | 深圳市中科鼎创科技股份有限公司 | Method and system for intelligently detecting file modification |
CN113420741B (en) * | 2021-08-24 | 2021-11-30 | 深圳市中科鼎创科技股份有限公司 | Method and system for intelligently detecting file modification |
Also Published As
Publication number | Publication date |
---|---|
CN102043766B (en) | 2012-05-30 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20120530 Termination date: 20141230 |
EXPY | Termination of patent right or utility model | ||