FIELD OF INVENTION
[0001] The present invention relates to mobile image capture and image processing, and more particularly to capturing and processing digital images using a mobile device, and classifying objects detected in such digital images.
BACKGROUND OF THE INVENTION
[0002] Digital images having depicted therein an object inclusive of documents such as a letter, a check, a bill, an invoice, etc. have conventionally been captured and processed using a scanner or multifunction peripheral coupled to a computer workstation such as a laptop or desktop computer. Methods and systems capable of performing such capture and processing are well known in the art and well adapted to the tasks for which they are employed.
[0003] However, in an era where day-to-day activities, computing, and business are increasingly performed using mobile devices, it would be greatly beneficial to provide analogous document capture and processing systems and methods for deployment and use on mobile platforms, such as smart phones, digital cameras, tablet computers, etc.
[0004] A major challenge in transitioning conventional document capture and processing techniques is the limited processing power and image resolution achievable using hardware currently available in mobile devices. These limitations present a significant challenge because it is impossible or impractical to process images captured at resolutions typically much lower than those achievable by a conventional scanner. As a result, conventional scanner-based processing algorithms typically perform poorly on digital images captured using a mobile device.
[0005] In addition, the limited processing and memory available on mobile devices makes conventional image processing algorithms employed for scanners prohibitively expensive in terms of computational cost. Attempting to run a conventional scanner-based image processing algorithm on a mobile device takes far too much time to be practical on modern mobile platforms.
[0006] A still further challenge is presented by the nature of mobile capture components (e.g. cameras on mobile phones, tablets, etc.). Where conventional scanners are capable of faithfully representing the physical document in a digital image, critically maintaining aspect ratio, dimensions, and shape of the physical document in the digital image, mobile capture components are frequently incapable of producing such results.
[0007] Specifically, images of documents captured by a camera present a new line of processing issues not encountered when dealing with images captured by a scanner. This is in part due to the inherent differences in the way the document image is acquired, as well as the way the devices are constructed. The way that some scanners work is to use a transport mechanism that creates a relative movement between paper and a linear array of sensors. These sensors create pixel values of the document as it moves by, and the sequence of these captured pixel values forms an image. Accordingly, there is generally a horizontal or vertical consistency up to the noise in the sensor itself, and it is the same sensor that provides all the pixels in the line.
[0008] In contrast, cameras have many more sensors in a nonlinear array, e.g., typically arranged in a rectangle. Thus, all of these individual sensors are independent, and render image data that is not typically of horizontal or vertical consistency. In addition, cameras introduce a projective effect that is a function of the angle at which the picture is taken. For example, with a linear array like in a scanner, even if the transport of the paper is not perfectly orthogonal to the alignment of sensors and some skew is introduced, there is no projective effect like in a camera. Additionally, with camera capture, nonlinear distortions may be introduced because of the camera optics.
[0009] Conventional image processing algorithms designed to detect documents in images captured using traditional flat-bed and/or paper feed scanners may also utilize information derived from page detection to attempt to classify detected documents as members of a particular document class. However, due to the unique challenges introduced by virtue of capturing digital images using cameras of mobile devices, these conventional classification algorithms perform inadequately and are incapable of robustly classifying documents in such digital images.
[0010] Moreover, even when documents can be properly classified, the hardware limitations of current mobile devices make performing classification using the mobile device prohibitively expensive from a computational efficiency standpoint.
[0011] In view of the challenges presented above, it would be beneficial to provide an image capture and processing algorithm and applications thereof that compensate for and/or correct problems associated with image capture, processing and classification using a mobile device, while maintaining a low computational cost via efficient processing methods.
[0012] Moreover, it would be a further improvement in the field to provide object classification systems, methods and computer program products capable of robustly assigning objects to a particular class of objects and of utilizing information known about members of the class to further address and overcome unique challenges inherent to processing images captured using a camera of a mobile device.
SUMMARY OF THE INVENTION
[0013] In one embodiment, a method includes: receiving a digital image captured by a mobile device; and using a processor of the mobile device: generating a first representation of the digital image, the first representation being characterized by a reduced resolution; generating a first feature vector based on the first representation; comparing the first feature vector to a plurality of reference feature matrices; and classifying an object depicted in the digital image as a member of a particular object class based at least in part on the comparing.
[0014] In another embodiment, a method includes: generating a first feature vector based on a digital image captured by a mobile device; comparing the first feature vector to a plurality of reference feature matrices; classifying an object depicted in the digital image as a member of a particular object class based at least in part on the comparing; determining one or more object features of the object based at least in part on the particular object class; and performing at least one processing operation using a processor of a mobile device, the at least one processing operation selected from a group consisting of: detecting the object depicted in the digital image based at least in part on the one or more object features; rectangularizing the object depicted in the digital image based at least in part on the one or more object features; cropping the digital image based at least in part on the one or more object features; and binarizing the digital image based at least in part on the one or more object features.
[0015] In still another embodiment, a system includes a processor; and logic in and/or executable by the processor to cause the processor to: generate a first representation of a digital image captured by a mobile device; generate a first feature vector based on the first
representation; compare the first feature vector to a plurality of reference feature matrices; and classify an object depicted in the digital image as a member of a particular object class based at least in part on the comparison.
[0016] In still yet another embodiment, a computer program product includes a computer readable storage medium having program code embodied therewith, the program code readable/executable by a processor to: generate a first representation of a digital image captured by a mobile device; generate a first feature vector based on the first representation; compare the first feature vector to a plurality of reference feature matrices; and classify an object depicted in the digital image as a member of a particular object class based at least in part on the
comparison.
BRIEF DESCRIPTION OF THE DRAWINGS
[0017] FIG. 1 illustrates a network architecture, in accordance with one embodiment.
[0018] FIG. 2 shows a representative hardware environment that may be associated with the servers and/or clients of FIG. 1, in accordance with one embodiment.
[0019] FIG. 3A depicts a digital image of an object, according to one embodiment.
[0020] FIG. 3B depicts a schematic representation of the digital image shown in FIG. 3A divided into a plurality of sections for generating a first representation of the digital image, according to one embodiment.
[0021] FIG. 3C depicts a first representation of the digital image shown in FIG. 3A, the first representation being characterized by a reduced resolution relative to the resolution of the digital image.
[0022] FIG. 4A is a schematic representation of a plurality of subregions depicted in a digital image of a document, according to one embodiment.
[0023] FIG. 4B is a masked representation of the digital image shown in FIG. 4A, according to one embodiment.
[0024] FIG. 4C is a masked representation of the digital image shown in FIG. 4A, according to one embodiment.
[0025] FIG. 4D is a masked representation of the digital image shown in FIG. 4A, according to one embodiment.
[0026] FIG. 5 is a flowchart of a method, according to one embodiment.
[0027] FIG. 6 is a flowchart of a method, according to one embodiment.
DETAILED DESCRIPTION
[0028] The following description is made for the purpose of illustrating the general principles of the present invention and is not meant to limit the inventive concepts claimed herein. Further, particular features described herein can be used in combination with other described features in each of the various possible combinations and permutations.
[0029] Unless otherwise specifically defined herein, all terms are to be given their broadest possible interpretation including meanings implied from the specification as well as meanings understood by those skilled in the art and/or as defined in dictionaries, treatises, etc.
[0030] It must also be noted that, as used in the specification and the appended claims, the singular forms "a," "an" and "the" include plural referents unless otherwise specified.
[0031] The present application refers to image processing of images (e.g. pictures, figures, graphical schematics, single frames of movies, videos, films, clips, etc.) captured by cameras, especially cameras of mobile devices. As understood herein, a mobile device is any device capable of receiving data without having power supplied via a physical connection (e.g. wire, cord, cable, etc.) and capable of receiving data without a physical data connection (e.g. wire, cord, cable, etc.). Mobile devices within the scope of the present disclosures include exemplary devices such as a mobile telephone, smartphone, tablet, personal digital assistant, iPod®, iPad®, BLACKBERRY® device, etc.
[0032] However, as it will become apparent from the descriptions of various functionalities, the presently disclosed mobile image processing algorithms can be applied, sometimes with certain modifications, to images coming from scanners and multifunction peripherals (MFPs). Similarly, images processed using the presently disclosed processing algorithms may be further processed using conventional scanner processing algorithms, in some approaches.
[0033] Of course, the various embodiments set forth herein may be implemented utilizing hardware, software, or any desired combination thereof. For that matter, any type of logic may be utilized which is capable of implementing the various functionality set forth herein.
[0034] One benefit of using a mobile device is that with a data plan, image processing and information processing based on captured images can be done in a much more convenient, streamlined and integrated way than previous methods that relied on the presence of a scanner. However, the use of mobile devices as document capture and/or processing devices has heretofore been considered unfeasible for a variety of reasons.
[0035] In one approach, an image may be captured by a camera of a mobile device. The term "camera" should be broadly interpreted to include any type of device capable of capturing an image of a physical object external to the device, such as a piece of paper. The term "camera"
does not encompass a peripheral scanner or multifunction device. Any type of camera may be used. Preferred embodiments may use cameras having a higher resolution, e.g. 8 MP or more, ideally 12 MP or more. The image may be captured in color, grayscale, black and white, or with any other known optical effect. The term "image" as referred to herein is meant to encompass any type of data corresponding to the output of the camera, including raw data, processed data, etc.
[0036] General Embodiments
[0037] In one general embodiment a method includes: receiving a digital image captured by a mobile device; and using a processor of the mobile device: generating a first representation of the digital image, the first representation being characterized by a reduced resolution; generating a first feature vector based on the first representation; comparing the first feature vector to a plurality of reference feature matrices; and classifying an object depicted in the digital image as a member of a particular object class based at least in part on the comparing.
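By way of illustration only, the classification flow recited above (generating a reduced-resolution first representation, deriving a feature vector from it, and comparing that vector to per-class reference feature matrices) might be sketched as follows. The 4x4 section grid, the use of per-section mean values as features, and the minimum mean-distance decision rule are assumptions made for this sketch, not features required by any embodiment:

```python
import numpy as np

def first_representation(image, sections=(4, 4)):
    """Generate a reduced-resolution first representation by averaging
    the pixel values within each cell of a grid of sections."""
    h, w = image.shape[:2]
    sh, sw = h // sections[0], w // sections[1]
    rep = np.zeros((sections[0], sections[1]) + image.shape[2:])
    for i in range(sections[0]):
        for j in range(sections[1]):
            rep[i, j] = image[i*sh:(i+1)*sh, j*sw:(j+1)*sw].mean(axis=(0, 1))
    return rep

def classify(image, reference_matrices):
    """Classify the depicted object by comparing a feature vector derived
    from the first representation against per-class reference feature
    matrices, selecting the class with the smallest mean distance."""
    vec = first_representation(image).ravel()
    best_class, best_dist = None, float("inf")
    for object_class, matrix in reference_matrices.items():
        # each reference feature matrix holds one feature vector per row
        dist = np.linalg.norm(matrix - vec, axis=1).mean()
        if dist < best_dist:
            best_class, best_dist = object_class, dist
    return best_class
```

Here each hypothetical reference feature matrix holds one feature vector per row (e.g. one per known exemplar of the class), and the class whose reference vectors lie closest on average to the query vector is selected.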
[0038] In another general embodiment, a method includes: generating a first feature vector based on a digital image captured by a mobile device; comparing the first feature vector to a plurality of reference feature matrices; classifying an object depicted in the digital image as a member of a particular object class based at least in part on the comparing; determining one or more object features of the object based at least in part on the particular object class; and performing at least one processing operation using a processor of a mobile device, the at least one processing operation selected from a group consisting of: detecting the object depicted in the digital image based at least in part on the one or more object features; rectangularizing the object depicted in the digital image based at least in part on the one or more object features; cropping the digital image based at least in part on the one or more object features; and binarizing the digital image based at least in part on the one or more object features.
[0039] In still another general embodiment, a system includes a processor; and logic in and/or executable by the processor to cause the processor to: generate a first representation of a digital image captured by a mobile device; generate a first feature vector based on the first representation; compare the first feature vector to a plurality of reference feature matrices; and classify an object depicted in the digital image as a member of a particular object class based at least in part on the comparison.
[0040] In still yet another general embodiment, a computer program product includes a computer readable storage medium having program code embodied therewith, the program code readable/executable by a processor to: generate a first representation of a digital image captured by a mobile device; generate a first feature vector based on the first representation; compare the
first feature vector to a plurality of reference feature matrices; and classify an object depicted in the digital image as a member of a particular object class based at least in part on the
comparison.
[0041] As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or computer program product. Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as "logic," "circuit," "module" or "system." Furthermore, aspects of the present invention may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon.
[0042] Any combination of one or more computer readable medium(s) may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, processor, or device.
[0043] A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband, as part of a carrier wave, an electrical connection having one or more wires, an optical fiber, etc. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
[0044] Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
[0045] Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
[0046] Aspects of the present invention are described below with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
[0047] These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
[0048] The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
[0049] The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer
program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
[0050] FIG. 1 illustrates an architecture 100, in accordance with one embodiment. As shown in FIG. 1, a plurality of remote networks 102 are provided including a first remote network 104 and a second remote network 106. A gateway 101 may be coupled between the remote networks 102 and a proximate network 108. In the context of the present architecture 100, the networks 104, 106 may each take any form including, but not limited to a LAN, a WAN such as the Internet, public switched telephone network (PSTN), internal telephone network, etc.
[0051] In use, the gateway 101 serves as an entrance point from the remote networks 102 to the proximate network 108. As such, the gateway 101 may function as a router, which is capable of directing a given packet of data that arrives at the gateway 101, and a switch, which furnishes the actual path in and out of the gateway 101 for a given packet.
[0052] Further included is at least one data server 114 coupled to the proximate network 108, and which is accessible from the remote networks 102 via the gateway 101. It should be noted that the data server(s) 114 may include any type of computing device/groupware. Coupled to each data server 114 is a plurality of user devices 116. Such user devices 116 may include a desktop computer, lap-top computer, hand-held computer, printer or any other type of logic. It should be noted that a user device 116 may also be directly coupled to any of the networks, in one embodiment.
[0053] A peripheral 120 or series of peripherals 120, e.g., facsimile machines, printers, networked and/or local storage units or systems, etc., may be coupled to one or more of the networks 104, 106, 108. It should be noted that databases and/or additional components may be utilized with, or integrated into, any type of network element coupled to the networks 104, 106, 108. In the context of the present description, a network element may refer to any component of a network.
[0054] According to some approaches, methods and systems described herein may be implemented with and/or on virtual systems and/or systems which emulate one or more other systems, such as a UNIX system which emulates an IBM z/OS environment, a UNIX system which virtually hosts a MICROSOFT WINDOWS environment, a MICROSOFT WINDOWS system which emulates an IBM z/OS environment, etc. This virtualization and/or emulation may be enhanced through the use of VMWARE software, in some embodiments.
[0055] In more approaches, one or more networks 104, 106, 108, may represent a cluster of systems commonly referred to as a "cloud." In cloud computing, shared resources, such as processing power, peripherals, software, data, servers, etc., are provided to any system in the cloud in an on-demand relationship, thereby allowing access and distribution of services across many computing systems. Cloud computing typically involves an Internet connection between the systems operating in the cloud, but other techniques of connecting the systems may also be used.
[0056] FIG. 2 shows a representative hardware environment associated with a user device 116 and/or server 114 of FIG. 1, in accordance with one embodiment. Such figure illustrates a typical hardware configuration of a workstation having a central processing unit 210, such as a microprocessor, and a number of other units interconnected via a system bus 212.
[0057] The workstation shown in FIG. 2 includes a Random Access Memory (RAM) 214, Read Only Memory (ROM) 216, an I/O adapter 218 for connecting peripheral devices such as disk storage units 220 to the bus 212, a user interface adapter 222 for connecting a keyboard 224, a mouse 226, a speaker 228, a microphone 232, and/or other user interface devices such as a touch screen and a digital camera (not shown) to the bus 212, communication adapter 234 for connecting the workstation to a communication network 235 (e.g., a data processing network) and a display adapter 236 for connecting the bus 212 to a display device 238.
[0058] The workstation may have resident thereon an operating system such as the Microsoft Windows® Operating System (OS), a MAC OS, a UNIX OS, etc. It will be appreciated that a preferred embodiment may also be implemented on platforms and operating systems other than those mentioned. A preferred embodiment may be written using JAVA, XML, C, and/or C++ language, or other programming languages, along with an object oriented programming methodology. Object oriented programming (OOP), which has become increasingly used to develop complex applications, may be used.
[0059] An application may be installed on the mobile device, e.g., stored in a nonvolatile memory of the device. In one approach, the application includes instructions to perform processing of an image on the mobile device. In another approach, the application includes
instructions to send the image to a remote server such as a network server. In yet another approach, the application may include instructions to decide whether to perform some or all processing on the mobile device and/or send the image to the remote site.
[0060] Various Embodiments of Page Detection
[0061] One exemplary embodiment illustrating an exemplary methodology for performing page detection will now be described.
[0062] In one approach, an edge detection algorithm proceeds from the boundaries of a digital image toward a central region of the image, looking for points that are sufficiently different from what is known about the properties of the background.
[0063] Notably, the background in the images captured by even the same mobile device may be different every time, so a new technique to identify the document(s) in the image is provided.
[0064] Finding page edges within a camera-captured image according to the present disclosures helps to accommodate important differences in the properties of images captured using mobile devices as opposed, e.g., to scanners. For example, due to projective effects the image of a rectangular document in a photograph may not appear truly rectangular, and opposite sides of the document in the image may not have the same length. Second, even the best lenses have some non-linearity resulting in straight lines within an object, e.g. straight sides of a substantially rectangular document, appearing slightly curved in the captured image of that object. Third, images captured using cameras overwhelmingly tend to introduce uneven illumination effects in the captured image. This unevenness of illumination makes even a perfectly uniform background of the surface against which a document may be placed appear in the image with varied brightness, and often with shadows, especially around the page edges if the page is not perfectly flat.
[0065] In an exemplary approach, to avoid mistaking the variability within the background for page edges, the current algorithm utilizes one or more of the following functionalities.
[0066] In various embodiments, the frame of the image contains the digital representation of the document with margins of the surrounding background. In the preferred implementation the search for individual page edges may be performed in a step-over approach analyzing rows and columns of the image from outside in. In one embodiment, the step-over approach may define a plurality of analysis windows within the digital image. As understood herein, analysis windows may include one or more "background windows," i.e. windows encompassing only pixels depicting the background of the digital image, as well as one or more "test windows," i.e. windows encompassing pixels depicting the background of the digital image, the digital representation of the document, or both.
[0067] In a preferred embodiment, the digital representation of the document may be detected in the digital image by defining a first analysis window, i.e. a background analysis window, in a margin of the image corresponding to the background of the surface upon which the document is placed. A plurality of small analysis windows (e.g. test windows) may then be defined within the first analysis window. Utilizing the plurality of test windows, one or more distributions of one or more statistical properties descriptive of the background may be estimated.
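A minimal sketch of this background-estimation step, assuming a grayscale image, a fixed-size background window anchored at the top-left corner, and small 3x7-pixel test windows, might look as follows; the window sizes and the choice of statistics (mean and spread) are illustrative assumptions only:

```python
import numpy as np

def background_statistics(image, corner=(0, 0), size=64, win=(3, 7)):
    """Estimate distributions of per-window statistics within a large
    background analysis window, using a grid of small test windows."""
    y0, x0 = corner
    region = image[y0:y0 + size, x0:x0 + size]
    means, spreads = [], []
    # tile the large analysis window with non-overlapping small test windows
    for i in range(0, size - win[0] + 1, win[0]):
        for j in range(0, size - win[1] + 1, win[1]):
            w = region[i:i + win[0], j:j + win[1]]
            means.append(w.mean())
            spreads.append(w.max() - w.min())
    # summarize each statistic's distribution over the test windows
    return {
        "mean": (np.mean(means), np.std(means)),
        "spread": (np.mean(spreads), np.std(spreads)),
    }
```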
[0068] With continuing reference to the preferred embodiment discussed immediately above, a next step in detecting boundaries of the digital representation of the document may include defining a plurality of test windows within the digital image, and analyzing the corresponding regions of the digital image. For each test window one or more statistical values descriptive of the corresponding region of the image may be calculated. Further, these statistical values may be compared to a corresponding distribution of statistics descriptive of the background.
[0069] In a preferred approach, the plurality of test windows may be defined along a path, particularly a linear path. In a particularly preferred approach, the plurality of test windows may be defined in a horizontal direction and/or a vertical direction, e.g. along rows and columns of the digital image. Moreover, a stepwise progression may be employed to define the test windows along the path and/or between the rows and/or columns. In some embodiments, as will be appreciated by one having ordinary skill in the art upon reading the present descriptions, utilizing a stepwise progression may advantageously increase the computational efficiency of document detection processes.
[0070] Moreover, the magnitude of the starting step may be estimated based on the resolution or pixel size of the image, in some embodiments, but this step may be reduced if advantageous for reliable detection of document sides, as discussed further below.
[0071] In more embodiments, the algorithm estimates the distribution of several statistics descriptive of the image properties found in a large analysis window placed within the background surrounding the document. In one approach a plurality of small windows may be defined within the large analysis window, and distributions of statistics descriptive of the small test windows may be estimated. In one embodiment, the large analysis window is defined in a background region of the digital image, such as a top-left corner of the image.
[0072] Statistics descriptive of the background pixels may include any statistical value that may be generated from digital image data, such as a minimum value, a maximum value, a median value, a mean value, a spread or range of values, a variance, a standard deviation, etc. as would be understood by one having ordinary skill in the art upon reading the present descriptions. Values may be sampled from any data descriptive of the digital image, such as brightness values in one or more color channels, e.g. red-green-blue (RGB); cyan-magenta-yellow-black (CMYK); hue, saturation, value (HSV); etc. as would be understood by one having ordinary skill in the art upon reading the present descriptions.
[0073] In one approach, each of the small analysis windows may comprise a subset of the plurality of pixels within the large analysis window. Moreover, small analysis windows may be of any size and/or shape capable of fitting within the boundaries of the large analysis window. In a preferred embodiment, small analysis windows may be characterized by a rectangular shape, and even more preferably a rectangle characterized by being three pixels long in a first direction (e.g. height) and seven pixels long in a second direction (e.g. width). Of course, other small analysis window sizes, shapes, and dimensions are also suitable for implementation in the presently disclosed processing algorithms.
[0074] In one embodiment, test windows may be employed to analyze an image and detect the boundary of a digital representation of a document depicted in the image. Background windows are used for estimation of original statistical properties of the background and/or reestimation of local statistical properties of the background. Reestimation may be necessary and/or advantageous in order to address artifacts such as uneven illumination and/or background texture variations.
[0075] Preferably, statistical estimation may be performed over some or all of a plurality of small analysis window(s) in a large analysis window within the margin outside of the document page, in some approaches. Such estimation may be performed using a stepwise movement of a small analysis window within the large analysis window, and the stepwise movement may be made in any suitable increment so as to vary the number of samples taken for a given pixel. For example, to promote computational efficiency, an analysis process may define a number of small analysis windows within the large analysis window sufficient to ensure each pixel is sampled once. Thus the plurality of small analysis windows defined in this computationally efficient approach would share common borders but not overlap.
[0076] In another approach designed to promote robustness of statistical estimations, the analysis process may define a number of small analysis windows within the large analysis window sufficient to ensure each pixel is sampled a maximum number of times, e.g. by reducing the step to produce only a single-pixel shift in a given direction between sequentially defined small analysis windows. Of course, any step increment may be employed in various embodiments of the presently disclosed processing algorithms, as would be understood by one having ordinary skill in the art upon reading the present descriptions.
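The two sampling regimes described above (non-overlapping windows that tile the large analysis window versus maximally overlapping windows produced by a single-pixel step) can be sketched as follows. The function name and the default 3-by-7 window dimensions are illustrative, the latter taken from the preferred embodiment discussed earlier.

```python
def define_small_windows(large_w, large_h, win_w=7, win_h=3,
                         step_x=None, step_y=None):
    """Enumerate top-left corners of small analysis windows inside a large
    analysis window.  With the step equal to the window size (the default),
    the windows tile the large window without overlap, so each pixel is
    sampled once; with a step of 1, every possible window position is
    produced, giving maximal overlap."""
    step_x = win_w if step_x is None else step_x
    step_y = win_h if step_y is None else step_y
    return [(x, y)
            for y in range(0, large_h - win_h + 1, step_y)
            for x in range(0, large_w - win_w + 1, step_x)]
```

For a 21-by-9 large window, the non-overlapping regime yields 9 windows, while the single-pixel-shift regime yields 105, illustrating the efficiency/robustness trade-off the text describes.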
[0077] The skilled artisan will appreciate that the large analysis windows utilized to reestimate statistics of the local background in the digital image, as well as the test windows, may be placed in the digital image in any desired manner.
[0078] For example, according to one embodiment, the search for the left side edge in a given row i begins with the calculation of the above-mentioned statistics in a large analysis window adjacent to the frame boundary on the left side of the image, centered around the given row.
[0079] In still more embodiments, when encountering a possible non-background test window (e.g. a test window for which the estimated statistics are dissimilar from the distribution of statistics characteristic of the last known local background) as the algorithm progresses from the outer region(s) of the image toward the interior regions thereof, the algorithm may backtrack into a previously determined background region, form a new large analysis window, and re-estimate the distribution of background statistics in order to reevaluate the validity of the differences between the chosen statistics within the small analysis window and the local distribution of corresponding statistics within the large analysis window.
[0080] As will be appreciated by one having ordinary skill in the art upon reading the present descriptions, the algorithm may proceed from an outer region of the image to an inner region of the image in a variety of manners. For example, in one approach the algorithm proceeds defining test windows in a substantially spiral pattern. In other approaches the pattern may be
substantially serpentine along either a vertical or a horizontal direction. In still more approaches the pattern may be a substantially shingled pattern. The pattern may also be defined by a "sequence mask" laid over part or all of the digital image, such as a checkerboard pattern; a vertically, horizontally, or diagonally striped pattern; concentric shapes; etc. as would be understood by one having ordinary skill in the art upon reading the present descriptions. In other embodiments, analysis windows such as large analysis windows and/or small analysis windows may be defined throughout the digital image in a random manner, a pseudo-random manner, stochastically, etc. according to some defined procedure, as would be understood by one having ordinary skill in the art upon reading the present descriptions. The algorithm can proceed through the sequence of test windows in any desirable fashion, as long as the path allows backtracking into known background and covers the whole image with the desired granularity.
[0081] Advantageously, recalculating statistics in this manner helps to accommodate any illumination drift inherent to the digital image and/or background, which may otherwise result in false identification of non-background points in the image (e.g. outlier candidate edge points).
[0082] In still yet more embodiments, when the difference is statistically valid, the algorithm may jump a certain distance further along its path in order to check again and thus bypass small variations in the texture of the background, such as wood grain, scratches on a surface, patterns of a surface, small shadows, etc., as would be understood by one having ordinary skill in the art upon reading the present descriptions.
[0083] In additional and/or alternative embodiments, after a potential non-background point has been found, the algorithm determines whether the point lies on the edge of a shadow (a possibility especially if the edge of the page is raised above the background surface) and tries to get to the actual page edge. This process relies on the observation that shadows usually darken towards the real edge, followed by an abrupt brightening of the image.
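The shadow-traversal heuristic just described, darkening toward the true edge followed by an abrupt brightening, might be sketched over a one-dimensional brightness profile as follows. This is a simplified illustration; a real implementation would use noise-tolerant thresholds rather than strict monotonic comparisons, and the function name is an assumption.

```python
def advance_past_shadow(brightness, start):
    """From a suspected non-background position, walk inward while the
    brightness keeps decreasing (the darkening shadow), then report the
    position of the abrupt brightening, taken here as the page edge."""
    i = start
    while i + 1 < len(brightness) and brightness[i + 1] < brightness[i]:
        i += 1  # still darkening toward the true edge
    # the next position (if any) is where the image abruptly brightens
    return i + 1 if i + 1 < len(brightness) else i
```

On a profile that dims from the background into a shadow and then jumps up at the bright page, the function lands on the first bright pixel rather than on the shadow edge.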
[0084] The above-described approach to page edge detection was utilized because the use of standard edge detectors may be unnecessary and even undesirable, for several reasons. First, most standard edge detectors involve operations that are time-consuming; second, the instant algorithm is not concerned with additional requirements like monitoring how thin the edges are, which directions they follow, etc. Even more importantly, looking for page edges does not necessarily involve edge detection per se, i.e. page edge detection according to the present disclosures may be performed in a manner that does not search for a document boundary (e.g. page edge), but rather searches for image characteristics associated with a transition from background to the document. For example, the transition may be characterized by flattening of the off-white brightness levels within a glossy paper, i.e. by changes in texture rather than in average gray or color levels.
[0085] As a result, it is possible to obtain candidate edge points that are essentially the first and the last non-background pixels in each row and column on a grid. In order to eliminate random outliers (e.g. outlier candidate edge points) and to determine which candidate edge points correspond to each side of the page, it is useful in one approach to analyze neighboring candidate edge points.
[0086] In one embodiment, a "point" may be considered any region within the digital image, such as a pixel, a position between pixels (e.g. a point with fractional coordinates, such as the center of a 2-pixel by 2-pixel square), a small window of pixels, etc. as would be understood by one having ordinary skill in the art upon reading the present descriptions. In a preferred embodiment, a candidate edge point is associated with the center of a test window (e.g. a 3-pixel by 7-pixel window) that has been found to be characterized by statistics that are determined to be different from the distribution of statistics descriptive of the local background.
[0087] As understood herein, a "neighboring" candidate edge point, or a "neighboring" pixel, is considered to be a point or pixel, respectively, which is near or adjacent a point or pixel of interest, e.g. a point or pixel positioned at least in part along a boundary of the point or pixel of interest, a point or pixel positioned within a threshold distance of the point or pixel of interest (such as within 2, 10, 64 pixels, etc. in a given direction, within one row of the point or pixel of interest, within one column of the point or pixel of interest), etc., as would be understood by one having ordinary skill in the art upon reading the present descriptions. In preferred approaches, the "neighboring" point or pixel may be the closest candidate edge point to the point of interest along a particular direction, e.g. a horizontal direction and/or a vertical direction.
[0088] Each "good" edge point ideally has at least two immediate neighbors (one on each side) and does not deviate far from a straight line segment connecting these neighbors and the "good" edge point, e.g. the candidate edge point and the at least two immediately neighboring points may be fit to a linear regression, and the result may be characterized by a coefficient of determination (R2) not less than 0.95. The angle of this segment with respect to one or more borders of the digital image, together with its relative location, determines whether the edge point is assigned to the top, left, right, or bottom side of the page. In a preferred embodiment, a candidate edge point and the two neighboring edge points may be assigned to respective corners of a triangle. If the angle of the triangle at the candidate edge point is close to 180 degrees, then the candidate edge point may be considered a "good" candidate edge point. If the angle of the triangle at the candidate edge point deviates from 180 degrees by more than a threshold value (such as by 20 degrees or more), then the candidate edge point may be excluded from the set of "good" candidate edge points. The rationale behind this heuristic is based on the desire to throw out random errors in the determination of the first and last non-background pixels within rows and columns. These pixels are unlikely to exist in consistent lines, so checking the neighbors in terms of distance and direction is particularly advantageous in some approaches.
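The triangle-angle test described above may be sketched as follows, using the 20-degree deviation threshold given as an example in the text; the function name is an assumption, and the sketch assumes the candidate is distinct from both neighbors.

```python
import math

def is_good_edge_point(prev_pt, candidate, next_pt, max_deviation_deg=20.0):
    """Treat the candidate and its two neighbors as corners of a triangle
    and keep the candidate only if the angle at the candidate is within
    max_deviation_deg of 180 degrees, i.e. the three points are nearly
    collinear."""
    ax, ay = prev_pt[0] - candidate[0], prev_pt[1] - candidate[1]
    bx, by = next_pt[0] - candidate[0], next_pt[1] - candidate[1]
    dot = ax * bx + ay * by
    na, nb = math.hypot(ax, ay), math.hypot(bx, by)
    # clamp to guard against floating-point drift outside [-1, 1]
    cos_angle = max(-1.0, min(1.0, dot / (na * nb)))
    angle = math.degrees(math.acos(cos_angle))
    return abs(180.0 - angle) <= max_deviation_deg
```

A candidate lying on (or very near) the line through its neighbors passes the test, while a candidate that juts well off that line is rejected as a random error.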
[0089] For speed, the step of this grid may start from a large number such as 32, but it may be reduced by a factor of two and the search for edge points repeated until there are enough of them to determine the least mean squares (LMS) based equations of page sides (see below). If this process cannot determine the sides reliably even after using all rows and columns in the image, it gives up and the whole image is treated as the page.
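The coarse-to-fine grid search just described may be sketched as follows. The row-scanning abstraction, the function names, and the number of points deemed "enough" are illustrative assumptions; only the halving schedule starting at 32 comes from the text.

```python
def collect_edge_points(scan_row, image_height, initial_step=32, needed=8):
    """Scan rows of the image on a grid, halving the grid step until
    enough candidate edge points are found to fit the page sides, or the
    step reaches one (every row scanned).  `scan_row` is assumed to
    return the candidate edge points found in a single row."""
    step = initial_step
    while True:
        points = [p for y in range(0, image_height, step) for p in scan_row(y)]
        if len(points) >= needed or step == 1:
            return points, step
        step = max(1, step // 2)  # refine the grid and search again
```

If even the finest grid fails to produce a reliable side fit, the caller would fall back to treating the whole image frame as the page, as the text notes.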
[0090] The equations of page sides are determined as follows, in one embodiment. First, the algorithm fits the best LMS straight line to each of the sides using the strategy of throwing out worst outliers until all the remaining supporting edges lie within a small distance from the LMS line. For example, a point with the largest distance from a substantially straight line connecting a plurality of candidate edge points along a particular boundary of the document may be designated the "worst" outlier. This procedure may be repeated iteratively to designate and/or remove one or more "worst" outliers from the plurality of candidate edge points. In some approaches, the distance by which a candidate edge point may deviate from the line connecting the plurality of candidate edge points is based at least in part on the size and/or resolution of the digital image.
[0091] If this line is not well supported all along its stretch, the algorithm may attempt to fit the best second-degree polynomial (parabola) to the same original candidate points. The algorithmic difference between finding the best parabola vs. the best straight line is minor:
instead of two unknown coefficients determining the direction and offset of the line, there are three coefficients determining the curvature, direction, and offset of the parabola; however, in other respects the process is essentially the same, in one embodiment.
[0092] If the support of the parabola is stronger than that of the straight line, especially closer to the ends of the candidate edge span, the conclusion is that the algorithm should prefer the parabola as a better model of the page side in the image. Otherwise, the linear model is employed, in various approaches.
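The iterative outlier-rejection fit for a page side may be sketched as follows. The residual threshold and function names are illustrative assumptions; per the text the threshold would in practice depend on image size and/or resolution, and the parabolic alternative differs only in carrying a third (curvature) coefficient.

```python
def fit_line_lms(points):
    """Closed-form least-squares fit of y = m*x + b to a list of (x, y)."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    m = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - m * sx) / n
    return m, b

def fit_side(points, max_residual=2.0):
    """Fit a page side by LMS, iteratively discarding the single worst
    outlier until every remaining candidate edge point lies within
    max_residual of the fitted line."""
    points = list(points)
    while len(points) > 2:
        m, b = fit_line_lms(points)
        worst = max(points, key=lambda p: abs(m * p[0] + b - p[1]))
        if abs(m * worst[0] + b - worst[1]) <= max_residual:
            break
        points.remove(worst)  # drop the worst outlier and refit
    return fit_line_lms(points), points
```

Feeding in collinear candidate points with one stray outlier recovers the underlying line once the outlier is discarded.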
[0093] Intersections of the four found sides of the document may be calculated in order to find the corners of the (possibly slightly curved) page tetragon, as discussed in further detail below. In the preferred implementation, in order to do this it is necessary to consider three cases: calculating intersections of two straight lines, calculating intersections of a straight line and a parabola, and calculating intersections of two parabolas.
[0094] In the first case there is a single solution (since top and bottom page edges stretch mostly horizontally, while left and right page edges stretch mostly vertically, the corresponding LMS lines cannot be parallel) and this solution determines the coordinates of the corresponding page corner.
[0095] The second case, calculating intersections of a straight line and a parabola, is slightly more complicated: there can be zero, one, or two solutions of the resulting quadratic equation. If there is no intersection, it may indicate a fatal problem with page detection, and its result may be rejected. A single solution is somewhat unlikely, but presents no further problems. Two intersections present a choice, in which case the intersection closer to the corresponding corner of the frame is a better candidate - in practice, the other solution of the equation may be very far away from the coordinate range of the image frame.
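The second case reduces to a quadratic equation, and the choice between two intersections follows the rule given above (prefer the solution closer to the corresponding frame corner). The following sketch makes those steps concrete; the function names are assumptions, and the sketch assumes a genuinely quadratic side (nonzero leading coefficient).

```python
import math

def line_parabola_intersections(m, b, a2, a1, a0):
    """Intersections of the line y = m*x + b with the parabola
    y = a2*x^2 + a1*x + a0, from the quadratic
    a2*x^2 + (a1 - m)*x + (a0 - b) = 0.
    Returns zero, one, or two (x, y) points."""
    A, B, C = a2, a1 - m, a0 - b
    disc = B * B - 4 * A * C
    if disc < 0:
        return []  # no intersection: the page detection result may be rejected
    if disc == 0:
        x = -B / (2 * A)
        return [(x, m * x + b)]
    r = math.sqrt(disc)
    xs = [(-B - r) / (2 * A), (-B + r) / (2 * A)]
    return [(x, m * x + b) for x in xs]

def pick_corner(candidates, frame_corner):
    """When two intersections exist, choose the one closest to the
    corresponding corner of the image frame."""
    return min(candidates,
               key=lambda p: (p[0] - frame_corner[0]) ** 2
                             + (p[1] - frame_corner[1]) ** 2)
```

As the text observes, the rejected second root typically lies far outside the coordinate range of the image frame, so the nearest-to-frame-corner rule is unambiguous in practice.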
[0096] The third case, calculating intersections of two parabolas, results in a fourth degree polynomial equation that (in principle) may be solved analytically. However, in practice the
number of calculations necessary to achieve a solution may be greater than in an approximate iterative algorithm that also guarantees the desired sub-pixel precision.
[0097] One exemplary procedure used for this purpose is described in detail below with reference to rectangularization of the digital representation of the document, according to one approach.
[0098] There are several constraints on the validity of the resulting target tetragon (e.g. the tetragon discussed in further detail below). Namely, the tetragon is preferably not too small (e.g., its area is not below a predefined threshold of any desired value, such as 25% of the total area of the image), the corners of the tetragon preferably do not lie too far outside of the frame of the image (e.g. not more than 100 pixels away), and the corners themselves should preferably be interpretable as top-left, top-right, bottom-left and bottom-right, with diagonals intersecting inside of the tetragon, etc. If these constraints are not met, a given page detection result may be rejected, in some embodiments.
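Two of these validity constraints can be sketched directly, using the example thresholds from the text (25% of the image area, 100 pixels outside the frame). The diagonal-intersection test is omitted from this sketch, and the function name and corner ordering are assumptions.

```python
def tetragon_is_valid(corners, image_w, image_h,
                      min_area_fraction=0.25, max_outside=100):
    """Check that the tetragon is not too small relative to the image and
    that no corner lies too far outside the image frame.
    `corners` is (top_left, top_right, bottom_right, bottom_left)."""
    # shoelace formula for the (unsigned) area of the quadrilateral
    area = 0.0
    for i in range(4):
        x1, y1 = corners[i]
        x2, y2 = corners[(i + 1) % 4]
        area += x1 * y2 - x2 * y1
    area = abs(area) / 2.0
    if area < min_area_fraction * image_w * image_h:
        return False  # tetragon too small: reject detection result
    for x, y in corners:
        if not (-max_outside <= x <= image_w + max_outside and
                -max_outside <= y <= image_h + max_outside):
            return False  # corner too far outside the frame
    return True
```

A plausible page tetragon inside a 1000-by-800 frame passes, while a tiny tetragon or one with a corner hundreds of pixels outside the frame is rejected.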
[0099] In one illustrative embodiment where the detected tetragon of the digital
representation of the document is valid, the algorithm may determine a target rectangle. Target rectangle width and height may be set to the average of top and bottom sides of the tetragon and the average of left and right sides respectively.
[00100] In one embodiment, if skew correction is performed, the angle of skew of the target rectangle may be set to zero so that the page sides will become horizontal and vertical.
Otherwise, the skew angle may be set to the average of the angles of top and bottom sides to the horizontal axis and those of the left and right sides to the vertical axis.
[00101] In a similar fashion, if crop correction is not performed, the center of the target rectangle may be designated so as to match the average of the coordinates of the four corners of the tetragon; otherwise, the center may be calculated so that the target rectangle ends up in the top left of the image frame, in additional embodiments.
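The derivation of the target rectangle from a valid tetragon may be sketched as follows. This is a simplification: the text averages the inclinations of all four sides (top/bottom against the horizontal, left/right against the vertical) when skew correction is off, whereas this sketch averages only the top and bottom sides; the function name and corner ordering are assumptions.

```python
import math

def target_rectangle(tetragon, deskew=True):
    """Derive target-rectangle width, height, and skew angle (degrees)
    from a tetragon given as (top_left, top_right, bottom_right,
    bottom_left).  Width/height are the averages of the opposing side
    lengths; the skew angle is zero when skew correction is applied."""
    tl, tr, br, bl = tetragon

    def length(p, q):
        return math.hypot(q[0] - p[0], q[1] - p[1])

    width = (length(tl, tr) + length(bl, br)) / 2.0
    height = (length(tl, bl) + length(tr, br)) / 2.0
    if deskew:
        angle = 0.0  # page sides become horizontal and vertical
    else:
        top = math.atan2(tr[1] - tl[1], tr[0] - tl[0])
        bottom = math.atan2(br[1] - bl[1], br[0] - bl[0])
        angle = math.degrees((top + bottom) / 2.0)
    return width, height, angle
```

For an axis-aligned 100-by-50 tetragon the result is simply a 100-by-50 rectangle with zero skew.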
[00102] In some approaches, if page detection result is rejected for any reason, some or all steps of the process described herein may be repeated with a smaller step increment, in order to obtain more candidate edge points and, advantageously, achieve more plausible results. In a worst-case scenario where problems persist even with the minimum allowed step, the detected page may be set to the whole image frame and the original image may be left untouched.
[00103] Now with particular reference to an exemplary implementation of the inventive page detection embodiment described herein, in one approach page detection includes performing a method such as that described below. As will be appreciated by one having ordinary skill in the art upon reading the present descriptions, the method may be performed in any environment, including those
described herein and represented in any of the Figures provided with the present disclosures.
[00104] In one embodiment, a plurality of candidate edge points corresponding to a transition from the digital image background to the digital representation of the document are defined.
[00105] In various embodiments, defining the plurality of candidate edge points may include one or more additional operations, such as those described below.
[00106] According to one embodiment, a large analysis window is defined within the digital image. Preferably, a first large analysis window is defined in a region depicting a plurality of pixels of the digital image background, but not depicting the non-background (e.g. the digital representation of the document), in order to obtain information characteristic of the digital image background for comparison and contrast to information characteristic of the non-background (e.g. the digital representation of the document, such as background statistics discussed in further detail below). For example, the first large analysis window may be defined in a corner (such as a top-left corner) of the digital image. Of course, the first large analysis window may be defined in any part of the digital image without departing from the scope of the present disclosures.
[00107] Moreover, as will be understood by one having ordinary skill in the art upon reading the present descriptions, the large analysis window may be any size and/or characterized by any suitable dimensions, but in preferred embodiments the large analysis window is approximately forty pixels high and approximately forty pixels wide.
[00108] In particularly preferred approaches, the large analysis window may be defined in a corner region of the digital image. For example, a digital image comprises a digital
representation of a document having a plurality of sides and a background. As described above, the large analysis window may be defined in a region comprising a plurality of background pixels and not including pixels corresponding to the digital representation of the document. Moreover, the large analysis window may be defined in the corner of the digital image, in some approaches.
[00109] According to one embodiment, a plurality of small analysis windows may be defined within the digital image, such as within the large analysis window. The small analysis windows may overlap at least in part with one or more other small analysis windows, such as to be characterized by comprising one or more overlap regions. In a preferred approach, all possible small analysis windows are defined within the large analysis window. Of course, small analysis windows may be defined within any portion of the digital image, and preferably small analysis windows may be defined such that each small analysis window is characterized by a single center pixel.
[00110] In operation, according to one embodiment, one or more statistics are calculated for one or more small analysis windows (e.g. one or more small analysis windows within a large analysis window) and one or more distributions of corresponding statistics are estimated (e.g. a distribution of statistics estimated across a plurality of small analysis windows). In another embodiment, distributions of statistics may be estimated across one or more large analysis window(s) and optionally merged.
[00111] Moreover, values may be descriptive of any feature associated with the background of the digital image, such as background brightness values, background color channel values, background texture values, background tint values, background contrast values, background sharpness values, etc. as would be understood by one having ordinary skill in the art upon reading the present descriptions. Moreover still, statistics may include a minimum, a maximum and/or a range of brightness values in one or more color channels of the plurality of pixels depicting the digital image background over the plurality of small windows within the large analysis window.
[00112] In operation, according to one embodiment, one or more distributions of background statistics are estimated. By estimating the distribution(s) of statistics, one may obtain descriptive distribution(s) that characterize the properties of the background of the digital image within, for example, a large analysis window.
[00113] The distribution(s) preferably correspond to the background statistics calculated for each small analysis window, and may include, for example, a distribution of brightness minima, a distribution of brightness maxima, etc., from which one may obtain distribution statistical descriptors such as the minimum and/or maximum of minimum brightness values, the minimum and/or maximum of maximum brightness values, the minimum and/or maximum spread of brightness values, the minimum and/or maximum of minimum color channel values, the minimum and/or maximum of maximum color channel values, the minimum and/or maximum spread of color channel values, etc. as would be appreciated by one having ordinary skill in the art upon reading the present descriptions. Of course, any of the calculated background statistics (e.g. for brightness values, color channel values, contrast values, texture values, tint values, sharpness values, etc.) may be assembled into a distribution, and any value descriptive of the distribution may be employed without departing from the scope of the present disclosures.
[00114] In operation, according to one embodiment, a large analysis window is defined within the digital image.
[00115] Moreover, window shapes may be defined by positively setting the boundaries of the window as a portion of the digital image, or may be defined negatively, e.g. by applying a mask
to the digital image and defining the regions of the digital image not masked as the analysis window. Moreover still, windows may be defined according to a pattern, especially in embodiments where windows are negatively defined by applying a mask to the digital image. Of course, other manners for defining the windows may be employed without departing from the scope of the present disclosures.
[00116] In operation, according to one embodiment, one or more statistics are calculated for the analysis window. Moreover, in preferred embodiments each analysis window statistic corresponds to a distribution of background statistics estimated for the large analysis window. For example, in one embodiment the maximum brightness corresponds to the distribution of background brightness maxima, the minimum brightness corresponds to the distribution of background brightness minima, the brightness spread corresponds to the distribution of background brightness spreads, etc. as would be understood by one having ordinary skill in the art upon reading the present
descriptions.
[00117] In operation, according to one embodiment, it is determined whether a statistically significant difference exists between at least one analysis window statistic and the corresponding distribution of background statistics. As will be appreciated by one having ordinary skill in the art upon reading the present descriptions, determining whether a statistically significant difference exists may be performed using any known statistical significance evaluation method or metric, such as a p-value, a z-test, a chi-squared correlation, etc. as would be appreciated by a skilled artisan reading the present descriptions.
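One of the significance metrics named above, a z-test, might look as follows when applied to a single window statistic. This is a minimal sketch only: the z threshold, the population-variance estimator, and the function name are assumptions, and the text permits any recognized significance method.

```python
def is_significant_difference(value, background_values, z_threshold=3.0):
    """Simple z-test sketch: a test-window statistic differs significantly
    from the background when it lies more than z_threshold standard
    deviations from the mean of the background samples."""
    n = len(background_values)
    mean = sum(background_values) / n
    var = sum((v - mean) ** 2 for v in background_values) / n
    std = var ** 0.5 or 1e-9  # guard against a zero-variance background
    return abs(value - mean) / std > z_threshold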
[00118] In operation, according to one embodiment, one or more points (e.g. the centermost pixel or point in the analysis window) for which a statistically significant difference exists between a value describing the pixel and the corresponding distribution of background statistics are designated as candidate edge points. The designating may be accomplished by any suitable method known in the art, such as setting a flag corresponding to the pixel, storing coordinates of the pixel, making an array of pixel coordinates, altering one or more values describing the pixel (such as brightness, hue, contrast, etc.), or any other suitable means.
[00119] According to one embodiment, one or more operations may be repeated one or more times. In a preferred embodiment, a plurality of such repetitions may be performed, wherein each repetition is performed on a different portion of the digital image. Preferably, the repetitions may be performed until each side of the digital representation of the document has been evaluated. In various approaches, defining the analysis windows may result in a plurality of analysis windows which share one or more borders, which overlap in whole or in part, and/or which do not share any common border and do not overlap, etc. as would be understood by one having ordinary skill
in the art upon reading the present descriptions.
[00120] In a particularly preferred embodiment, the plurality of repetitions may be performed in a manner directed to reestimate local background statistics upon detecting a potentially non-background window (e.g. a window containing a candidate edge point or a window containing an artifact such as uneven illumination, background texture variation, etc.).
[00121] In operation, according to one embodiment, four sides of a tetragon are defined based on the plurality of candidate edge points. Preferably, the sides of the tetragon encompass the edges of a digital representation of a document in a digital image. Defining the sides of the tetragon may include, in some approaches, performing one or more least-mean-squares (LMS) approximations.
[00122] In more approaches, defining the sides of the tetragon may include identifying one or more outlier candidate edge points, and removing the one or more outlier candidate edge points from the plurality of candidate edge points. Further, defining the sides of the tetragon may include performing at least one additional LMS approximation excluding the one or more outlier candidate edge points.
[00123] Further still, in one embodiment each side of the tetragon is characterized by an equation chosen from a class of functions, and performing the at least one LMS approximation comprises determining one or more coefficients for each equation, such as the best coefficients of second degree polynomials in a preferred implementation. According to these approaches, defining the sides of the tetragon may include determining whether each side of the digital representation of the document falls within a given class of functions, such as second degree polynomials, or simpler functions such as linear functions instead of second degree polynomials.
[00124] In preferred approaches, performing the method may accurately define a tetragon around the four dominant sides of a document while ignoring one or more deviations from the dominant sides of the document, such as a rip and/or a tab.
[00125] Additional and/or alternative embodiments of the presently disclosed tetragon may be characterized by having four sides, with each side being characterized by one or more equations, such as the polynomial functions discussed above. For example, embodiments where the sides of the tetragon are characterized by more than one equation may involve dividing one or more sides into a plurality of segments, each segment being characterized by an equation such as the polynomial functions discussed above.
[00126] Defining the tetragon may, in various embodiments, alternatively and/or additionally include defining one or more corners of the tetragon. For example, tetragon corners may be defined by calculating one or more intersections between adjacent sides of the tetragon, and designating an appropriate intersection from the one or more calculated intersections in cases where multiple intersections are calculated. In still more embodiments, defining the corners may include solving one or more equations, wherein each equation is characterized by belonging to a chosen class of functions, such as polynomials of a given degree, etc. as would be understood by one having ordinary skill in the art upon reading the present descriptions.
[00127] In various embodiments, a corner of the tetragon may be defined by one or more of: an intersection of two curved adjacent sides of the tetragon; an intersection of two substantially straight lines; and an intersection of one substantially straight line and one substantially curved line.
[00128] In operation, according to one embodiment, the digital representation of the document and the tetragon are output to a display of a mobile device. Outputting may be performed in any manner, and may depend upon the configuration of the mobile device hardware and/or software.
[00129] Moreover, outputting may be performed in various approaches so as to facilitate further processing and/or user interaction with the output. For example, in one embodiment the tetragon may be displayed in a manner designed to distinguish the tetragon from other features of the digital image, for example by displaying the tetragon sides in a particular color, pattern, illumination motif, as an animation, etc., as would be understood by one having ordinary skill in the art upon reading the present descriptions.
[00130] Further still, in some embodiments outputting the tetragon and the digital representation of the document may facilitate a user manually adjusting and/or defining the tetragon in any suitable manner. For example, a user may interact with the display of the mobile device to translate the tetragon, i.e. to move the location of the tetragon in one or more directions while maintaining the aspect ratio, shape, edge lengths, area, etc. of the tetragon. Additionally and/or alternatively, a user may interact with the display of the mobile device to manually define or adjust locations of tetragon corners, e.g. tapping on a tetragon corner and dragging the corner to a desired location within the digital image, such as a corner of the digital representation of the document.
[00131] In one particular example of an ideal result of page detection, the digital representation of the document is depicted within the digital image, with a tetragon that encompasses the edges of the digital representation of the document.
[00132] In some approaches page detection such as described above may include one or more additional and/or alternative operations, such as will be described below.
[00133] In one approach, the method may further include capturing one or more of the image data containing the digital representation of the document and audio data relating to the digital representation of the document. Capturing may be performed using one or more capture components coupled to the mobile device, such as a microphone, a camera, an accelerometer, a sensor, etc. as would be understood by one having ordinary skill in the art upon reading the present descriptions.
[00134] In another approach, the method may include defining a new large analysis window and reestimating the distribution of background statistics for the new large analysis window upon determining that the statistically significant difference exists, i.e. essentially repeating the estimation operation in a different region of the digital image near a point where a potentially non-background point has been identified, such as near one of the edges of the document.
[00135] In several exemplary embodiments, a large analysis window may be positioned near or at the leftmost non-background pixel in a row, positioned near or at the rightmost non-background pixel in a row, positioned near or at the topmost non-background pixel in a column, and/or positioned near or at the bottommost non-background pixel in a column.
[00136] Approaches involving such reestimation may further include determining whether the statistically significant difference exists between at least one small analysis window (e.g. a test window) statistic and the corresponding reestimated distribution of large analysis window statistics. In this manner, it is possible to obtain a higher-confidence determination of whether the statistically significant difference exists, and therefore better distinguish true transitions from the digital image background to the digital representation of the document as opposed to, for example, variations in texture, illumination anomalies, and/or other artifacts within the digital image.
[00137] Moreover, performing reestimation as described above may facilitate the method avoiding one or more artifacts, such as variations in illumination and/or background texture, etc. in the digital image, the artifacts not corresponding to a true transition from the digital image background to the digital representation of the document. In some approaches, avoiding artifacts may take the form of bypassing one or more regions (e.g. regions characterized by textures, variations, etc. that distinguish the region from the true background) of the digital image.
[00138] In some approaches, one or more regions may be bypassed upon determining a statistically significant difference exists between a statistical distribution estimated for the large analysis window and a corresponding statistic calculated for the small analysis window, defining a new large analysis window near the small analysis window, reestimating the distribution of statistics for the new large analysis window, and determining that the statistically significant difference does not exist between the reestimated statistical distribution and the corresponding
statistic calculated for the small analysis window.
[00139] In other approaches, bypassing may be accomplished by checking another analysis window further along the path and resuming the search for a transition to non-background upon determining that the statistics of this checked window do not differ significantly from the known statistical properties of the background, e.g. as indicated by a test of statistical significance.
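As a rough illustration of the kind of significance test referenced above, the sketch below compares a small test window's mean brightness against the large analysis window's background distribution using a simple z-score. The z-score criterion and the three-standard-deviation threshold are assumptions chosen for the example; the disclosure does not prescribe a specific statistical test.

```python
import statistics

def differs_significantly(background_pixels, test_pixels, z_thresh=3.0):
    """Illustrative significance test: does the small test window's mean
    brightness deviate from the large window's background distribution
    by more than z_thresh standard deviations?"""
    mu = statistics.mean(background_pixels)
    sigma = statistics.pstdev(background_pixels) or 1.0  # guard flat backgrounds
    test_mean = statistics.mean(test_pixels)
    return abs(test_mean - mu) / sigma > z_thresh
```

Under this sketch, a window whose statistics do not differ significantly from the known background would return False, allowing the search for a transition to resume past a bypassed artifact.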
[00140] As will be appreciated by the skilled artisan upon reading the present disclosures, bypassing may be accomplished by checking another analysis window further along the path.
[00141] In still further approaches, page detection may additionally and/or alternatively include determining whether the tetragon satisfies one or more quality control metrics; and rejecting the tetragon upon determining the tetragon does not satisfy one or more of the quality control metrics. Moreover, quality control metrics may include measures such as an LMS support metric, a minimum tetragon area metric, a tetragon corner location metric, and a tetragon diagonal intersection location metric.
[00142] In practice, determining whether the tetragon satisfies one or more of these metrics acts as a check on the performance of the method. For example, checks may include determining whether the tetragon covers at least a threshold of the overall digital image area, e.g. whether the tetragon comprises at least 25% of the total image area. Furthermore, checks may include determining whether tetragon diagonals intersect inside the boundaries of the tetragon, determining whether one or more of the LMS approximations were calculated from sufficient data to have robust confidence in the statistics derived therefrom, i.e. whether the LMS approximation has sufficient "support" (such as an approximation calculated from at least five data points, or at least a quarter of the total number of data points, in various approaches), and/or determining whether tetragon corner locations (as defined by equations characterizing each respective side of the tetragon) exist within a threshold distance of the edge of the digital image, e.g. whether tetragon corners are located more than 100 pixels away from an edge of the digital image in a given direction. Of course, other quality metrics and/or checks may be employed without departing from the scope of these disclosures, as would be appreciated by one having ordinary skill in the art upon reading the present descriptions.
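Two of the quality-control checks described above, the minimum-area metric and the corner-location metric, can be sketched as follows. The shoelace area formula, the corner ordering, the margin handling, and all names are assumptions for illustration; the diagonal-intersection and LMS-support checks are omitted for brevity.

```python
def passes_quality_checks(corners, image_w, image_h,
                          min_area_ratio=0.25, edge_margin=100):
    """Illustrative quality-control check for a candidate tetragon.
    `corners` is (top_left, top_right, bottom_right, bottom_left),
    each an (x, y) pair in pixel coordinates."""
    # Shoelace formula for the area of the (possibly skewed) tetragon.
    area = 0.0
    for i in range(4):
        x1, y1 = corners[i]
        x2, y2 = corners[(i + 1) % 4]
        area += x1 * y2 - x2 * y1
    area = abs(area) / 2.0
    # Minimum-area metric: e.g. at least 25% of the total image area.
    if area < min_area_ratio * image_w * image_h:
        return False
    # Corner-location metric: corners must not lie far outside the image.
    for x, y in corners:
        if not (-edge_margin <= x <= image_w + edge_margin
                and -edge_margin <= y <= image_h + edge_margin):
            return False
    return True
```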
[00143] In one approach, quality metrics and/or checks may facilitate rejecting suboptimal tetragon definitions, and further facilitate improving the definition of the tetragon sides. For example, one approach involves receiving an indication that defining the four sides of the tetragon based on the plurality of candidate edge points failed to define a valid tetragon, i.e. failed to satisfy one or more of the quality control metrics; and redefining the plurality of candidate edge points. Notably, in this embodiment redefining the plurality of candidate edge
points includes sampling a greater number of points within the digital image than a number of points sampled in the prior, failed attempt. This may be accomplished, in one approach, by reducing the step over one or more of rows or columns of the digital image and repeating all the steps of the algorithm in order to analyze a larger number of candidate edge points. The step may be decreased in a vertical direction, a horizontal direction, or both. Of course, other methods of redefining the candidate edge points and/or resampling points within the digital image may be utilized without departing from the scope of the present disclosures.
[00144] Further still, page detection may include designating the entire digital image as the digital representation of the document, particularly where multiple repetitions of the method failed to define a valid tetragon, even with significantly reduced step in progression through the digital image analysis. In one approach, designating the entire digital image as the digital representation of the document may include defining image corners as document corners, defining image sides as document sides, etc. as would be understood by one having ordinary skill in the art upon reading the present descriptions.
[00145] As described herein, the diagonals of the tetragon may be characterized by a first line connecting a calculated top left corner of the tetragon to a calculated bottom right corner of the tetragon, and a second line connecting a calculated top right corner of the tetragon and a calculated bottom left corner of the tetragon. Moreover, the first line and the second line preferably intersect inside the tetragon.
[00146] In various approaches, one or more of the foregoing operations may be performed using a processor, and the processor may be part of a mobile device, particularly a mobile device having an integrated camera.
[00147] Rectangularization
[00148] The present descriptions relate to rectangularizing a digital representation of a document in a digital image, various approaches to which will be described in detail below.
[00149] In one embodiment, the goal of a rectangularization algorithm is to smoothly transform a tetragon (such as defined above in the page detection method) into a rectangle. Notably, the tetragon is characterized by a plurality of equations, each equation corresponding to a side of the tetragon and being selected from a chosen class of functions. For example, each side of the tetragon may be characterized by a first degree polynomial, second degree polynomial, third degree polynomial, etc. as would be appreciated by the skilled artisan upon reading the present descriptions.
[00150] In one approach, sides of the tetragon may be described by equations, and in a preferred embodiment a left side of the tetragon is characterized by a second degree polynomial equation: x = a2 * y^2 + a1 * y + a0; a right side of the tetragon is characterized by a second degree polynomial equation: x = b2 * y^2 + b1 * y + b0; a top side of the tetragon is characterized by a second degree polynomial equation: y = c2 * x^2 + c1 * x + c0; and a bottom side of the tetragon is characterized by a second degree polynomial equation: y = d2 * x^2 + d1 * x + d0.
[00151] The description of the page rectangularization algorithm presented below utilizes the definition of a plurality of tetragon-based intrinsic coordinate pairs (p, q) within the tetragon, each intrinsic coordinate pair (p, q) corresponding to an intersection of a top-to-bottom curve characterized by an equation obtained from the equations of the tetragon's left and right sides by combining all corresponding coefficients in a top-to-bottom curve coefficient ratio of p to 1 - p, and a left-to-right curve characterized by an equation obtained from the equations of the tetragon's top and bottom sides by combining all corresponding coefficients in a left-to-right curve coefficient ratio of q to 1 - q, wherein 0 < p < 1, and wherein 0 < q < 1.
[00152] In a preferred embodiment where the sides of the tetragon are characterized by second degree polynomial equations, the top-to-bottom curve corresponding to the intrinsic coordinate p will be characterized by the equation: x = ((1 - p) * a2 + p * b2) * y^2 + ((1 - p) * a1 + p * b1) * y + ((1 - p) * a0 + p * b0), and the left-to-right curve corresponding to the intrinsic coordinate q will be characterized by the equation: y = ((1 - q) * c2 + q * d2) * x^2 + ((1 - q) * c1 + q * d1) * x + ((1 - q) * c0 + q * d0). Of course, other equations may characterize any of the sides and/or curves described above, as would be appreciated by one having ordinary skill in the art upon reading the present descriptions.
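The coefficient blending that produces these intermediate curves follows directly from the equations above, and can be sketched in a few lines. The helper name `blend` and the example side coefficients are assumptions for illustration.

```python
def blend(coeffs_a, coeffs_b, t):
    """Combine corresponding polynomial coefficients in the ratio
    t : 1 - t, as in the top-to-bottom and left-to-right curve
    definitions: result_i = (1 - t) * a_i + t * b_i."""
    return tuple((1 - t) * a + t * b for a, b in zip(coeffs_a, coeffs_b))

# Example: the top-to-bottom curve for intrinsic coordinate p blends the
# left side (a2, a1, a0) with the right side (b2, b1, b0). With straight
# vertical sides at x = 50 and x = 450, the p = 0.5 curve sits at x = 250.
left, right = (0.0, 0.0, 50.0), (0.0, 0.0, 450.0)
u2, u1, u0 = blend(left, right, 0.5)
```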
[00153] For a rectangle, which is a particular case of a tetragon, the intrinsic coordinates become especially simple: within the rectangle, each intrinsic coordinate pair (p, q) corresponds to an intersection of a line parallel to each of a left side of the rectangle and a right side of the rectangle, e.g. a line splitting both top and bottom sides in the proportion of p to 1 - p; and a line parallel to each of a top side of the rectangle and a bottom side of the rectangle, e.g. a line splitting both left and right sides in the proportion of q to 1 - q, wherein 0 < p < 1, and wherein 0 < q < 1.
[00154] The goal of the rectangularization algorithm described below is to match each point in the rectangularized image to a corresponding point in the original image, and to do it in such a way as to transform each of the four sides of the tetragon into a substantially straight line, while opposite sides of the tetragon should become parallel to each other and orthogonal to the other pair of sides; i.e. top and bottom sides of the tetragon become parallel to each other; and left and right sides of the tetragon become parallel to each other and orthogonal to the new top and
bottom. Thus, the tetragon is transformed into a true rectangle characterized by four corners, each corner comprising two straight lines intersecting to form a ninety-degree angle.
[00155] The main idea of the rectangularization algorithm described below is to achieve this goal by, first, calculating rectangle-based intrinsic coordinates (p, q) for each point in the rectangularized destination image; second, matching these to the same pair (p, q) of tetragon-based intrinsic coordinates in the original image; third, calculating the coordinates of the intersection of the left-to-right and top-to-bottom curves corresponding to these intrinsic coordinates, respectively; and finally, assigning the color or gray value at the found point in the original image to the corresponding point in the destination image.
[00156] In a graphical representation of a first iteration of a page rectangularization algorithm, according to one embodiment, each point in a digital image may correspond to an intersection of a top-to-bottom curve and a left-to-right curve (a curve may include a straight line, a curved line, e.g. a parabola, etc. as would be understood by one having ordinary skill in the art upon reading the present descriptions) corresponding to intrinsic coordinates (such as described above) associated with the point.
[00157] As will become apparent from the present descriptions, rectangularization may involve defining a plurality of such left-to-right lines and top-to-bottom lines.
[00158] Moreover, rectangularization may include matching target rectangle-based coordinates to intrinsic tetragon-based coordinates of the digital representation of the document. This matching may include iteratively searching for an intersection of a given left-to-right curve and a given top-to-bottom curve, a first iteration of one such exemplary iterative search being described below within the scope of the present disclosures.
[00159] The iterative search, according to one approach discussed in further detail below, includes designating a starting point having coordinates (x0, y0). The starting point may be located anywhere within the digital representation of the document, but preferably is located at or near the center of the target rectangle.
[00160] The iterative search may include projecting the starting point onto one of the two intersecting curves. While the starting point may be projected onto either of the curves, in one approach the first half of a first iteration in the iterative search includes projecting the starting point onto the top-to-bottom curve to obtain the x-coordinate (x1) of the next point, the projection result being a point having coordinates (x1, y0). Similarly, in some embodiments the second half of a first iteration in the iterative search includes projecting that point onto the left-to-right curve to obtain the y-coordinate (y1) of the next point, the projection result being a point having coordinates (x1, y1).
[00161] Rectangularization involves transforming the tetragon defined in page detection into a true rectangle. The result of this process is a graphical representation of an output after performing a page rectangularization algorithm, according to one embodiment.
[00162] Further iterations may utilize a similar approach, such as described in further detail below, in some embodiments.
[00163] A method for modifying one or more spatial characteristics of a digital representation of a document in a digital image may include any of the techniques described herein. As will be appreciated by one having ordinary skill in the art upon reading the present descriptions, the method may be performed in any suitable environment, including those shown and/or described in the figures and corresponding descriptions of the present disclosures.
[00164] In one embodiment, a tetragon (such as defined above in the page detection method) is transformed into a rectangle. Notably, the tetragon is characterized by a plurality of equations, each equation corresponding to a side of the tetragon and being selected from a chosen class of functions. For example, each side of the tetragon may be characterized by a first degree polynomial, second degree polynomial, third degree polynomial, etc. as would be appreciated by the skilled artisan upon reading the present descriptions.
[00165] In one embodiment, sides of the tetragon may be described by equations, and in a preferred embodiment a left side of the tetragon is characterized by a second degree polynomial equation: x = a2 * y^2 + a1 * y + a0; a right side of the tetragon is characterized by a second degree polynomial equation: x = b2 * y^2 + b1 * y + b0; a top side of the tetragon is characterized by a second degree polynomial equation: y = c2 * x^2 + c1 * x + c0; and a bottom side of the tetragon is characterized by a second degree polynomial equation: y = d2 * x^2 + d1 * x + d0. Moreover, the top-to-bottom curve equation is: x = ((1 - p) * a2 + p * b2) * y^2 + ((1 - p) * a1 + p * b1) * y + ((1 - p) * a0 + p * b0), and the left-to-right curve equation is: y = ((1 - q) * c2 + q * d2) * x^2 + ((1 - q) * c1 + q * d1) * x + ((1 - q) * c0 + q * d0). Of course, other equations may characterize any of the sides and/or curves described above, as would be appreciated by one having ordinary skill in the art upon reading the present descriptions.
[00166] In one embodiment, the curves may be described by exemplary polynomial functions fitting one or more of the following general forms:

x1 = u2 * y0^2 + u1 * y0 + u0,

where ui = (1 - p) * ai + p * bi, and vi = (1 - q) * ci + q * di, and where ai are the coefficients in the equation of the left side of the tetragon, bi are the coefficients in the equation of the right side of the tetragon, ci are the coefficients in the equation of the top side of the tetragon, di are the coefficients in the equation of the bottom side of the tetragon, and p and q are the tetragon-based intrinsic coordinates corresponding to the curves. In some approaches, the coefficients such as ai, bi, ci, di, etc. may be derived from calculations, estimations, and/or determinations achieved in the course of performing page detection, such as a page detection method as discussed above.
[00167] Of course, as would be understood by one having ordinary skill in the art, transforming the tetragon into a rectangle may include one or more additional operations, such as will be described in greater detail below.
[00168] In one embodiment, method additionally and/or alternatively includes stretching one or more regions of the tetragon to achieve a more rectangular or truly rectangular shape.
Preferably, such stretching is performed in a manner sufficiently smooth to avoid introducing artifacts into the rectangle.
[00169] In some approaches, transforming the tetragon into a rectangle may include determining a height of the rectangle, a width of the rectangle, a skew angle of the rectangle, and/or a center position of the rectangle. For example, such transforming may include defining a width of the target rectangle as the average of the width of the top side and the width of the bottom side of the tetragon; defining a height of the target rectangle as the average of the height of the left side and the height of the right side of the tetragon; defining a center of the target rectangle depending on the desired placement of the rectangle in the image; and defining an angle of skew of the target rectangle, e.g. in response to a user request to deskew the digital representation of the document.
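The width and height definitions above can be sketched directly: average the lengths of the opposing sides of the detected tetragon. The function name and the corner ordering are assumptions for the example; side lengths here are straight-line distances between corners, a simplification of sides that may in fact be curved.

```python
import math

def target_rectangle_size(corners):
    """Illustrative target-rectangle sizing: width is the average of the
    top and bottom side lengths, height is the average of the left and
    right side lengths. `corners` = (tl, tr, br, bl) as (x, y) pairs."""
    tl, tr, br, bl = corners
    dist = lambda p, q: math.hypot(p[0] - q[0], p[1] - q[1])
    width = (dist(tl, tr) + dist(bl, br)) / 2.0
    height = (dist(tl, bl) + dist(tr, br)) / 2.0
    return width, height
```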
[00170] In some approaches, the transforming may additionally and/or alternatively include generating a rectangularized digital image from the original digital image; and determining a p-coordinate and a q-coordinate for a plurality of points within the rectangularized digital image (e.g. points both inside and outside of the target rectangle), wherein each point located to the left of the rectangle has a p-coordinate value p < 0, wherein each point located to the right of the rectangle has a p-coordinate value p > 1, wherein each point located above the rectangle has a q-coordinate value q < 0, and wherein each point located below the rectangle has a q-coordinate value q > 1.
[00171] In some approaches, the transforming may additionally and/or alternatively include generating a rectangularized digital image from the original digital image; determining a pair of rectangle-based intrinsic coordinates for each point within the rectangularized digital image; and matching each pair of rectangle-based intrinsic coordinates to an equivalent pair of tetragon-based intrinsic coordinates within the original digital image.
[00172] In preferred approaches, matching the rectangle-based intrinsic coordinates to the tetragon-based intrinsic coordinates may include performing an iterative search for an intersection of the top-to-bottom curve and the left-to-right curve. Moreover, the iterative search may itself include designating a starting point (x0, y0), for example the center of the target rectangle; projecting the starting point (x0, y0) onto the top-to-bottom curve: x1 = u2 * y0^2 + u1 * y0 + u0; and projecting the next point (x1, y0) onto the left-to-right curve: y1 = v2 * x1^2 + v1 * x1 + v0, where ui = (1 - p) * ai + p * bi, and where vi = (1 - q) * ci + q * di. Thereafter, the iterative search may include iteratively projecting (xk, yk) onto the top-to-bottom curve: xk+1 = u2 * yk^2 + u1 * yk + u0; and projecting (xk+1, yk) onto the left-to-right curve: yk+1 = v2 * xk+1^2 + v1 * xk+1 + v0.
[00173] In still more embodiments, matching the rectangle-based intrinsic coordinates to the tetragon-based intrinsic coordinates may include determining a distance between (xk, yk) and (xk+1, yk+1); determining whether the distance is less than a predetermined threshold; and terminating the iterative search upon determining that the distance is less than the predetermined threshold.
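Taking paragraphs [00172] and [00173] together, the iterative search with its distance-based termination might look like the following sketch. The convergence tolerance `eps`, the iteration cap, and the function names are illustrative assumptions, not values prescribed by the disclosure.

```python
import math

def quad(c, t):
    # Evaluate c[0]*t^2 + c[1]*t + c[2].
    return c[0] * t * t + c[1] * t + c[2]

def match_point(u, v, x0, y0, eps=0.01, max_iters=100):
    """Iteratively search for the intersection of the top-to-bottom
    curve x = u2*y^2 + u1*y + u0 and the left-to-right curve
    y = v2*x^2 + v1*x + v0, starting from (x0, y0) and terminating
    once successive points are closer together than eps."""
    x, y = x0, y0
    for _ in range(max_iters):
        x_next = quad(u, y)       # project onto the top-to-bottom curve
        y_next = quad(v, x_next)  # project onto the left-to-right curve
        if math.hypot(x_next - x, y_next - y) < eps:
            return x_next, y_next
        x, y = x_next, y_next
    return x, y
```

With straight (degenerate) curves the search terminates on the second iteration; gently curved sides typically converge within a handful of iterations.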
[00174] Various Embodiments of Skew Angle Detection and Correction
[00175] In some embodiments, the image processing algorithm disclosed herein may additionally and/or alternatively include functionality designed to detect and/or correct a skew angle of a digital representation of a document in a digital image. One preferred approach to correcting skew is described below. Of course, other methods of correcting skew within a digital image are within the scope of these disclosures, as would be appreciated by one having ordinary skill in the art upon reading the present descriptions.
[00176] A digital representation of a document in a digital image may be characterized by one or more skew angles α. As will be appreciated by the skilled artisan reading these descriptions, a horizontal skew angle α represents an angle between a horizontal line and an edge of the digital representation of the document, the edge having its longitudinal axis in a substantially horizontal direction (i.e. either the top or bottom edge of the digital representation of the document). Similarly, α may represent an angle between a vertical line and an edge of the digital representation of the document, the edge having its longitudinal axis in a substantially vertical direction (i.e. either the left edge or right edge of the digital representation of the document).
[00177] Moreover, the digital representation of the document may be defined by a top edge, a bottom edge, a right edge and a left edge. Each of these edges may be characterized by a substantially linear equation, such that for the top edge: y = -tan(α) * x + dt; for the bottom edge: y = -tan(α) * x + db; for the right edge: x = tan(α) * y + dr; and for the left edge: x = tan(α) * y + dl, where dt and db are the y-intercepts of the linear equations describing the top and bottom edges of the digital representation of the document, respectively, and where dr and dl are the x-intercepts of the linear equations describing the right and left edges of the digital representation of the document, respectively.
[00178] In one approach, having defined the linear equations describing each side of the digital representation of the document, for example a rectangular document, a skew angle thereof may be corrected by setting α = 0, such that for the top edge: y = dt; for the bottom edge: y = db; for the right edge: x = dr; and for the left edge: x = dl.
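One way to realize setting α = 0 is to rotate the image points so that an edge of slope -tan(α) becomes horizontal. The rotation-based formulation below is a sketch under that assumption, not the only way to implement the correction; the function name and the choice of rotation center are illustrative.

```python
import math

def deskew_point(x, y, alpha, cx=0.0, cy=0.0):
    """Illustrative skew correction: rotate a point by +alpha (radians)
    about a chosen center (cx, cy). This maps a skewed edge lying along
    y = -tan(alpha) * x + d back onto a horizontal line (alpha = 0)."""
    dx, dy = x - cx, y - cy
    cos_a, sin_a = math.cos(alpha), math.sin(alpha)
    # Standard 2-D rotation by +alpha undoes an edge slope of -tan(alpha).
    return (cx + dx * cos_a - dy * sin_a,
            cy + dx * sin_a + dy * cos_a)
```

In practice the correction would be applied to every pixel coordinate (or, equivalently, the inverse rotation would be sampled when producing the output image).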
[00179] Various Embodiments of Detecting Illumination Problems
[00180] In still more embodiments, the presently described image processing algorithm may include features directed to detecting whether a digital representation of a document comprises one or more illumination problems.
[00181] For example, illumination problems may include locally undersaturated regions of a digital image, where brightness values vary greatly from pixel to pixel within the image background, such as is characteristic of images captured in settings with insufficient ambient and/or provided illumination, and locally oversaturated regions of a digital image, where some areas within the image are washed out, such as within a reflection of the flash.
[00182] One exemplary approach to detecting illumination problems in a digital image including a digital representation of a document is described below, according to one embodiment, including a method for determining whether illumination problems exist in a digital representation of a document. As will be appreciated by one having ordinary skill in the art upon reading the present descriptions, the method may be performed in any suitable environment, such as those described herein and represented in the various Figures submitted herewith. Of course, other environments may also be suitable for operating the method within the scope of the present disclosures, as would be appreciated by the skilled artisan reading the instant specification.
[00183] In one embodiment, the processes include (preferably using a mobile device processor) dividing a tetragon including a digital representation of a document into a plurality of sections, each section comprising a plurality of pixels.
[00184] In more approaches, a distribution of brightness values of each section is determined. As will be understood by one having ordinary skill in the art, the distribution of brightness values may be compiled and/or assembled in any known manner, and may be fit to any known standard distribution model, such as a Gaussian distribution, a bimodal distribution, a skewed distribution,
etc.
[00185] In still more approaches, a brightness value range of each section is determined. As will be appreciated by one having ordinary skill in the art, a range is defined as a difference between a maximum value and a minimum value in a given distribution. Here the brightness value range would be defined as the difference between the characteristic maximum brightness value in a given section and the characteristic minimum brightness value in the same section. For example, these characteristic values may correspond to the 2nd and 98th percentiles of the whole distribution, respectively.
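The percentile-based range might be computed as in this sketch. Nearest-rank percentiles and the function name are simplifying assumptions; the point is that trimming to the 2nd and 98th percentiles rejects a few outlier pixels that would otherwise dominate a raw min/max range.

```python
def brightness_range(pixels, lo_pct=2, hi_pct=98):
    """Characteristic brightness range of a section: the spread between
    the lo_pct-th and hi_pct-th percentile values (nearest-rank),
    rather than the raw minimum and maximum."""
    values = sorted(pixels)
    lo = values[int(len(values) * lo_pct / 100)]
    hi = values[min(int(len(values) * hi_pct / 100), len(values) - 1)]
    return hi - lo
```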
[00186] In many approaches, a variability of brightness values of each section is determined.
[00187] In various approaches, it is determined whether each section is oversaturated. For example, this operation may include determining that a region of a digital image depicting a digital representation of a document is oversaturated, according to one embodiment. Determining whether each section is oversaturated may include determining a section oversaturation ratio for each section. Notably, in preferred embodiments each section oversaturation ratio is defined as a number of pixels exhibiting a maximum brightness value in the section divided by a total number of pixels in the section.
[00188] An unevenly illuminated image may depict or be characterized by a plurality of dark spots that may be more dense in areas where the brightness level of a corresponding pixel, point or region of the digital image is lower than that of other regions of the image or document, and/or lower than an average brightness level of the image or document. In some embodiments, uneven illumination may be characterized by a brightness gradient, such as a gradient proceeding from a top right corner of the image to a lower left corner of the image, such that brightness decreases along the gradient, with a relatively bright area in the top right corner of the image and a relatively dark area in the lower left corner of the image.
[00189] In some approaches, determining whether each section is oversaturated may further include determining, for each section, whether the oversaturation level of the section is greater than a predetermined threshold, such as 10%; and characterizing the section as oversaturated upon determining that the saturation level of the section is greater than the predetermined threshold. While the presently described embodiment employs a threshold value of 10%, other predetermined threshold oversaturation levels may be employed without departing from the scope of the present descriptions. Notably, the exact value is a matter of visual perception and expert judgment, and may be adjusted and/or set by a user in various approaches.
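The oversaturation ratio of paragraph [00187] and the threshold test of paragraph [00189] combine into a short sketch. The maximum brightness value of 255 assumes 8-bit grayscale, and the 10% default mirrors the exemplary (and adjustable) threshold above.

```python
def is_oversaturated(section_pixels, max_value=255, threshold=0.10):
    """Section oversaturation ratio: pixels exhibiting the maximum
    brightness value divided by the total pixel count; the section is
    characterized as oversaturated when the ratio exceeds the
    (adjustable) threshold."""
    saturated = sum(1 for p in section_pixels if p >= max_value)
    ratio = saturated / len(section_pixels)
    return ratio > threshold
```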
[00190] In more approaches, it is determined whether each section is undersaturated. For example, this operation may include determining that a region of a digital image depicting a digital representation of a document is undersaturated, according to one embodiment. Determining whether each section is undersaturated may include additional operations such as determining a median variability of the distribution of brightness values of each section; determining whether each median variability is greater than a predetermined variability threshold, such as a median brightness variability of 18 out of a 0-255 integer value range; and determining, for each section, that the section is undersaturated upon determining that the median variability of the section is greater than the predetermined variability threshold. Notably, the exact value is a matter of visual perception and expert judgment, and may be adjusted and/or set by a user in various approaches.
[00191] In one particular approach, determining the variability of the section may include determining a brightness value of a target pixel in the plurality of pixels; calculating a difference between the brightness value of the target pixel and a brightness value for one or more neighboring pixels, each neighboring pixel being one or more (for example, 2) pixels away from the target pixel; repeating the determining and the calculating for each pixel in the plurality of pixels to obtain each target pixel variability; and generating a distribution of target pixel variability values, wherein each target pixel brightness value and target pixel variability value is an integer in a range from 0 to 255. This approach may be implemented, for example, by incrementing a corresponding counter in an array of all possible variability values in a range from 0 to 255, e.g. to generate a histogram of variability values.
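The variability histogram described above might be generated as in the following sketch. Representing the grayscale section as nested lists of 0-255 integers, restricting neighbors to the horizontal and vertical directions, and the 2-pixel neighbor distance default are illustrative assumptions.

```python
def variability_histogram(image, dist=2):
    """Illustrative per-pixel brightness variability: the absolute
    difference between each target pixel and its neighbors `dist`
    pixels away (horizontally and vertically), accumulated into a
    256-bin histogram of variability values."""
    hist = [0] * 256
    h, w = len(image), len(image[0])
    for y in range(h):
        for x in range(w):
            for dy, dx in ((0, dist), (dist, 0)):
                ny, nx = y + dy, x + dx
                if ny < h and nx < w:
                    hist[abs(image[y][x] - image[ny][nx])] += 1
    return hist
```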
[00192] Notably, when utilizing neighboring pixels in determining the variability of a particular section, the neighboring pixels may be within about two pixels of the target pixel along either a vertical direction, a horizontal direction, or both (e.g. a diagonal direction). Of course, other pixel proximity limits may be employed without departing from the scope of the present invention.
[00193] In some approaches, method may further include removing one or more target pixel variability values from the distribution of target pixel variability values to generate a corrected distribution; and defining a characteristic background variability based on the corrected distribution. For example, in one embodiment generating a corrected distribution and defining the characteristic background variability may include removing the top 35% of total counted values (or any other value sufficient to cover significant brightness changes associated with transitions from the background to the foreground) and defining the characteristic background variability based on the remaining values of the distribution, i.e. values taken from a relatively flat background region of the digital representation of the document.
[00194] In more approaches, a number of oversaturated sections is determined. This operation
may include any manner of determining a total number of oversaturated sections, e.g. by incrementing a counter during processing of the image, by setting a flag for each oversaturated section and counting flags at some point during processing, etc. as would be understood by one having ordinary skill in the art upon reading the present descriptions.
[00195] In more approaches, a number of undersaturated sections is determined. This operation may include any manner of determining a total number of undersaturated sections, e.g. by incrementing a counter during processing of the image, by setting a flag for each
undersaturated section and counting flags at some point during processing, etc. as would be understood by one having ordinary skill in the art upon reading the present descriptions.
[00196] In more approaches, it is determined that the digital image is oversaturated upon determining that a ratio of the number of oversaturated sections to the total number of sections exceeds an oversaturation threshold, which may be defined by a user, may be a predetermined value, etc. as would be understood by one having ordinary skill in the art upon reading the present descriptions.
[00197] In more approaches, it is determined that the digital image is undersaturated upon determining that a ratio of the number of undersaturated sections to the total number of sections exceeds an undersaturation threshold, which may be defined by a user, may be a predetermined value, etc. as would be understood by one having ordinary skill in the art upon reading the present descriptions.
[00198] In more approaches, it is determined that the illumination problem exists in the digital image upon determining that the digital image is either undersaturated or oversaturated.
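The image-level decision described in paragraphs [00194] through [00198] can be sketched as follows. The 10% ratio thresholds here are hypothetical placeholders for illustration; the text leaves the actual threshold values open.

```python
# Hedged sketch of the image-level illumination decision: count the
# sections flagged over- or undersaturated and compare each ratio
# against a threshold (10% values below are assumptions).
def illumination_problem(section_flags, over_thresh=0.10, under_thresh=0.10):
    """section_flags: list of 'over', 'under', or 'ok' per section.
    Returns True if the image is oversaturated or undersaturated."""
    total = len(section_flags)
    if total == 0:
        return False
    n_over = sum(1 for f in section_flags if f == 'over')
    n_under = sum(1 for f in section_flags if f == 'under')
    image_over = n_over / total > over_thresh
    image_under = n_under / total > under_thresh
    return image_over or image_under
```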
[00199] In still more approaches, method may include one or more additional and/or alternative operations, such as will be described in detail below.
[00200] In one embodiment, method may include performing the following operations for each section: defining a section height by dividing the height of the document into a predefined number of horizontal sections; and defining a section width by dividing the width of the document into a predetermined number of vertical sections. In a preferred approach, the section height and width are determined based on the goal of creating a certain number of sections and making these sections approximately square by dividing the height of the document into a certain number of horizontal parts and by dividing the width of the document into a certain (possibly different) number of vertical parts.
[00201] Thus, in some embodiments each section is characterized by a section height and width, where the digital image is characterized by an image width w and an image height h, where h >= w, where the section size is characterized by a section width ws and a section height hs, where ws = w/m and hs = h/n, and where m and n are defined so that ws is approximately equal to hs. For example, in a preferred embodiment, m >= 3, n >= 4.
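The sectioning rule above can be sketched as follows. The search over a small range of divisors and the target section count are illustrative assumptions; the text only requires roughly square sections with the preferred minimums m >= 3, n >= 4.

```python
# Hedged sketch: choose m vertical and n horizontal divisions so that
# section width w/m approximates section height h/n, honoring the
# preferred minimums for a portrait image (h >= w).
def section_grid(w, h, m_min=3, n_min=4, target_sections=12):
    """Returns (m, n) minimizing |w/m - h/n| subject to m*n >=
    target_sections (the target count is an assumption)."""
    best = None
    for m in range(m_min, 20):
        for n in range(n_min, 20):
            if m * n < target_sections:
                continue
            score = abs(w / m - h / n)
            if best is None or score < best[0]:
                best = (score, m, n)
    return best[1], best[2]
```

For a 600 x 800 image this yields m = 3, n = 4, giving exactly square 200 x 200 sections.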
[00202] In another approach, a method for determining whether illumination problems exist in a digital representation of a document includes the following operations, some or all of which may be performed in any environment described herein and/or represented in the presently disclosed figures.
[00203] Various Embodiments of Correcting Uneven Illumination
[00204] In some approaches, correcting unevenness of illumination in a digital image includes normalizing an overall brightness level of the digital image. Normalizing overall brightness may transform a digital image characterized by a brightness gradient such as discussed above into a digital image characterized by a relatively flat, even distribution of brightness across the digital image. Note that before such normalization one region may be characterized by a significantly more dense distribution of dark spots than another region, while after normalization the regions are characterized by substantially similar dark-spot density profiles.
[00205] In accordance with the present disclosures, unevenness of illumination may be corrected. In particular, a method for correcting uneven illumination in one or more regions of the digital image is provided herein for use in any suitable environment, including those described herein and represented in the various figures, among other suitable environments as would be known by one having ordinary skill in the art upon reading the present descriptions.
[00206] In one embodiment, method includes operation where, using a processor, a two-dimensional illumination model is derived from the digital image.
[00207] In one embodiment, the two-dimensional illumination model is applied to each pixel in the digital image.
[00208] In more approaches, the digital image may be divided into a plurality of sections, and some or all of the pixels within a section may be clustered based on color, e.g. brightness values in one or more color channels, median hue values, etc. as would be understood by one having ordinary skill in the art upon reading the present descriptions. Moreover, several most numerous clusters may be analyzed to determine characteristics of one or more possible local backgrounds. In order to designate a cluster as a local background of the section, the number of pixels belonging to this cluster has to exceed a certain predefined threshold, such as a threshold percentage of the total section area.
[00209] In various approaches, clustering may be performed using any known method, including Markov-chain Monte Carlo methods, nearest neighbor joining, distribution-based clustering such as expectation-maximization, density-based clustering such as density-based spatial clustering of applications with noise (DBSCAN), ordering points to identify the clustering structure (OPTICS), etc. as would be understood by one having ordinary skill in the art upon reading the present descriptions.
[00210] In one embodiment, method may include determining, for each distribution of color channel values within background clusters, one or more of an average color of the primary background of the corresponding section and an average color of the secondary background of the corresponding section, if one or both exist in the section.
[00211] In one embodiment, method includes designating, for each section, either the primary background color or the secondary background color as a local representation of a main background of the digital representation of the document, each local representation being characterized by either the average color of the primary background of the corresponding section or the average color of the secondary background of the corresponding section.
[00212] In one embodiment, method includes fitting a plurality of average color channel values of chosen local representations of the image background to a two-dimensional illumination model. In some approaches, the two-dimensional illumination model is a second-degree polynomial characterized by the equation: v = ax² + bxy + cy² + dx + ey + f; where v is an average color channel value for one of the plurality of color channels; a, b, c, d, e, and f are each unknown parameters of the two-dimensional illumination model, each unknown parameter a, b, c, d, e, and f is approximated using a least-mean-squares approximation, x is an x-coordinate of the mid-point pixel in the section, and y is a y-coordinate of the mid-point pixel in the section.
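The least-mean-squares fit of the second-degree polynomial described above can be sketched as follows. This is an illustrative normal-equations solver, not the patented code; the input samples are assumed to be (x, y, v) triples taken from section mid-points and their average background channel values.

```python
# Hedged sketch: fit v = a*x^2 + b*x*y + c*y^2 + d*x + e*y + f by
# least squares via the normal equations and Gaussian elimination.
def fit_illumination_model(points):
    """points: list of (x, y, v) samples. Returns (a, b, c, d, e, f)."""
    n = 6
    ata = [[0.0] * n for _ in range(n)]
    atv = [0.0] * n
    for x, y, v in points:
        row = [x * x, x * y, y * y, x, y, 1.0]
        for i in range(n):
            atv[i] += row[i] * v
            for j in range(n):
                ata[i][j] += row[i] * row[j]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(ata[r][col]))
        ata[col], ata[piv] = ata[piv], ata[col]
        atv[col], atv[piv] = atv[piv], atv[col]
        for r in range(col + 1, n):
            factor = ata[r][col] / ata[col][col]
            for c in range(col, n):
                ata[r][c] -= factor * ata[col][c]
            atv[r] -= factor * atv[col]
    params = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = atv[r] - sum(ata[r][c] * params[c] for c in range(r + 1, n))
        params[r] = s / ata[r][r]
    return tuple(params)
```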
[00213] In one approach, derivation of the two-dimensional illumination model may include, for a plurality of background clusters: calculating an average color channel value of each background cluster, calculating a hue ratio of each background cluster, and calculating a median hue ratio for the plurality of background clusters. Moreover, the derivation may also include comparing the hue ratio of each background cluster to the median hue ratio of the plurality of clusters; selecting the more likely of the possible two backgrounds as the local representation of the document background based on the comparison; fitting at least one two-dimensional illumination model to the average channel values of the local representation; and calculating a plurality of average main background color channel values over a plurality of local
representations.
[00214] The applying of the model may include calculating a difference between one or more predicted background channel values and the average main background color channel values; and adding a fraction of the difference to one or more color channel values for each pixel in the digital image. For example, adding the fraction may involve adding a value in a range from 0 to 1 of the difference (for example, ¾ of the difference in a preferred embodiment) to the actual pixel value.
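The per-pixel correction described above can be sketched as follows. The sign convention (average minus predicted, so that regions the model predicts as dark are brightened) is an assumption; the text only specifies that a fraction of the difference is added.

```python
# Hedged sketch of applying the illumination model to one channel value.
# The 0.75 fraction follows the preferred embodiment in the text;
# the (avg - predicted) sign convention is an assumption.
def correct_pixel(value, predicted_bg, avg_bg, fraction=0.75):
    """Shift a 0-255 color channel value toward even illumination."""
    diff = avg_bg - predicted_bg
    corrected = value + fraction * diff
    return max(0, min(255, int(round(corrected))))
```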
[00215] In still more approaches, method may include additional and/or alternative operations, such as those discussed immediately below.
[00216] For example, in one approach method further includes one or more of: determining, for each section, a plurality of color clusters; determining a plurality of numerous color clusters, each numerous color cluster corresponding to a high frequency of representation in the section (e.g. the color cluster is one of the clusters with the highest number of pixels in the section belonging to that color cluster); determining a total area of the section; determining a plurality of partial section areas, each partial section area corresponding to an area represented by one of the plurality of numerous color clusters; dividing each partial section area by the total area to obtain a cluster percentage area for each numerous color cluster (e.g. by dividing the number of pixels in the section belonging to numerous color clusters by the total number of pixels in the section to obtain a percentage of a total area of the section occupied by the corresponding most numerous color clusters); and classifying each numerous color cluster as either a background cluster or a non-background cluster based on the cluster percentage area.
[00217] Notably, in preferred approaches the classifying operation identifies either: no background in the section, a single most numerous background in the section, or two most numerous backgrounds in the section. Moreover, the classifying includes classifying each pixel belonging to a cluster containing a number of pixels greater than a background threshold as a background pixel. In some approaches, the background threshold is in a range from 0 to 100% (for example, 15% in a preferred approach). The background threshold may be defined by a user, may be a predetermined value, etc. as would be understood by one having ordinary skill in the art upon reading the present descriptions.
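The classification in paragraphs [00216] and [00217] can be sketched as follows. This is a minimal sketch under stated assumptions: at most the two largest qualifying clusters are kept (primary and secondary backgrounds), and the 15% threshold follows the preferred approach above.

```python
# Hedged sketch: a color cluster is a background cluster when its share
# of the section area exceeds a threshold; at most two clusters qualify.
def classify_backgrounds(cluster_sizes, section_area, threshold=0.15):
    """cluster_sizes: pixel counts of the most numerous color clusters.
    Returns up to two cluster indices designated as backgrounds,
    largest area share first."""
    qualifying = [(size / section_area, i)
                  for i, size in enumerate(cluster_sizes)
                  if size / section_area > threshold]
    qualifying.sort(reverse=True)          # primary background first
    return [i for _, i in qualifying[:2]]  # at most primary + secondary
```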
[00218] Various Embodiments of Resolution Estimation
[00219] As a further object of the presently disclosed inventive embodiments, mobile image processing may include a method for estimating resolution of a digital representation of a document. Of course, these methods may be performed in any suitable environment, including those described herein and represented in the various figures presented herewith. Moreover, method may be used in conjunction with any other method described herein, and may include additional and/or alternative operations to those described below, as would be understood by one having ordinary skill in the art upon reading the present descriptions.
[00220] In one embodiment, a plurality of connected components of a plurality of non-background elements are detected in the digital image. In some approaches, the digital image may be characterized as a bitonal image, i.e. an image containing only two tones, and preferably a black and white image.
[00221] In another embodiment, a plurality of likely characters is determined based on the plurality of connected components. Likely characters may be regions of a digital image characterized by a predetermined number of light-to-dark transitions in a given direction, such as three light-to-dark transitions in a vertical direction as would be encountered for a small region of the digital image depicting a capital letter "E," each light-to-dark transition corresponding to a transition from a background of a document (light) to one of the horizontal strokes of the letter "E." Of course, other numbers of light-to-dark transitions may be employed, such as two vertical and/or horizontal light-to-dark transitions for a letter "o," one vertical light-to-dark transition for a letter "l," etc. as would be understood by one having ordinary skill in the art upon reading the present descriptions.
[00222] In still another embodiment, one or more average character dimensions are determined based on the plurality of likely text characters. As understood herein, the average character dimensions may include one or more of an average character width and an average character height, but of course other suitable character dimensions may be utilized, as would be recognized by a skilled artisan reading the present descriptions.
[00223] In still yet another embodiment, the resolution of the digital image is estimated based on the one or more average character dimensions.
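The estimation step in paragraphs [00222] and [00223] can be sketched as follows. The reference values (characters measuring 30 pixels high at 300 DPI) are illustrative assumptions, not figures from the text, which only later mentions 300 DPI scanning of reference samples.

```python
# Hedged sketch: estimate image DPI by scaling a reference DPI by the
# ratio of measured average character height to a reference character
# height at that DPI (reference values are assumptions).
def estimate_resolution(avg_char_height, ref_char_height=30.0, ref_dpi=300.0):
    """Estimate resolution (DPI) from the average detected character
    height in pixels."""
    return ref_dpi * (avg_char_height / ref_char_height)
```

For example, if detected characters average half the reference height, the image is estimated at half the reference resolution.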
[00224] In further embodiments, method may optionally and/or alternatively include one or more additional operations, such as described below.
[00225] For example, in one embodiment method may further include one or more of:
estimating one or more dimensions of the digital representation of the document based on the estimated resolution of the digital image; comparing the one or more estimated dimensions of the digital representation of the document to one or more known dimensions of a plurality of known document types; matching the digital representation of the document to one or more of the plurality of known document types based on the comparison; determining whether the match satisfies one or more quality control criteria; and adjusting the estimated resolution of the digital representation of the document based on the known dimensions of the known document type upon determining the match satisfies the one or more quality control criteria. In some approaches, the estimated resolution will only be adjusted if a good match between the digital representation of the document and one of the known document types has been found.
[00226] In some approaches, the one or more known document types include: a Letter size document (8.5 x 11 inch); a Legal size document (8.5 x 14 inch); an A3 document (11.69 x 16.54 inch); an A4 (European Letter size) document (8.27 x 11.69 inch); an A5 document (5.83 x 8.27 inch); a ledger/tabloid document (11 x 17 inch); a driver license (2.125 x 3.375 inch); a business card (2 x 3.5 inch); a personal check (2.75 x 6 inch); a business check (3 x 7.25 inch); a business check (3 x 8.25 inch); a business check (2.75 x 8.5 inch); a business check (3.5 x 8.5 inch); a business check (3.66 x 8.5 inch); a business check (4 x 8.5 inch); a 2.25-inch wide receipt; and a 3.125-inch wide receipt.
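The matching against known document types can be sketched as follows. The relative-error tolerance used as the quality-control criterion is an assumption, as is the subset of types included; the dimension ordering follows the listing above.

```python
# Hedged sketch: match estimated document dimensions to the nearest
# known type, accepting only matches within a relative-error tolerance
# (the 5% tolerance is an assumption).
KNOWN_TYPES = {  # (dim1, dim2) in inches, from the list above (subset)
    "letter": (8.5, 11.0),
    "legal": (8.5, 14.0),
    "a4": (8.27, 11.69),
    "driver_license": (2.125, 3.375),
    "business_card": (2.0, 3.5),
    "personal_check": (2.75, 6.0),
}

def match_document_type(est_w, est_h, tolerance=0.05):
    """Return the best-matching known type name, or None if no candidate
    is within `tolerance` relative error in both dimensions."""
    best, best_err = None, tolerance
    for name, (w, h) in KNOWN_TYPES.items():
        err = max(abs(est_w - w) / w, abs(est_h - h) / h)
        if err <= best_err:
            best, best_err = name, err
    return best
```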
[00227] In still more approaches, method may further and/or optionally include computing, for one or more connected components, one or more of: a number of on-off transitions within the connected component (for example, transitions from a character to a document background, e.g. transitions from black-to-white, white-to-black, etc. as would be understood by the skilled artisan reading the present descriptions); a black pixel density within the connected component; an aspect ratio of the connected component; and a likelihood that one or more of the connected components represents a text character based on one or more of the black pixel density, the number of on-off transitions, and the aspect ratio.
[00228] In still more approaches, method may further and/or optionally include determining a character height of at least two of the plurality of text characters; calculating an average character height based on each character height of the at least two text characters; determining a character width of at least two of the plurality of text characters; calculating an average character width based on each character width of the at least two text characters; performing at least one comparison. Notably, the comparison may be selected from: comparing the average character height to a reference average character height; and comparing the average character width to a reference average character width.
[00229] In such approaches, method may further include estimating the resolution of the digital image based on the at least one comparison, where each of the reference average character height and the reference average character width correspond to one or more reference characters, each reference character being characterized by a known average character width and a known average character height.
[00230] In various embodiments, each reference character corresponds to a digital representation of a character obtained from scanning a representative sample of one or more business documents at some selected resolution, such as 300 DPI, and each reference character further corresponds to one or more common fonts, such as Arial, Times New Roman, Helvetica, Courier, Courier New, Tahoma, etc. as would be understood by the skilled artisan reading the present descriptions. Of course, representative samples of business documents may be scanned at other resolutions, so long as the resulting image resolution is suitable for recognizing characters on the document. In some approaches, the resolution must be sufficient to provide a minimum character size, such as a smallest character being no less than 12 pixels in height in one embodiment. Of course, those having ordinary skill in the art will understand that the minimum character height may vary according to the nature of the image. For example, different character heights may be required when processing a grayscale image than when processing a binary (e.g. bitonal) image. In more approaches, characters must be sufficiently large to be recognized by optical character recognition (OCR).
[00231] In even still more embodiments, method may include one or more of: estimating one or more dimensions of the digital representation of the document based on the estimated resolution of the digital representation of the document; computing an average character width from the average character dimensions; computing an average character height from the average character dimensions; comparing the average character width to the average character height; estimating an orientation of the digital representation of the document based on the comparison; and matching the digital representation of the document to a known document type based on the estimated dimensions and the estimated orientation.
[00232] In an alternative embodiment, estimating resolution may be performed in an inverse manner, namely by processing a digital representation of a document to determine a content of the document, such as a payment amount for a digital representation of a check, an addressee for a letter, a pattern of a form, a barcode, etc. as would be understood by one having ordinary skill in the art upon reading the present descriptions. Based on the determined content, the digital representation of the document may be determined to correspond to one or more known document types, and utilizing information about the known document type(s), the resolution of the digital representation of the document may be determined and/or estimated.
[00233] Various Embodiments of Blur Detection
[00234] A method for detecting one or more blurred regions in a digital image will be described, according to various embodiments. As will be understood and appreciated by the skilled artisan upon reading the present descriptions, method may be performed in any suitable environment, such as those discussed herein and represented in the multitude of figures submitted herewith. Further, method may be performed in isolation and/or in conjunction with any other operation of any other method described herein, including but not limited to image processing operations described herein.
[00235] In one embodiment, method includes operation, where, using a processor, a tetragon comprising a digital representation of a document in a digital image is divided into a plurality of sections, each section comprising a plurality of pixels.
[00236] In one embodiment, method includes operation, where, for each section it is determined whether the section contains one or more sharp pixel-to-pixel transitions in a first direction.
[00237] In one embodiment, method includes operation, where, for each section a total number of first-direction sharp pixel-to-pixel transitions (SS1) is counted.
[00238] In one embodiment, method includes operation, where, for each section it is determined whether the section contains one or more blurred pixel-to-pixel transitions in the first direction.
[00239] In one embodiment, method includes operation, where, for each section a total number of first-direction blurred pixel-to-pixel transitions (SB1) is counted.
[00240] In one embodiment, method includes operation, where, for each section it is determined whether the section contains one or more sharp pixel-to-pixel transitions in a second direction.
[00241] In one embodiment, method includes operation, where, for each section a total number of second-direction sharp pixel-to-pixel transitions (SS2) is counted.
[00242] In one embodiment, method includes operation, where, for each section, it is determined whether the section contains one or more blurred pixel-to-pixel transitions in the second direction.
[00243] In one embodiment, for each section, a total number of second-direction blurred pixel-to-pixel transitions (SB2) is counted.
[00244] In one embodiment, for each section, it is determined that the section is blank upon determining: SS1 is less than a predetermined sharp transition threshold, SB1 is less than a predetermined blurred transition threshold, SS2 is less than the predetermined sharp transition threshold, and SB2 is less than the predetermined blurred transition threshold.
[00245] In one embodiment, for each non-blank section, a first direction blur ratio r1 = SS1 / SB1 is determined.
[00246] In one embodiment, for each non-blank section, a second direction blur ratio r2 = SS2 / SB2 is determined.
[00247] In one embodiment, for each non-blank section, it is determined that the non-blank section is blurred in the first direction upon determining that r1 is less than a predefined section blur ratio threshold.
[00248] In one embodiment, for each non-blank section, it is determined that the non-blank section is blurred in the second direction upon determining that r2 is less than the predefined section blur ratio threshold.
[00249] In some approaches a "first direction" and "second direction" may be characterized as perpendicular, e.g. a vertical direction and a horizontal direction, or perpendicular diagonals of a square. In other approaches, the "first direction" and "second direction" may correspond to any path traversing the digital image, but preferably each corresponds to a linear path traversing the digital image. A person having ordinary skill in the art reading the present descriptions will appreciate that the scope of the inventive embodiments disclosed herein should not be limited to only these examples, but rather inclusive of any equivalents thereof known in the art.
[00250] In one embodiment, for each non-blank section, it is determined that the non-blank section is blurred upon determining one or more of: the section is blurred in the first direction, and the section is blurred in the second direction.
[00251] In one embodiment, a total number of blurred sections is determined.
[00252] In one embodiment, an image blur ratio R, defined as the total number of blurred sections divided by a total number of sections, is calculated.
[00253] In one embodiment, method includes operation, where, it is determined that the digital image is blurred upon determining the image blur ratio is greater than a predetermined image blur threshold.
[00254] In various embodiments, method may include one or more additional and/or alternative operations, such as described below. For example, in one embodiment, method may also include determining, for each section, a distribution of brightness values of the plurality of pixels; determining a characteristic variability v of the distribution of brightness values; calculating a noticeable brightness transition threshold η based on v (for example, η = 3 * v, but not more than a certain value, such as 16); calculating a large brightness transition threshold μ based on η (for example, μ = 2 * η, but not more than a certain value, such as half of the brightness range); analyzing, for each pixel within the plurality of pixels, a directional pattern of brightness change in a window surrounding the pixel (for example, horizontally, vertically, diagonally, etc.); and identifying one or more of: the sharp pixel-to-pixel transition and the blurred pixel-to-pixel transitions based on the analysis.
[00255] In another embodiment, method may also include defining a plurality of center pixels; sequentially analyzing each of the plurality of center pixels within one or more small windows of pixels surrounding the center pixel, such as two pixels before and after; identifying the sharp pixel-to-pixel transition upon determining: the large brightness transition exists within an immediate vicinity of the center pixel (for example, from the immediately preceding pixel to the one following), a first small (e.g. smaller than noticeable) brightness variation exists before the large brightness transition, and a second small brightness variation exists after the large brightness transition; detecting the sharp pixel-to-pixel transition upon determining: the large transition exists within one or more of the small windows, and a monotonic change in brightness exists in the large transition; and detecting the blurred pixel-to-pixel transition upon determining: the noticeable transition occurs within a small window, and the monotonic change in brightness exists in the noticeable transition.
[00256] In still another embodiment, method may also include, for each section: counting a total number of sharp transitions in each of one or more chosen directions; counting a total number of blurred transitions in each chosen direction; determining that a section is blank upon determining: the total number of sharp transitions is less than a predefined sharp transition threshold (for example, 50), and the total number of blurred transitions is less than a predefined blurred transition threshold; determining that the non-blank section is blurred upon determining a section blurriness ratio, comprising the total number of sharp transitions to the total number of blurred transitions, is less than a section blur ratio threshold (for example, 24%) in at least one of the chosen directions; and determining that the section is sharp upon determining the section is neither blank nor blurred.
[00257] In yet another embodiment, method may also include determining a total number of blank sections within the plurality of sections (Nblank); determining a total number of blurred sections within the plurality of sections (Nblur); determining a total number of sharp sections within the plurality of sections (Nsharp); determining a blurriness ratio RB = Nblur / (Nblur + Nsharp); and determining that the digital image is sharp if RB is less than a blurriness threshold (preferably expressed as a percentage, for example 30%).
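The section-level and image-level decisions in paragraphs [00256] and [00257] can be sketched as follows. The example thresholds from the text are used (50 sharp transitions, a 24% section blur ratio, a 30% image blurriness threshold); the blurred-transition minimum of 50 is an assumption, since the text gives an example value only for the sharp threshold.

```python
# Hedged sketch of the blur classification: label each section blank,
# blurred, or sharp, then decide whether the image as a whole is sharp
# from the ratio RB = Nblur / (Nblur + Nsharp).
def classify_section(sharp, blurred, sharp_min=50, blur_min=50, ratio=0.24):
    """sharp/blurred: transition counts for one direction. The
    blur_min=50 value is an assumption."""
    if sharp < sharp_min and blurred < blur_min:
        return "blank"
    if blurred > 0 and sharp / blurred < ratio:
        return "blurred"
    return "sharp"

def image_is_sharp(section_labels, blur_threshold=0.30):
    """Image is sharp when RB is below the blurriness threshold;
    blank sections are excluded from the ratio."""
    n_blur = section_labels.count("blurred")
    n_sharp = section_labels.count("sharp")
    if n_blur + n_sharp == 0:
        return False  # nothing but blank sections; no basis to call it sharp
    return n_blur / (n_blur + n_sharp) < blur_threshold
```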
[00258] It will further be appreciated that embodiments presented herein may be provided in the form of a service deployed on behalf of a customer to offer service on demand.
[00259] Document Classification
[00260] In accordance with one inventive embodiment commensurate in scope with the present disclosures, a method 500 is shown in FIG. 5. The method 500 may be carried out in any desired environment, and may include embodiments and/or approaches described in relation to FIGS. 1-4D, among others. Of course, more or fewer operations than those shown in FIG. 5 may be performed in accordance with method 500 as would be appreciated by one of ordinary skill in the art upon reading the present descriptions.
[00261] In operation 502, a digital image captured by a mobile device is received.
[00262] In one embodiment the digital image may be characterized by a native resolution. As understood herein, a "native resolution" may be an original, native resolution of the image as originally captured, but also may be a resolution of the digital image after performing some pre-classification processing such as any of the image processing operations described herein. In one embodiment, the native resolution is approximately 500 pixels by 600 pixels (i.e. a 500x600 digital image) for a digital image of a driver license subjected to processing by virtual rescan (VRS) before performing classification. Moreover, the digital image may be characterized as a color image in some approaches, and in still more approaches may be a cropped-color image, i.e. a color image depicting substantially only the object to be classified, and not depicting image background.
[00263] In operation 504, a first representation of the digital image is generated using a processor of the mobile device. The first representation may be characterized by a reduced resolution, in one approach. As understood herein, a "reduced resolution" may be any resolution less than the native resolution of the digital image, and more particularly any resolution suitable for subsequent analysis of the first representation according to the principles set forth herein.
[00264] In preferred embodiments, the reduced resolution is sufficiently low to minimize processing overhead and maximize computational efficiency and robustness of performing the algorithm on the respective mobile device, host device and/or server platform. For example, in one approach the first representation is characterized by a resolution of about 25 pixels by 25 pixels, which has been experimentally determined to be a particularly efficient and robust reduced resolution for processing of relatively small documents, such as business cards, driver licenses, receipts, etc. as would be understood by one having ordinary skill in the art upon reading the present descriptions.
[00265] Of course, in other embodiments, different resolutions may be employed without departing from the scope of the present disclosure. For example, classification of larger documents or objects may benefit from utilizing a higher resolution such as 50 pixels by 50 pixels, 100 pixels by 100 pixels, etc. to better represent the larger document or object for robust classification and maximum computational efficiency. The resolution utilized may or may not have the same number of pixels in each dimension. Moreover, the most desirable resolution for classifying various objects within a broad range of object classes may be determined
experimentally according to a user's preferred balance between computational efficiency and classification robustness. In still more embodiments, any resolution may be employed, and preferably the resolution is characterized by comprising between 1 pixel and about 1000 pixels in a first dimension, and between 1 and about 1000 pixels in a second dimension.
[00266] One exemplary embodiment of inputs, outputs and/or results of a process flow for generating the first representation will now be presented with particular reference to FIGS. 3A-3C, which respectively depict: a digital image before being divided into sections (e.g. digital image 300, FIG. 3A); a digital image divided into sections (e.g. sections 304, FIG. 3B); and a first representation of the digital image (e.g. representation 310, FIG. 3C) characterized by a reduced resolution.
[00267] As shown in FIGS. 3A-3B, a digital image 300 captured by a mobile device may be divided into a plurality of sections 304. Each section may comprise a plurality of pixels 306, which may comprise a substantially rectangular grid of pixels such that the section has dimensions of ps(x) horizontal pixels (ps(x) = 4 in FIG. 3B) by ps(y) vertical pixels (ps(y) = 4 in FIG. 3B).
[00268] In one general embodiment, a first representation may be generated by dividing a digital image R (having a resolution of xR pixels by yR pixels) into Sx horizontal sections and Sy vertical sections, and thus may be characterized by a reduced resolution r of Sx pixels by Sy pixels. Thus, generating the first representation essentially includes generating a less-granular representation of the digital image.
[00269] For example, in one approach the digital image 300 is divided into S sections, each section 304 corresponding to one portion of an s-by-s grid 302. Generating the first
representation involves generating an s-pixel-by-s-pixel first representation 310, where each pixel 312 in the first representation 310 corresponds to one of the S sections 304 of the digital image, and wherein each pixel 312 is located in a position of the first representation 310 corresponding to the location of the corresponding section 304 in the digital image, i.e. the upper-leftmost pixel 312 in the first representation corresponds to the upper-leftmost section 304 in the digital image, etc.
[00270] Of course, other reduced resolutions may be employed for the first representation, ideally but not necessarily according to limitations and/or features of a mobile device, host device, and/or server platform being utilized to carry out the processing, the characteristics of the digital image (resolution, illumination, presence of blur, etc.) and/or characteristics of the object which is to be detected and/or classified (contrast with background, presence of text or other symbols, closeness of fit to a general template, etc.) as would be understood by those having ordinary skill in the art upon reading the present descriptions.
[00271] In some approaches, generating the first representation may include one or more alternative and/or additional suboperations, such as dividing the digital image into a plurality of sections. The digital image may be divided into a plurality of sections in any suitable manner, and in one embodiment the digital image is divided into a plurality of rectangular sections. Of course, sections may be characterized by any shape, and in alternative approaches the plurality of sections may or may not represent the entire digital image, may represent an oversampling of some regions of the image, or may represent a single sampling of each pixel depicted in the
digital image. In a preferred embodiment, as discussed above regarding FIGS. 3A-3C, the digital image is divided into S substantially square sections 304 to form an s x s grid 302.
[00272] In further approaches, generating the first representation may also include determining, for each section of the digital image, at least one characteristic value, where each characteristic value corresponds to one or more features descriptive of the section. Within the scope of the present disclosures, any feature that may be expressed as a numerical value is suitable for use in generating the first representation, e.g. an average brightness or intensity (0-255) across each pixel in the section, an average value (0-255) of each color channel of each pixel in the section, such as an average red-channel value, an average green-channel value, and an average blue-channel value for a red-green-blue (RGB) image, etc. as would be understood by one having ordinary skill in the art upon reading the present descriptions.
[00273] With continuing reference to FIGS. 3A-3C, in some embodiments each pixel 312 of the first representation 310 corresponds to one of the S sections 304 not only with respect to positional correspondence, but also with respect to feature correspondence. For example, in one approach generating the first representation 310 may additionally include determining a characteristic section intensity value i_s by calculating the average of the individual intensity values i_p of each pixel 306 in the section 304. Then, each pixel 312 in the first representation 310 is assigned an intensity value equal to the average intensity value i_s calculated for the corresponding section 304 of the digital image 300. In this manner, the first representation 310 reflects a less granular, normalized representation of the features depicted in digital image 300.
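The section-averaging procedure described above can be sketched as follows, assuming a grayscale image held in a NumPy array; the function name and the edge-cropping behavior are illustrative assumptions, not part of the disclosed method:

```python
import numpy as np

def first_representation(image, s=25):
    """Downsample a grayscale image (2-D array) to an s x s first
    representation by averaging pixel intensities within each section,
    following the grid geometry of FIGS. 3A-3C: each output pixel holds
    the mean intensity of the corresponding section."""
    h, w = image.shape
    # Crop to a size divisible by s so sections tile the image exactly
    # (a simplification; interpolation could handle ragged edges).
    image = image[: h - h % s, : w - w % s]
    ph, pw = image.shape[0] // s, image.shape[1] // s
    # Reshape into (s, ph, s, pw) blocks and average over each block.
    blocks = image.reshape(s, ph, s, pw)
    return blocks.mean(axis=(1, 3))

# Example: a 100x100 image reduced to a 25x25 first representation.
img = np.arange(100 * 100, dtype=float).reshape(100, 100)
rep = first_representation(img, s=25)
print(rep.shape)  # (25, 25)
```

Each output pixel is both positionally and feature-wise tied to its source section, mirroring the correspondence described for pixels 312 and sections 304.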
[00274] Of course, the pixels 312 comprising the first representation 310 may be represented using any characteristic value or combination of characteristic values without departing from the scope of the presently disclosed classification methods. Further, characteristic values may be computed and/or determined using any suitable means, such as by random selection of a characteristic value from a distribution of values, by a statistical means or measure, such as an average value, a spread of values, a minimum value, a maximum value, a standard deviation of values, a variance of values, or by any other means that would be known to a skilled artisan upon reading the instant descriptions.
[00275] In operation 506, a first feature vector is generated based on the first representation.
[00276] The first feature vector and/ or reference feature matrices may include a plurality of feature vectors, where each feature vector corresponds to a characteristic of a corresponding object class, e.g. a characteristic minimum, maximum, average, etc. brightness in one or more color channels at a particular location (pixel or section), presence of a particular symbol or other reference object at a particular location, dimensions, aspect ratio, pixel density (especially black
pixel density, but also pixel density of any other color channel), etc.
[00277] As would be understood by one having ordinary skill in the art upon reading the present descriptions, feature vectors suitable for inclusion in the first feature vector and/or reference feature matrices comprise any type, number and/or length of feature vectors descriptive of one or more features of the image, e.g. distribution of color data, etc.
[00278] In operation 508, the first feature vector is compared to a plurality of reference feature matrices, each reference feature matrix comprising a plurality of vectors.
[00279] The comparing operation 508 may be performed according to any suitable matrix comparison, vector comparison, or a combination of the two.
[00280] Thus, in such approaches the comparing may include an N-dimensional feature space comparison. In at least one approach, N is greater than 50, but of course, N may be any value sufficiently large to ensure robust classification of objects into a single, correct object class, which those having ordinary skill in the art reading the present descriptions will appreciate to vary according to many factors, such as the complexity of the object, the similarity or distinctness between object classes, the number of object classes, etc.
[00281] As understood herein, "objects" include any tangible thing represented in an image and which may be described according to at least one unique characteristic such as color, size, dimensions, shape, texture, or representative feature(s) as would be understood by one having ordinary skill in the art upon reading the present descriptions. Additionally, objects include or may be classified according to at least one unique combination of such characteristics. For example, in various embodiments objects may include but are in no way limited to persons, animals, vehicles, buildings, landmarks, documents, furniture, plants, etc. as would be understood by one having ordinary skill in the art upon reading the present descriptions.
[00282] For example, in one embodiment where attempting to classify an object depicted in a digital image as one of only a small number of object classes (e.g. 3-5 object classes), each object class being characterized by a significant number of starkly distinguishing features or feature vectors (e.g. each object class corresponding to an object or object(s) characterized by very different size, shape, color profile and/or color scheme and easily distinguishable reference symbols positioned in unique locations on each object class, etc.), a relatively low value of N may be sufficiently large to ensure robust classification.
[00283] On the other hand, where attempting to classify an object depicted in a digital image as one of a large number of object classes (e.g. 30 or more object classes), and each object class is characterized by a significant number of similar features or feature vectors, and only a few distinguishing features or feature vectors, a relatively high value of N may be preferable to ensure robust classification. Similarly, the value of N is preferably chosen or determined such that the classification is not only robust, but also computationally efficient; i.e. the classification process(es) introduce only minimal processing overhead to the device(s) or system(s) utilized to perform the classification algorithm.
[00284] The value of N that achieves the desired balance between classification robustness and processing overhead will depend on many factors such as described above and others that would be appreciated by one having ordinary skill in the art upon reading the present descriptions. Moreover, determining the appropriate value of N to achieve the desired balance may be accomplished using any known method or equivalent thereof as understood by a skilled artisan upon reading the instant disclosures.
[00285] In a concrete implementation, directed to classifying driver licenses according to state and distinguishing driver licenses from myriad other document types, it was determined that a 625-dimensional comparison (N = 625) provided a preferably robust classification without introducing unsatisfactorily high overhead to processing performed using a variety of current- generation mobile devices.
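The 625-dimensional comparison above corresponds to flattening a 25 x 25 first representation into a single feature vector (25 x 25 = 625). A minimal sketch of such a comparison follows, scoring each candidate class by minimum Euclidean distance to the feature vectors in its reference feature matrix; the class names, reference data, and choice of distance measure are illustrative assumptions, not the disclosed implementation:

```python
import numpy as np

def classify_by_distance(representation, reference_matrices):
    """Flatten an s x s first representation into an N-dimensional
    feature vector (N = s*s, e.g. 625 for s = 25) and score it against
    each candidate class by its minimum Euclidean distance to the
    feature vectors in that class's reference matrix."""
    v = representation.ravel()  # 25x25 -> 625-dimensional vector
    scores = {}
    for cls, matrix in reference_matrices.items():
        # matrix: array of shape (num_reference_vectors, N)
        dists = np.linalg.norm(matrix - v, axis=1)
        scores[cls] = dists.min()
    return min(scores, key=scores.get)  # class with closest reference

# Hypothetical reference matrices for two well-separated classes.
rng = np.random.default_rng(0)
refs = {
    "driver_license": rng.normal(0.2, 0.05, (10, 625)),
    "business_card":  rng.normal(0.8, 0.05, (10, 625)),
}
sample = np.full((25, 25), 0.21)
print(classify_by_distance(sample, refs))  # driver_license
```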
[00286] In operation 510, an object depicted in the digital image is classified as a member of a particular object class based at least in part on the comparing operation 508. More specifically, the comparing operation 508 may involve evaluating each feature vector of each reference feature matrix, or alternatively evaluating a plurality of feature matrices for objects belonging to a particular object class, and identifying a hyper-plane in the N-dimensional feature space that separates the feature vectors of one reference feature matrix from the feature vectors of other reference feature matrices. In this manner, the classification algorithm defines concrete hyper-plane boundaries between object classes, and may assign an unknown object to a particular object class based on similarity of feature vectors to the particular object class and/or dissimilarity to other reference feature matrix profiles.
[00287] In the simplest example of such feature-space discrimination, imagining a two-dimensional feature space with one feature plotted along the ordinate axis and another feature plotted along the abscissa, objects belonging to one particular class may be characterized by feature vectors having a distribution of values clustered in the lower-right portion of the feature space, while another class of objects may be characterized by feature vectors exhibiting a distribution of values clustered in the upper-left portion of the feature space, and the classification algorithm may distinguish between the two by identifying a line between each cluster separating the feature space into two classes - "upper-left" and "lower-right." Of course, as the number of dimensions considered in the feature space increases, the complexity of the
classification grows rapidly, but also provides significant improvements to classification robustness, as will be appreciated by one having ordinary skill in the art upon reading the present descriptions.
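The two-dimensional example above can be sketched in code. Here the separating line is taken as the perpendicular bisector of the segment joining the two class centroids, which is a deliberate simplification of the hyper-plane identification described above; the sample clusters and class labels are illustrative:

```python
import numpy as np

def separating_hyperplane(class_a, class_b):
    """Find a line (a hyper-plane in 2-D) separating two clusters of
    feature vectors, taken here as the perpendicular bisector of the
    segment between the two class centroids."""
    ca, cb = class_a.mean(axis=0), class_b.mean(axis=0)
    normal = cb - ca                  # direction from A's centroid to B's
    midpoint = (ca + cb) / 2.0
    offset = normal @ midpoint
    return normal, offset

def classify(x, normal, offset):
    # Points on B's side of the hyper-plane satisfy normal . x > offset.
    return "B" if normal @ x > offset else "A"

# "Lower-right" cluster (class A) vs. "upper-left" cluster (class B).
a = np.array([[0.8, 0.1], [0.9, 0.2], [0.7, 0.15]])
b = np.array([[0.1, 0.9], [0.2, 0.8], [0.15, 0.85]])
n, off = separating_hyperplane(a, b)
print(classify(np.array([0.85, 0.1]), n, off))  # A
print(classify(np.array([0.1, 0.9]), n, off))   # B
```

In higher dimensions the same sign test applies unchanged; only the vector length grows, which is why the complexity and the robustness both increase with N.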
[00288] Additional Processing
[00289] In some approaches, classification according to embodiments of the presently disclosed methods may include one or more additional and/or alternative features and/or operations, such as described below.
[00290] In one embodiment, classification such as described above may additionally and/or alternatively include assigning a confidence value to a plurality of putative object classes based on the comparing operation (e.g. as performed in operation 508 of method 500). Moreover, the presently disclosed classification methods, systems and/or computer program products may additionally and/or alternatively determine a location of the mobile device, receive location information indicating the location of the mobile device, etc., and based on the determined location, a confidence value of a classification result corresponding to a particular location may be adjusted. For example, if a mobile device is determined to be located in a particular state (e.g. Maryland) based on a GPS signal, then during classification, a confidence value may be adjusted for any object class corresponding to the particular state (e.g. Maryland Driver License, Maryland Department of Motor Vehicle Title/Registration Form, Maryland Traffic Violation Ticket, etc. as would be understood by one having ordinary skill in the art upon reading the present descriptions).
[00291] Confidence values may be adjusted in any suitable manner, such as increasing a confidence value for any object class corresponding to a particular location, decreasing a confidence value for any object class not corresponding to a particular location, normalizing confidence value(s) based on correspondence/non-correspondence to a particular location, etc. as would be understood by the skilled artisan reading the present disclosures.
[00292] The mobile device location may be determined using any known method, and employing hardware components of the mobile device or any other number of devices in communication with the mobile device, such as one or more satellites, wireless communication networks, servers, etc. as would be understood by one having ordinary skill in the art upon reading the present descriptions.
[00293] For example, the mobile device location may be determined based in whole or in part on one or more of a global-positioning system (GPS) signal, a connection to a wireless communication network, a database of known locations (e.g. a contact database, a database associated with a navigational tool such as Google Maps, etc.), a social media tool (e.g. a "check-
in" feature such as provided via Facebook, Google Plus, Yelp, etc.), an IP address, etc. as would be understood by one having ordinary skill in the art upon reading the present descriptions.
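One way the location-based confidence adjustment described above might look is sketched below; the class-to-state mapping, the boost/damp factors, and the renormalization step are all illustrative assumptions rather than prescribed behavior:

```python
def adjust_confidences(confidences, device_state, boost=1.2, damp=0.9):
    """Adjust classification confidence values using the mobile device's
    determined location (e.g. a state inferred from a GPS signal).
    Classes tied to the detected state are boosted, classes tied to a
    different state are damped, and state-agnostic classes are left
    untouched; values are then renormalized to sum to 1."""
    # Hypothetical mapping of object classes to their associated state.
    class_state = {
        "MD_driver_license": "Maryland",
        "MD_vehicle_title": "Maryland",
        "VA_driver_license": "Virginia",
        "generic_receipt": None,
    }
    adjusted = {}
    for cls, conf in confidences.items():
        state = class_state.get(cls)
        if state == device_state:
            adjusted[cls] = conf * boost
        elif state is not None:
            adjusted[cls] = conf * damp
        else:
            adjusted[cls] = conf
    total = sum(adjusted.values())
    return {cls: c / total for cls, c in adjusted.items()}

scores = {"MD_driver_license": 0.40, "VA_driver_license": 0.38,
          "generic_receipt": 0.22}
print(adjust_confidences(scores, "Maryland"))
```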
[00294] In more embodiments, classification additionally and/or alternatively includes outputting an indication of the particular object class to a display of the mobile device; and receiving user input via the display of the mobile device in response to outputting the indication. While the user input may be of any known type and relate to any of the herein described features and/or operations, preferably user input relates to confirming, negating or modifying the particular object class to which the object was assigned by the classification algorithm.
[00295] The indication may be output to the display in any suitable manner, such as via a push notification, text message, display window on the display of the mobile device, email, etc. as would be understood by one having ordinary skill in the art. Moreover, the user input may take any form and be received in any known manner, such as detecting a user tapping or pressing on a portion of the mobile device display (e.g. by detecting changes in resistance or capacitance on a touch-screen device, by detecting user interaction with one or more buttons or switches of the mobile device, etc.).
[00296] In one embodiment, classification further includes determining one or more object features of a classified object based at least in part on the particular object class. Thus, classification may include determining such object features using any suitable mechanism or approach, such as receiving an object class identification code and using the object class identification code as a query and/or to perform a lookup in a database of object features organized according to object class and keyed, hashed, indexed, etc. to the object class identification code.
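The class-keyed lookup described above might be sketched as a simple dictionary keyed by object class identification code; the codes and the database contents are illustrative assumptions (the 1.586 aspect ratio reflects standard ID-1 card geometry, 85.60 mm by 53.98 mm):

```python
# Hypothetical feature database keyed by object class identification code.
OBJECT_FEATURES = {
    "us_drivers_license": {
        "aspect_ratio": 1.586,                      # ID-1 card geometry
        "photo_region": (0.05, 0.25, 0.30, 0.85),   # fractional (x0, y0, x1, y1)
        "text_orientation": "landscape",
    },
    "business_card": {
        "aspect_ratio": 1.75,
        "photo_region": None,
        "text_orientation": "landscape",
    },
}

def object_features(class_id):
    """Use the object class identification code produced by the
    classifier as the lookup key into the feature database; unknown
    codes yield an empty feature set."""
    return OBJECT_FEATURES.get(class_id, {})

print(object_features("us_drivers_license")["aspect_ratio"])  # 1.586
```

In a production system the dictionary would naturally be replaced by a database query hashed or indexed on the same identification code, as the paragraph above suggests.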
[00297] Object features within the scope of the present disclosures may include any feature capable of being recognized in a digital image, and preferably any feature capable of being expressed in a numerical format (whether scalar, vector, or otherwise), e.g. location of a subregion containing reference object(s) (especially in one or more object orientation states, such as landscape, portrait, etc.), object color profile or color scheme, object subregion color profile or color scheme, location of text, etc. as would be understood by one having ordinary skill in the art upon reading the present descriptions.
[00298] In accordance with another inventive embodiment commensurate in scope with the present disclosures, FIG. 6 depicts a method 600. The method 600 may be carried out in any desired environment, and may include embodiments and/or approaches described in relation to FIGS. 1-4D, among others. Of course, more or fewer operations than those shown in FIG. 6 may be performed in accordance with method 600 as would be appreciated by one of ordinary skill in the art upon reading the present descriptions.
[00299] In operation 602, a first feature vector is generated based on a digital image captured by a mobile device.
[00300] In operation 604, the first feature vector is compared to a plurality of reference feature matrices.
[00301] In operation 606, an object depicted in the digital image is classified as a member of a particular object class based at least in part on the comparing (e.g. the comparing performed in operation 604).
[00302] In operation 608, one or more object features of the object are determined based at least in part on the particular object class.
[00303] In operation 610, a processing operation is performed. The processing operation includes performing one or more of the following subprocesses: detecting the object depicted in the digital image based at least in part on the one or more object features; rectangularizing the object depicted in the digital image based at least in part on the one or more object features; cropping the digital image based at least in part on the one or more object features; and binarizing the digital image based at least in part on the one or more object features.
[00304] As will be further appreciated by one having ordinary skill in the art upon reading the above descriptions of document classification, in various embodiments it may be advantageous to perform one or more additional processing operations, such as the subprocesses described above with reference to operation 610, on a digital image based at least in part on object features determined via document classification.
[00305] For example, after classifying an object depicted in a digital image, such as a document, it may be possible to refine other processing parameters, functions, etc. and/or utilize information known to be true for the class of objects to which the classified object belongs, such as object shape, size, dimensions, location of regions of interest on and/or in the object, such as regions depicting one or more symbols, patterns, text, etc. as would be understood by one having ordinary skill in the art upon reading the present descriptions.
[00306] Regarding performing page detection based on classification, it may be advantageous in some approaches to utilize information known about an object belonging to a particular object class in order to improve object detection capabilities. For example, and as would be appreciated by one having ordinary skill in the art, it may be less computationally expensive, and/or may result in a higher-confidence or higher-quality result, to narrow a set of characteristics that may potentially identify an object in a digital image to one or a few discrete, known characteristics, and simply search for those characteristic(s).
[00307] Exemplary characteristics that may be utilized to improve object detection may include characteristics such as object dimensions, object shape, object color, one or more reference features of the object class (such as reference symbols positioned in a known location of a document).
[00308] In another approach, object detection may be improved based on the one or more known characteristics by facilitating an object detection algorithm distinguishing regions of the digital image depicting the object from regions of the digital image depicting other objects, image background, artifacts, etc. as would be understood by one having ordinary skill in the art upon reading the present descriptions. For example, if objects belonging to a particular object class are known to exhibit a particular color profile or scheme, it may be simpler and/or more reliable to attempt detecting the particular color profile or scheme within the digital image rather than detecting a transition from one color profile or scheme (e.g. a background color profile or scheme) to another color profile or scheme (e.g. the object color profile or scheme), especially if the two color profiles or schemes are not characterized by sharply contrasting features.
[00309] Regarding performing rectangularization based on classification, it may be advantageous in some approaches to utilize information known about an object belonging to a particular object class in order to improve object rectangularization capabilities. For example, and as would be appreciated by one having ordinary skill in the art, it may be less computationally expensive, and/or may result in a higher-confidence or higher-quality result, to transform a digital representation of an object from a native appearance to a true configuration based on a set of known object characteristics that definitively represent the true object configuration, rather than attempting to estimate the true object configuration from the native appearance and project the native appearance onto an estimated object configuration.
[00310] In one approach, the classification may identify known dimensions of the object, and based on these known dimensions the digital image may be rectangularized to transform a distorted representation of the object in the digital image into an undistorted representation (e.g. by removing projective effects introduced in the process of capturing the image using a camera of a mobile device rather than a traditional flat-bed scanner, paper-feed scanner or other similar multifunction peripheral (MFP)).
[00311] Regarding performing cropping based on classification, and similar to the principles discussed above regarding rectangularization, it may be advantageous in some approaches to utilize information known about an object belonging to a particular object class to improve cropping of digital images depicting the object such that all or significantly all of the cropped
image depicts the object and not image background (or other objects, artifacts, etc. depicted in the image).
[00312] As a simple example, it may be advantageous to determine an object's known size, dimensions, configuration, etc. according to the object classification and utilize this information to identify a region of the image depicting the object from regions of the image not depicting the object, and define crop lines surrounding the object to remove the regions of the image not depicting the object.
[00313] Regarding performing binarization based on classification, the presently disclosed classification algorithms provide several useful improvements to mobile image processing. Several exemplary embodiments of such improvements will now be described with reference to FIGS. 4A-4D.
[00314] For example, binarization algorithms generally transform a multi-tonal digital image (e.g. grayscale, color, or any other image such as image 400 exhibiting more than two tones) into a bitonal image, i.e. an image exhibiting only two tones (typically white and black). Those having ordinary skill in the art will appreciate that attempting to binarize a digital image depicting an object with regions exhibiting two or more distinct color profiles and/or color schemes (e.g. a region depicting a color photograph 402 as compared to a region depicting a black/white text region 404, a color-text region 406, a symbol 408 such as a reference object, watermark, etc., an object background region 410, etc.) may produce an unsuccessful or
unsatisfactory result.
[00315] As one explanation, these difficulties may be at least partially due to the differences between the color profiles, schemes, etc., which counter-influence a single binarization transform. Thus, providing an ability to distinguish each of these regions having disparate color schemes or profiles and define separate binarization parameters for each may greatly improve the quality of the resulting bitonal image as a whole and with particular respect to the quality of the transformation in each respective region.
[00316] According to one exemplary embodiment shown in FIGS. 4A-4B, improved binarization may include determining an object class color profile and/or scheme (e.g.
determining a color profile and/or color scheme for object background region 410); adjusting one or more binarization parameters based on the object class color profile and/or color scheme; and thresholding the digital image using the one or more adjusted binarization parameters.
[00317] Binarization parameters may include any parameter of any suitable binarization process as would be appreciated by those having ordinary skill in the art reading the present descriptions, and binarization parameters may be adjusted according to any suitable
methodology. For example, with respect to adjusting binarization parameters based on an object class color profile and/or color scheme, binarization parameters may be adjusted to over- and/or under-emphasize a contribution of one or more color channels, intensities, etc. in accordance with the object class color profile/scheme (such as under-emphasizing the red channel for an object class color profile/scheme relatively saturated by red hue(s), etc.).
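A possible sketch of the parameter adjustment just described: channel weights derived from an object class color profile are folded into the grayscale conversion before thresholding. The weights, the threshold, and the function name are illustrative assumptions, not prescribed binarization parameters:

```python
import numpy as np

def binarize_with_class_profile(rgb, channel_weights, threshold=0.5):
    """Threshold an RGB image (floats in [0, 1]) into a bitonal image,
    weighting the color channels according to an object class color
    profile, e.g. under-emphasizing the red channel for a class whose
    background is saturated with red hues."""
    w = np.asarray(channel_weights, dtype=float)
    w = w / w.sum()                      # normalize channel contributions
    intensity = rgb @ w                  # weighted grayscale intensity
    return (intensity > threshold).astype(np.uint8)  # 1 = white, 0 = black

# Red-saturated class: push red's contribution down relative to G and B.
img = np.zeros((2, 2, 3))
img[0, 0] = [0.9, 0.1, 0.1]   # strong red: thresholds to black
img[1, 1] = [0.2, 0.9, 0.9]   # cyan-ish: thresholds to white
bitonal = binarize_with_class_profile(img, channel_weights=[0.1, 0.45, 0.45])
print(bitonal)
```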
[00318] Similarly, in other embodiments such as particularly shown in FIGS. 4B-4D, improved binarization may include determining an object class mask, applying the object class mask to the digital image and thresholding a subregion of the digital image based on the object class mask. The object class mask may be any type of mask, with the condition that the object class mask provides information regarding the location of particular regions of interest characteristic to objects belonging to the class (such as a region depicting a color photograph 402, a region depicting a black/white text region 404, a color-text region 406, a symbol region depicting a symbol 408 such as a reference object, watermark, etc., an object background region 410, etc.) and enabling the selective inclusion and/or exclusion of such regions from the binarization operation(s).
[00319] For example, as shown in FIG. 4B, improved binarization includes determining an object class mask 420 identifying regions such as discussed immediately above and applying the object class mask 420 to exclude from binarization all of the digital image 400 except a single region of interest, such as object background region 410. Alternatively, the entire digital image may be masked-out and a region of interest such as object background region 410 subsequently masked-in to the binarization process. Moreover, in either event the masking functionality now described with reference to FIG. 4B may be combined with the exemplary color profile and/or color scheme information functionality described above, for example by obtaining both the object class mask and the object color profile and/or color scheme, applying the object class mask to exclude all of the digital image from binarization except object background region 410, adjusting one or more binarization parameters based on the object background region color profile and/or color scheme, and thresholding the object background region 410 using the adjusted binarization parameters.
[00320] Extending the principle shown in FIG. 4B, multiple regions of interest may be masked-in and/or masked-out using object class mask 420 to selectively designate regions and/or parameters for binarization in a layered approach designed to produce high-quality bitonal images. For example, as shown in FIG. 4C, multiple text regions 404, 406 may be retained for binarization (potentially using adjusted parameters) after applying object class mask 420, for example to exclude all non-text regions from binarization, in some approaches.
[00321] Similarly, it may be advantageous to simply exclude only a portion of an image from binarization, whether or not adjusting any parameters. For example, with reference to FIG. 4D, it may be desirable to mask-out a unique region of a digital image 400, such as a region depicting a color photograph 402, using an object class mask 420. Then, particularly in approaches where the remaining portion of the digital image 400 is characterized by a single color profile and/or color scheme, or a small number (i.e. no more than 3) of substantially similar color profiles and/or color schemes, binarization may be performed to clarify the remaining portions of the digital image 400. Subsequently, the masked-out unique region may optionally be restored to the digital image 400, with the result being an improved bitonal image quality in all regions of the digital image 400 that were subjected to binarization coupled with an undisturbed color photograph 402 in the region of the image not subjected to binarization.
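The FIG. 4D-style layered approach, i.e. masking out a unique region, binarizing the remainder, and restoring the excluded region, might be sketched as follows on a toy grayscale example; the mask, threshold, and function name are illustrative assumptions:

```python
import numpy as np

def binarize_outside_mask(gray, mask, threshold=128):
    """Threshold a grayscale image everywhere except a masked-out
    region of interest (e.g. a color photograph), then restore the
    excluded region untouched. `mask` is True where binarization
    should be skipped."""
    out = np.where(gray > threshold, 255, 0).astype(gray.dtype)
    out[mask] = gray[mask]   # restore the excluded region unmodified
    return out

gray = np.array([[200,  60, 200],
                 [ 90, 130,  90],
                 [200,  60, 200]], dtype=np.uint8)
photo = np.zeros_like(gray, dtype=bool)
photo[1, 1] = True           # pretend the center pixel is the photograph
result = binarize_outside_mask(gray, photo)
print(result)
```

Every unmasked pixel ends up at 0 or 255, while the masked pixel keeps its original multi-tonal value, mirroring the undisturbed photograph region described above.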
[00322] In still more embodiments, it may be advantageous to perform optical character recognition (OCR) based at least in part on the classification and/or result of classification. Specifically, it may be advantageous to determine information about the location, format, and/or content of text depicted in objects belonging to a particular class, and modify predictions estimated by traditional OCR methods based on an expected text location, format and/or content. For example, in one embodiment where an OCR prediction estimates text in a region corresponding to a "date" field of a document reads "Jan, 14, 201l", the presently disclosed algorithms may determine that the expected format for this text follows a format such as "[Abbreviated Month][.] [##][,] [####]" and may correct the erroneous OCR predictions, e.g. converting the comma after "Jan" into a period and/or converting the letter "l" at the end of "201l" into a numerical one character. Similarly, the presently disclosed algorithms may determine the expected format for the same text is instead "[##]/[##]/[####]" and convert "Jan" to "01" and convert each set of comma-space characters ", " into a slash "/" to correct the erroneous OCR predictions.
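The date-field correction described above can be illustrated with simple pattern substitutions; the regular expressions below are illustrative of format-driven corrections for one expected format, not an exhaustive or prescribed rule set:

```python
import re

def correct_date_ocr(text):
    """Correct common OCR errors in a 'date' field against the expected
    format "[Abbreviated Month][.] [##][,] [####]": a comma after the
    month abbreviation becomes a period, and a letter 'l' (or 'I')
    inside the year becomes the numeral '1'."""
    # "Jan," -> "Jan." when the abbreviation is followed by a day number.
    text = re.sub(r"\b([A-Z][a-z]{2}),(?=\s*\d)", r"\1.", text)
    # A letter 'l'/'I' following the leading digits of a year -> '1'.
    text = re.sub(r"\b(\d{2,3})[lI](\d*)\b", r"\g<1>1\2", text)
    return text

print(correct_date_ocr("Jan, 14, 201l"))  # Jan. 14, 2011
```

Text already matching the expected format passes through unchanged, so the correction only fires on predictions that violate the class-derived format.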
[00323] A method includes: receiving a digital image captured by a mobile device; and using a processor of the mobile device: generating a first representation of the digital image, the first representation being characterized by a reduced resolution; generating a first feature vector based on the first representation; comparing the first feature vector to a plurality of reference feature matrices; and classifying an object depicted in the digital image as a member of a particular object class based at least in part on the comparing. Generating the first representation involves dividing the digital image into a plurality of sections; and determining, for each section, at least one characteristic value, each characteristic value corresponding to one or more features descriptive of the section. The first representation comprises a plurality of pixels, each of the
plurality of pixels corresponds to one section of the plurality of sections, and each of the plurality of pixels is characterized by the at least one characteristic value determined for the corresponding section. The digital image comprises a cropped color image. One or more of the reference feature matrices comprises a plurality of feature vectors, and each feature vector corresponds to at least one characteristic of an object. The comparing comprises an N-dimensional comparison, and N is greater than 50. The first feature vector is characterized by a feature vector length greater than 500. The method also includes determining one or more object features of the object based at least in part on the particular object class; detecting the object depicted in the digital image based at least in part on the classifying and/or result thereof; rectangularizing the object depicted in the digital image based at least in part on the classifying and/or result thereof;
cropping the digital image based at least in part on the classifying and/or result thereof; and/or binarizing the digital image based at least in part on the classifying and/or result thereof. The binarizing additionally and/or alternatively includes one or more of: determining an object class mask; applying the object class mask to the digital image; and thresholding a subregion of the digital image based on the object class mask. The method may include adjusting one or more binarization parameters based on the object class mask; and thresholding the digital image using the one or more adjusted binarization parameters. The method may also include determining an object class color scheme; similarly, binarizing may include adjusting one or more binarization parameters based on the object class color scheme; and thresholding the digital image using the one or more adjusted binarization parameters. The method additionally and/or alternatively includes: determining a geographical location associated with the mobile device, wherein the classifying is further based at least in part on the geographical location. The method additionally and/or alternatively includes: outputting an indication of the particular object class to a display of the mobile device; and receiving user input via the display of the mobile device in response to outputting the indication. The method additionally and/or alternatively includes: determining one or more object features of the object based at least in part on the particular object class.
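The core classification flow summarized above (reduced-resolution representation, one characteristic value per section, comparison against reference feature vectors) can be sketched briefly. This is a minimal sketch under assumed simplifications: mean intensity stands in for the characteristic value, Euclidean distance for the N-dimensional comparison, and the class names and reference vectors are hypothetical.

```python
# Sketch of the first-representation and classification steps: divide the
# image into a grid of sections, take each section's mean intensity as its
# characteristic value, then classify by nearest reference feature vector.

def first_representation(image, sections_x, sections_y):
    """Divide the image into a grid; return the mean intensity per section."""
    h, w = len(image), len(image[0])
    sh, sw = h // sections_y, w // sections_x
    rep = []
    for sy in range(sections_y):
        for sx in range(sections_x):
            vals = [image[y][x]
                    for y in range(sy * sh, (sy + 1) * sh)
                    for x in range(sx * sw, (sx + 1) * sw)]
            rep.append(sum(vals) / len(vals))  # one "pixel" per section
    return rep

def classify(feature_vector, references):
    """Return the class whose reference vector is nearest (squared Euclidean)."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(references, key=lambda name: dist(feature_vector, references[name]))
```

In the disclosed method the feature vectors are much longer (length greater than 500) and the comparison is against reference feature matrices; this sketch only shows the shape of the computation.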
[00324] A method includes: generating a first feature vector based on a digital image captured by a mobile device; comparing the first feature vector to a plurality of reference feature matrices; classifying an object depicted in the digital image as a member of a particular object class based at least in part on the comparing; and determining one or more object features of the object based at least in part on the particular object class. The method also includes performing at least one processing operation using a processor of the mobile device, the at least one processing operation selected from a group consisting of: detecting the object depicted in the digital image based at least in part on the one or more object features; rectangularizing the object depicted in the digital
image based at least in part on the one or more object features; cropping the digital image based at least in part on the one or more object features; and binarizing the digital image based at least in part on the one or more object features. The one or more object features comprise an object color scheme, and the binarizing comprises: determining the object color scheme; adjusting one or more binarization parameters based on the object color scheme; and thresholding the digital image using the one or more adjusted binarization parameters. The one or more object features may additionally and/or alternatively comprise an object class mask, and the binarizing comprises: determining the object class mask; applying the object class mask to the digital image; and thresholding a subregion of the digital image based on the object class mask.
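Adjusting a binarization parameter from an object color scheme, as recited above, can be sketched as follows. The heuristic shown (placing the threshold midway between the scheme's expected background and foreground intensities) is an assumption for illustration only, not the disclosed adjustment.

```python
# Sketch of color-scheme-driven binarization: the threshold is derived from
# the object class's expected background and foreground intensities, then
# applied globally. Both function names are hypothetical.

def threshold_from_color_scheme(background_intensity, foreground_intensity):
    """Pick a threshold halfway between expected background and foreground."""
    return (background_intensity + foreground_intensity) / 2

def binarize(image, threshold):
    """Threshold a grayscale image (nested lists of 0-255 values)."""
    return [[255 if px >= threshold else 0 for px in row] for row in image]
```

For a class whose documents have a light background (around 220) and dark text (around 40), this sketch would place the threshold at 130 rather than at a generic midpoint such as 128.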
[00325] Of course, other methods of improving upon and/or correcting OCR predictions that would be appreciated by the skilled artisan upon reading these descriptions are also fully within the scope of the present disclosure.
[00326] The inventive concepts disclosed herein have been presented by way of example to illustrate the myriad features thereof in a plurality of illustrative scenarios, embodiments, and/or implementations. It should be appreciated that the concepts generally disclosed are to be considered as modular, and may be implemented in any combination, permutation, or synthesis thereof. In addition, any modification, alteration, or equivalent of the presently disclosed features, functions, and concepts that would be appreciated by a person having ordinary skill in the art upon reading the instant descriptions should also be considered within the scope of this disclosure.
[00327] Accordingly, one embodiment of the present invention includes all of the features disclosed herein, including those shown and described in conjunction with any of the FIGS. Other embodiments include subsets of the features disclosed herein and/or shown and described in conjunction with any of the FIGS. Such features, or subsets thereof, may be combined in any way using known techniques that would become apparent to one skilled in the art after reading the present description.
[00328] While various embodiments have been described above, it should be understood that they have been presented by way of example only, and not limitation. Thus, the breadth and scope of an embodiment of the present invention should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.