WO2002091288A1 - System and method for compressing stroke-based handwriting and line drawing - Google Patents

Info

Publication number
WO2002091288A1
Authority
WO
WIPO (PCT)
Prior art keywords
grid
coordinates
lines
stroke
predefined
Application number
PCT/SG2001/000088
Other languages
French (fr)
Inventor
Qing Chen
Original Assignee
Bijitec Pte Ltd
Application filed by Bijitec Pte Ltd
Priority to PCT/SG2001/000088
Publication of WO2002091288A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00 Image coding
    • G06T9/20 Contour coding, e.g. using detection of edges
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/142 Image acquisition using hand-held instruments; Constructional details of the instruments
    • G06V30/1423 Image acquisition using hand-held instruments; Constructional details of the instruments, the instrument generating sequences of position coordinates corresponding to handwriting

Definitions

  • Figure 1A shows a handwritten character with an uneven grid overlaid on it in accordance with the embodiment of the invention;
  • Figure 1B shows the density of grid lines along the X-axis of the grid shown in Figure 1A;
  • Figure 1C shows a reconstructed character from the handwritten character shown in Figure 1A in accordance with the embodiment of the invention;
  • Figure 2 shows an example application of the embodiment of the invention applied to a computer; and
  • Figure 3 shows an example application of the embodiment of the invention applied to a mobile phone.
  • the embodiment is directed towards a system and method for compressing stroke-based handwriting, and to applications for such a system and method.
  • the system and method are implemented using software executing on a processor.
  • the software and processor can take the form of a computer, personal digital assistant device, mobile phone, or any other device where handwriting may be used.
  • the software processes a digital representation of a handwritten character or line drawing, in the form of a sequence of coordinates, to produce compressed character data that can be used to reproduce the handwritten character.
  • the character data can be stored, transmitted to another person over a communications network, and used to produce a printed copy of the handwritten character.
  • Any suitable input device can be used to produce the digital representation of a handwritten character. Examples include a touch screen, a tablet input device, pen-based drawing devices such as PDAs, or a computer mouse. Such devices typically produce a sequence of coordinates in response to the user's actions.
  • the software performs the following steps to produce the character data:
  • Header - contains character definition data such as the grid information data
  • Embedded code - an optional field for storing the character's value in a corresponding code such as GB, ASCII, BIG5 or Unicode.
  • the embedded code is not utilised by the embodiment, however it may be desirable to provide space for the information for use by other applications, such as character recognition.
  • Each stroke consists of a sequence of points in the high-resolution digital representation of the handwritten character or the line drawing.
  • the sequence of points of each stroke is analysed and replaced with a sequence of straight lines and curved lines.
  • a straight line is characterised by a start point and an end point.
  • a curved line is characterised by a start point, at least one intermediate point and an end point. The points are described using coordinates in the high-resolution image.
  • the start and end of a stroke are detected by tracking a sequence of continuous drawing coordinates sent from the drawing device.
  • pressure information may also be provided from the drawing device which can provide more accurate stroke tracking.
  • the stroke may be transfigured to simplify the data required to describe it, for example by reducing the number of lines and curves required, or by reducing the number of intermediate points describing a curve.
  • This transfiguration operates within predefined parameters to ensure the stroke is not unduly distorted.
  • the parameters may be adjusted according to the desired compression and the accuracy with which it is desired to capture the handwritten character.
  • the predefined parameter is specified as a percentage value.
  • the parameter is not a fixed value, but is adjustable in order to achieve different levels of compression ratio and quality.
  • To determine whether a transfiguration would unduly distort the stroke, the number of points in the transfigured stroke that coincide with points on the original stroke is counted with a weight that is equivalent to the 'off' distance to the nearest point in the line. In addition, the number of points in the transfigured stroke that do not coincide with points on the original stroke is counted. If the ratio of the weighted non-coinciding points to coinciding points exceeds the predetermined percentage, then the transfiguration unduly distorts the stroke.
  • the density of the points in the lines and curves used to describe the strokes in the character or the line drawing is analysed. This analysis is performed separately on each of the X and Y axes to determine the density and distribution of points along each axis.
  • this analysis takes the form of a histogram analysis along each axis using fixed width segments.
  • the histogram stores the number of points falling within each segment along each axis. For example, it divides the X-axis into 32 or 64 evenly-spaced segments and calculates the number of points falling in each segment. The number of segments used in the histogram may be varied depending on the target resolution of the handwriting or drawing. The same analysis is performed on the Y-axis.
  • Determine the best grid
  • the density and distribution of points along each axis is used to determine, in combination with the desired compression and resolution, the number of grid lines and the spacing of the grid lines in each axis.
  • the spacing of the grid lines is determined according to the density of the points along an axis.
  • grid lines are closely spaced in areas of high point density, and are more widely spaced in areas of lower point density.
  • uneven grid spacing provides a better quality character or drawing than using a fixed grid spacing having the same number of grid lines.
  • the character data provides for superior compression for a given quality compared with the use of fixed grid spacing.
  • the grid spacing is specified in terms of the proportion of each grid line spacing relative to the entire grid width.
  • the resultant grid is easily scalable to whatever resolution is desired.
  • Figure 1A shows an example of a handwritten character 10 drawn on a high-resolution input device.
  • Figure 1A also shows a low-resolution grid 12 with axes X and Y, used to store the character data, overlaid on the handwritten character 10.
  • the grid spacing of the grid 12 is not even. The spacing of grid lines decreases in each axis where more points are positioned, and increases in each axis where fewer points occur. This provides an area 14 of increased resolution where fine detail occurs in the handwritten character 10, areas 16 of moderate resolution where some strokes occur in the handwritten character 10, and areas 18 of reduced resolution where few or no strokes appear.
  • Figure 1B is a graphical representation of the grid line density along the X axis of the grid 12 shown in Figure 1A.
  • That predefined grid is selected as the grid for the handwritten character and the identification number of the predefined grid is stored in the header data, avoiding the need to define the grid line spacings and thus reducing the information required to specify the character.
  • the grid spacing information is stored in the header of the character data. This can be achieved by expressly specifying the grid line spacings for each axis.
  • the grid line spacings of each axis can be defined parametrically.
  • a first mathematical function with a first set of parameters is used to describe the grid line spacing along the X axis.
  • a second mathematical function with a second set of parameters is used to describe the grid line spacing along the Y axis.
  • the mathematical function used to reproduce the density histogram is $\sin(a) \cdot b^{2} / c$. Values of a, b and c are stored in the header for each axis.
  • the point distribution is further analysed to determine whether sub-grid(s) can be used in parts of the handwritten character 10 to further improve the quality and compression of the character data.
  • the density and distribution of points in the handwritten character or line drawing is sometimes very uneven. For example, where annotations are made in a drawing, the points defining the annotations will be very dense compared to the remainder of the drawing.
  • a sub-grid refers to one of the set of predefined grids mentioned above, but applied to only the high-density area of the handwritten character or line drawing and not the whole character or line drawing.
  • the sub-grid may have a different resolution (number of grid lines) than the selected grid in the high-density area. Thus, both quality and data size for the strokes falling in the sub-grid area will be improved.
  • the points in each stroke are compared with the area covered by a sub-grid. If the points in a stroke are contained within one of the sub-grids, the identification number of the predefined grid used in that sub-grid, and its position on the original character grid, are stored in the stroke data. The predefined grid is then used to store the points of the lines and curves that define the stroke. For each curve or line, the coordinates of each point on the predefined grid are stored in the stroke data. This provides an effective way of providing still further resolution when defining the character without needing to use a higher resolution grid.
  • the selected grid is then used to store the points of the lines and curves that define the stroke. For each curve or line, the coordinates of each point on the selected grid are stored in the stroke data.
  • stroke optimisation is employed to remove redundant strokes and simplify stroke data.
  • the stroke may alternatively be represented as a curve.
  • a curved line having a small curvature may alternatively be represented by a straight line.
  • the character data can be stored or transmitted to another person.
  • the character data is used to reproduce the handwritten character, either to a display or to a printer.
  • the handwritten character is reconstructed from the character data using the following steps.
  • the grid data is obtained from the header data. This data is used to construct a grid of a desired resolution.
  • Figure 1C shows an example of a reconstructed character 20 from character data generated for the handwritten character in Figure 1A. The reconstructed grid 22 is overlaid on the reconstructed character 20.
  • strokes 24 are drawn on the grid 22 from the stroke data, in the order that they are stored in the stroke data.
  • the sub-grid is also constructed in the portion of the grid 22 specified in the stroke data at the desired resolution and used to draw the stroke.
  • each curve is a segment of a circle.
  • Other curve types are possible in other embodiments, such as spline curves.
  • a segment of a circle is the simplest representation for low computing power devices such as mobile phones.
  • the first example is shown in Figure 2, in which the software 30 is provided on a computer or PDA (not shown).
  • the software 30 is implemented as an encoder 32 that acts as an interface between application software 34 and a handwriting device 36.
  • a handwriting device interface 38 is provided between the handwriting device 36 and the encoder 32, which receives signals from the handwriting device 36 and translates them into coordinates.
  • the software 30 also includes a decoder 40 that acts as an interface between the application software 34 and an output device 42 such as a display or printer.
  • because each character is separately compressed, the character data can be treated in the same manner as any other character stream.
  • characters can be copied, pasted, cut, moved, and new characters inserted in the application software 34 as desired.
  • character data can be stored in any suitable storage device 44, or transmitted over a communications network 46, as desired.
  • the second example relates to a mobile phone 50 with a touch screen 52.
  • the software is implemented as an encoder 54 and a decoder 56 that are provided between a system storage 58 of the mobile phone 50 and the touch screen 52.
  • a person can use a stylus 60 to hand-write characters in a short messaging system (SMS) function of the mobile phone 50.
  • the handwritten characters are compressed by the encoder software 54 in the mobile phone 50 and the compressed character data is stored in the system storage 58.
  • the compressed character data is then transmitted to other users' mobile phones 62 using the known SMS protocol 64 and transmission equipment 66.
  • the decoder 56 of the embodiment on the destination mobile phone decodes the character data and displays the characters of the SMS message on the touch screen 52 of the mobile phone 50.
  • the embodiment is well suited for implementation in mobile phones, where the available computing power makes character recognition difficult in terms of accuracy and cost. Further, the limited network bandwidth available for the transmission of SMS messages restricts the direct transmission of graphics.
  • the character data is suitable for transmission over a computer network such as the Internet.
  • This enables, for example, handwritten e-mail to be sent and received, handwritten exchanges in server-mediated or peer-to-peer chat applications such as ICQ, as well as server processing of character data.

Abstract

A system and method for compressing stroke-based handwritten characters and line drawings (10), in which each stroke is represented as at least one straight or curved line defined by coordinates. The coordinates are analysed and a grid (12) for storing the coordinates is determined, the grid (12) having a plurality of grid lines along each axis. The spacing of the grid lines along at least one axis is selected based on the analysis of the coordinates such that the grid lines are not evenly spaced along the one axis, and data representing said grid and said lines is produced, wherein the coordinates of each line are represented in the data by corresponding points on the grid (12).

Description

System And Method For Compressing Stroke-Based Handwriting And Line Drawing
FIELD OF THE INVENTION
This invention relates to a system and method for compressing stroke-based handwriting and line drawing. The invention has particular, although not exclusive, utility in relation to compressing stroke-based handwriting of Chinese characters, signs and line drawings.
BACKGROUND ART
Advances in information technology have enabled handwriting input technology, an important technology that is increasingly being used in mobile devices. Hardware and software products that use handwriting input technology are widely available in the market. Hand-drawing technology has appeared in computer software applications for many years. It has also become a useful application on personal digital assistant (PDA) devices.
The complexity of Chinese characters still presents challenges to handwriting input technology, such as achieving a satisfactory recognition ratio. The input of Chinese characters remains a bottleneck on many computers and consumer devices such as mobile phones and PDA devices.
Handwriting technologies include handwriting recognition, sketch board technology and ink technology.
Handwriting recognition is the analysis of a handwritten character to determine the particular character represented by the written image. A user hand-writes a character on an input device, which is commonly either a touch-sensitive screen or a handwriting tablet device. Recognition software then analyses the handwritten character in accordance with a suitable algorithm to determine which character was written. The recognition software then stores a character code, such as GB, ASCII or Unicode, representing that character. Handwriting recognition technology is already replacing the keyboard in some applications. Some recognition software can achieve 90% recognition accuracy; however, the accuracy needs to increase before handwriting recognition technology will become widespread.
Sketch board technology is used in most PDA devices. Users draw an image representing text or simple graphics on the PDA device's screen. Software within the PDA converts the drawn image into graphics data that the user can store for viewing at a later time. No attempt is made at recognizing the drawn image. Sketch board technology is fast and convenient; however, the quality of the image is generally low. One example of a sketch board product is Shang Wu Tong from Heng Ji Wei Yie.
Ink technology treats a hand drawn image as a series of strokes and then compresses the information defining the strokes. Ink technology does not treat handwritten text as a one-dimensional stream of characters; rather it treats the handwritten text as a two-dimensional electronic paper on which strokes are made. Ink technology allows relatively high quality display and printing of characters; however, it is not very suitable for text editing. Further, most ink technology products are not optimized for Chinese characters. Examples of ink technology products include Microsoft RichInk, Marathon PenScript, New Co StrokeMap and CIC QuickNote Pro.
The major compression techniques used in handwriting technologies are bitmap compression and stroke compression.
Bitmap compression treats the handwritten character as a bitmap graphic, which is then compressed using, for example, run-length encoding (RLE). To maintain the quality of the handwriting, a high-resolution bitmap must be used. Even with compression, the use of a high-resolution bitmap results in a significant amount of information required for each character.
Stroke compression treats the handwritten character as a sequence of strokes.
The strokes are defined by coordinates on an evenly-spaced grid, typically by a start coordinate, a stop coordinate and possibly intermediate coordinates. Thus, the strokes in the drawn image are quantised to the grid. Current stroke compression results in good quality characters, but the stroke description data tends to be large.
DISCLOSURE OF THE INVENTION
Throughout the specification, unless the context requires otherwise, the word "comprise" or variations such as "comprises" or "comprising", will be understood to imply the inclusion of a stated integer or group of integers but not the exclusion of any other integer or group of integers.
In accordance with a first aspect of this invention, there is provided a system for compressing stroke-based handwritten characters, comprising:
vectorising means arranged to represent each stroke as at least one line defined by coordinates;
grid determination means arranged to analyse the coordinates and determine a grid for storing the coordinates, the grid having a plurality of grid lines along each axis, the grid determination means arranged to select the spacing of the grid lines along at least one axis based on the analysis of the coordinates such that the grid lines are not evenly spaced along the one axis; and
output means arranged to produce data representing said grid and said lines, wherein the coordinates of each line are represented in the data by corresponding points on the grid.
Preferably, the lines include straight lines defined by start and end coordinates.
Preferably, the lines include curved lines defined by start coordinate, at least one intermediate coordinate and an end coordinate.
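A curve given by a start coordinate, one intermediate coordinate and an end coordinate determines a unique circle, which fits the embodiment's stated choice of circle segments for curves. The following is a minimal sketch of recovering that circle; the function name `circle_through` and the collinearity tolerance are illustrative assumptions, not part of the specification.

```python
import math

def circle_through(p0, p1, p2, eps=1e-12):
    """Return (cx, cy, r) of the circle through three points.

    Returns None when the points are (nearly) collinear, in which case
    the curve is better stored as a straight line. `eps` is an assumed
    tolerance, not a value taken from the patent.
    """
    (x0, y0), (x1, y1), (x2, y2) = p0, p1, p2
    # Twice the signed area of the triangle; zero means collinear points.
    d = 2.0 * (x0 * (y1 - y2) + x1 * (y2 - y0) + x2 * (y0 - y1))
    if abs(d) < eps:
        return None
    s0, s1, s2 = x0 * x0 + y0 * y0, x1 * x1 + y1 * y1, x2 * x2 + y2 * y2
    cx = (s0 * (y1 - y2) + s1 * (y2 - y0) + s2 * (y0 - y1)) / d
    cy = (s0 * (x2 - x1) + s1 * (x0 - x2) + s2 * (x1 - x0)) / d
    return cx, cy, math.hypot(x0 - cx, y0 - cy)
```

A decoder could draw the arc from the start point to the end point on this circle, passing through the intermediate point.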
Preferably, the grid determination means is arranged to analyse the coordinates to determine the coordinate density distribution.
Preferably, the grid determination means is further arranged to parametrically describe the grid line spacing of the grid.
Preferably, at least one predefined grid having predefined grid line spacing is provided, said grid determination means being arranged to compare the analysis of the coordinates with the predefined grid line spacing of each predefined grid and to determine therefrom whether to use that predefined grid.
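The specification does not say how the analysis is compared with a predefined grid. As one hedged possibility, the sketch below derives cumulative grid-line positions from relative spacings and accepts the closest predefined grid if no line deviates by more than a tolerance. The names `line_positions` and `pick_predefined_grid`, the maximum-deviation metric and the `tolerance` default are all assumptions made for illustration.

```python
from itertools import accumulate

def line_positions(spacings):
    """Cumulative grid-line positions in [0, 1] from relative spacings."""
    total = sum(spacings)
    return [p / total for p in accumulate(spacings)]

def pick_predefined_grid(ideal_spacings, predefined, tolerance=0.05):
    """Return the id of the closest predefined grid, or None.

    `ideal_spacings` is the spacing the analysis would choose freely;
    `predefined` maps a grid id to its list of relative spacings.
    A grid qualifies only if every line lies within `tolerance`
    (as a fraction of the axis length) of the ideal position.
    """
    ideal = line_positions(ideal_spacings)
    best_id, best_err = None, tolerance
    for grid_id, spacings in predefined.items():
        if len(spacings) != len(ideal_spacings):
            continue  # different number of grid lines, not comparable here
        err = max(abs(a - b) for a, b in zip(ideal, line_positions(spacings)))
        if err <= best_err:
            best_id, best_err = grid_id, err
    return best_id
```

When a predefined grid is chosen, only its identification number needs to be stored, rather than the full list of spacings.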
Preferably, the grid determination means is arranged to define at least one sub-grid of the grid associated with a predefined grid, said output means being arranged to compare the coordinates defining the lines of each stroke with each sub-grid to determine whether each stroke is contained within the sub-grid, and if the coordinates of the lines of a stroke are contained within a sub-grid, to represent the coordinates of the lines of the stroke in the data by corresponding points on the corresponding predefined grid.
Preferably, the system further comprises an input device arranged to provide a digitised representation of the handwritten character or line drawing to said vectorising means.
Preferably, the system further comprises decompression means arranged to receive data representing a handwritten character or line drawing, reconstruct the grid represented in the data according to a desired resolution, and to draw the lines of each stroke on the grid or a sub-grid, according to the points defining each line in the data.
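Because the grid spacing is expressed relative to the overall grid width, a decoder can rebuild the grid at whatever output resolution it needs. The sketch below illustrates that idea under the assumption that the header carries one list of relative spacings per axis and that each stored point is a pair of grid-line indices; `rebuild_axis` and `decode_points` are hypothetical names, and the real data layout is not specified here.

```python
from itertools import accumulate

def rebuild_axis(spacings, size):
    """Pixel positions of grid lines 0..n along an axis `size` pixels long."""
    total = sum(spacings)
    return [0.0] + [size * p / total for p in accumulate(spacings)]

def decode_points(points, x_spacings, y_spacings, width, height):
    """Map stored (column, row) grid indices to output coordinates."""
    xs = rebuild_axis(x_spacings, width)
    ys = rebuild_axis(y_spacings, height)
    return [(xs[i], ys[j]) for i, j in points]
```

The same compressed points can be rendered on a small phone screen or a printer page simply by changing `width` and `height`.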
In accordance with a second aspect of this invention, there is provided a method for compressing stroke-based handwritten characters and line drawings, comprising the steps of:
representing each stroke as at least one line defined by coordinates;
analysing the coordinates;
determining from the analysis a grid for storing the coordinates, the grid having a plurality of grid lines along each axis;
determining the spacing of the grid lines along at least one axis based on the analysis of the coordinates such that the grid lines are not evenly spaced along the one axis; and
representing the coordinates of each line by corresponding points on the grid.
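The steps above can be sketched for a single axis as follows: build a fixed-width histogram of the coordinates, place grid lines so that they crowd together where points are dense, then replace each coordinate with the index of its nearest grid line. This is an illustrative sketch only. The equal-share placement rule, the `+ 1` floor that keeps empty regions from collapsing, and the names `histogram`, `uneven_grid` and `snap` are assumptions, not the method the specification prescribes.

```python
from bisect import bisect_left

def histogram(values, lo, hi, bins=32):
    """Count how many values fall in each of `bins` equal-width segments."""
    counts = [0] * bins
    for v in values:
        k = min(int((v - lo) / (hi - lo) * bins), bins - 1)
        counts[k] += 1
    return counts

def uneven_grid(values, lo, hi, n_lines, bins=32):
    """Positions of n_lines grid lines, closer together where points cluster."""
    # Assumed smoothing: +1 per segment so sparse areas still get grid lines.
    counts = [c + 1 for c in histogram(values, lo, hi, bins)]
    total, width = sum(counts), (hi - lo) / bins
    lines, cum, k = [], 0, 0
    for i in range(n_lines):
        target = total * i / (n_lines - 1)  # share of weight left of line i
        while k < bins and cum + counts[k] < target:
            cum += counts[k]
            k += 1
        if k == bins:
            lines.append(float(hi))
        else:
            frac = (target - cum) / counts[k]  # interpolate inside segment k
            lines.append(lo + (k + frac) * width)
    return lines

def snap(v, lines):
    """Index of the grid line nearest to coordinate v."""
    k = bisect_left(lines, v)
    if k == 0:
        return 0
    if k == len(lines):
        return len(lines) - 1
    return k if lines[k] - v < v - lines[k - 1] else k - 1
```

Running the same routine on the Y coordinates gives the second axis, and each stroke point is then stored as a pair of small grid indices instead of high-resolution coordinates.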
Preferably, the lines include straight lines defined by start and end coordinates.
Preferably, the lines include curved lines defined by start coordinate, at least one intermediate coordinate and an end coordinate.
Preferably, the step of analysing the coordinates comprises determining the coordinate density distribution.
Preferably, the method further comprises the step of parametrically describing the grid line spacing of the grid.
Preferably, the method further comprises the steps of:
providing at least one predefined grid having predefined grid line spacing, and
comparing the analysis of the coordinates with the predefined grid line spacing of each predefined grid and determining therefrom whether to use that predefined grid.
Preferably, the method further comprises the steps of:
defining at least one sub-grid of the grid associated with a predefined grid; and
comparing the coordinates defining the lines of each stroke with each sub-grid to determine whether each stroke is contained within the sub-grid, and if the coordinates of the lines of a stroke are contained within a sub-grid, to represent the coordinates of the lines of the stroke in the data by corresponding points on the corresponding predefined grid.
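The containment test in this step can be sketched as a simple bounding check. The example assumes each sub-grid covers an axis-aligned rectangle and that a stroke is a list of (x, y) points; `stroke_in_region` and `assign_grids` are hypothetical helper names, and picking the first matching sub-grid is an arbitrary choice for illustration.

```python
def stroke_in_region(stroke, region):
    """True if every point of the stroke lies inside (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = region
    return all(x0 <= x <= x1 and y0 <= y <= y1 for x, y in stroke)

def assign_grids(strokes, sub_grids):
    """For each stroke, the id of the first sub-grid containing it.

    `sub_grids` maps a sub-grid id to its rectangle. None means the
    stroke is stored on the main grid instead.
    """
    return [
        next((gid for gid, region in sub_grids.items()
              if stroke_in_region(s, region)), None)
        for s in strokes
    ]
```

A stroke that only partly overlaps a sub-grid falls back to the main grid, matching the requirement that the stroke be contained within the sub-grid.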
In accordance with a third aspect of this invention, there is provided computer software provided on a computer-readable medium, which, when executed on a computer, executes method steps for compressing stroke-based handwritten characters and line drawings, the steps comprising:
representing each stroke as at least one line defined by coordinates;
analysing the coordinates;
determining from the analysis a grid for storing the coordinates, the grid having a plurality of grid lines along each axis;
determining the spacing of the grid lines along at least one axis based on the analysis of the coordinates such that the grid lines are not evenly spaced along the one axis; and
representing the coordinates of each line by corresponding points on the grid.
Preferably, the lines include straight lines defined by start and end coordinates.
Preferably, the lines include curved lines defined by start coordinate, at least one intermediate coordinate and an end coordinate.
Preferably, the step of analysing the coordinates comprises determining the coordinate density distribution.
Preferably, the method steps further comprise parametrically describing the grid line spacing of the grid.
Preferably, the method steps further comprise:
providing at least one predefined grid having predefined grid line spacing, and
comparing the analysis of the coordinates with the predefined grid line spacing of each predefined grid and to determine therefrom whether to use that predefined grid.
Preferably, the method steps further comprise:
defining at least one sub-grid of the grid associated with a predefined grid; and
comparing the coordinates defining the lines of each stroke with each sub-grid to determine whether each stroke is contained within the sub-grid, and if the coordinates of the lines of a stroke are contained within a sub-grid, to represent the coordinates of the lines of the stroke in the data by corresponding points on the corresponding predefined grid.
BRIEF DESCRIPTION OF THE DRAWINGS
One embodiment of the invention will now be described, by way of example, with reference to the accompanying drawings, in which:
Figure 1A shows a handwritten character with an uneven grid overlaid on it in accordance with the embodiment of the invention;
Figure 1B shows the density of grid lines along the X-axis of the grid shown in Figure 1A;
Figure 1C shows a reconstructed character from the handwritten character shown in Figure 1A in accordance with the embodiment of the invention;
Figure 2 shows an example application of the embodiment of the invention applied to a computer; and
Figure 3 shows an example application of the embodiment of the invention applied to a mobile phone.
BEST MODE(S) FOR CARRYING OUT THE INVENTION
The embodiment is directed towards a system and method for compressing stroke-based handwriting, and to applications for such a system and method.
In the embodiment, the system and method are implemented using software executing on a processor. The software and processor can take the form of a computer, personal digital assistant device, mobile phone, or any other device where handwriting may be used.
The software processes a digital representation of a handwritten character or line drawing, in the form of a sequence of coordinates, to produce compressed character data that can be used to reproduce the handwritten character. The character data can be stored, transmitted to another person over a communications network, and used to produce a printed copy of the handwritten character.
Any suitable input device can be used to produce the digital representation of a handwritten character. Examples include a touch screen, a tablet input device, pen-based drawing devices such as PDAs, or a computer mouse. Such devices typically produce a sequence of coordinates in response to the user's actions.
Briefly, the software performs the following steps to produce the character data:
• Vectorise the strokes in the handwritten character or the line drawing to remove redundant information
• Analyse the stroke density distribution
• Determine the best grid
• Determine if sub-grids are needed
• Optimise the strokes by combining two line segments into one within a given margin whenever possible
Each handwritten character is stored as character data in the following format:
• Header - contains character definition data such as the grid information data
• Embedded code - an optional field for storing the character's value in a corresponding code such as GB, ASCII, BIG5 or Unicode. The embedded code is not utilised by the embodiment; however, it may be desirable to provide space for this information for use by other applications, such as character recognition.
• Stroke data - data defining the strokes that form the character
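The three-part layout above can be pictured as a small set of record types. The following is a minimal sketch only: all class and field names (CharacterData, Header, Stroke, Line, sub_grid_id and so on) are hypothetical and chosen for illustration, and no particular byte-level encoding is implied.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# A point stored as (column index, row index) on the grid (assumed form).
GridPoint = Tuple[int, int]


@dataclass
class Line:
    """A straight line (two points) or a curved line (three or more points)."""
    points: List[GridPoint]

    def is_curve(self) -> bool:
        return len(self.points) > 2


@dataclass
class Stroke:
    lines: List[Line]
    # Hypothetical fields, set only when the stroke is stored on a sub-grid.
    sub_grid_id: Optional[int] = None
    sub_grid_origin: Optional[GridPoint] = None


@dataclass
class Header:
    # Either the id of a predefined grid, or explicit grid line spacings
    # per axis, expressed as proportions of the grid width or height.
    predefined_grid_id: Optional[int] = None
    x_spacing: List[float] = field(default_factory=list)
    y_spacing: List[float] = field(default_factory=list)


@dataclass
class CharacterData:
    header: Header
    strokes: List[Stroke]
    embedded_code: Optional[str] = None  # optional, e.g. a Unicode value
```

A character drawn on predefined grid 3 with a single straight stroke could then be held as `CharacterData(Header(predefined_grid_id=3), [Stroke([Line([(0, 0), (4, 4)])])])`.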
Each of the above steps will now be described in detail below:
Vectorise the strokes in the handwritten character or the line drawing to remove redundant information
Each stroke consists of a sequence of points in the high-resolution digital representation of the handwritten character or the line drawing. The sequence of points of each stroke is analysed and replaced with a sequence of straight lines and curved lines. A straight line is characterised by a start point and an end point. A curved line is characterised by a start point, at least one intermediate point and an end point. The points are described using coordinates in the high-resolution image.
The start and end of a stroke are detected by tracking a sequence of continuous drawing coordinates sent from the drawing device. In some applications, pressure information may also be provided by the drawing device, which can provide more accurate stroke tracking.
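As an illustration of stroke detection, the sketch below groups a stream of samples into strokes. It assumes, purely for the example, that the input device reports a pen lift as a None entry between coordinates; real devices signal this in device-specific ways, and the function name split_into_strokes is hypothetical.

```python
from typing import Iterable, List, Optional, Tuple

Point = Tuple[int, int]


def split_into_strokes(samples: Iterable[Optional[Point]]) -> List[List[Point]]:
    """Group consecutive coordinates into strokes.

    Assumption: a None sample marks a pen lift (a break in the
    continuous sequence of drawing coordinates).
    """
    strokes: List[List[Point]] = []
    current: List[Point] = []
    for sample in samples:
        if sample is None:
            # Pen lifted: close the current stroke, if any.
            if current:
                strokes.append(current)
                current = []
        else:
            current.append(sample)
    if current:
        strokes.append(current)
    return strokes
```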
In vectorising each stroke into straight lines and curved lines, the stroke may be transfigured to simplify the data required to describe it, for example by reducing the number of lines and curves required, or by reducing the number of intermediate points describing a curve. This transfiguration operates within predefined parameters to ensure the stroke is not unduly distorted. The parameters may be adjusted according to the desired compression and the accuracy with which it is desired to capture the handwritten character.
In the embodiment, the predefined parameter is specified as a percentage value. The parameter is not fixed, but is adjustable in order to achieve different compression ratios and quality levels. To determine whether a transfiguration would unduly distort the stroke, the points in the transfigured stroke that coincide with points on the original stroke are counted with a weight that is equivalent to the 'off' distance to the nearest point in the line. In addition, the points in the transfigured stroke that do not coincide with points on the original stroke are counted. If the ratio of the weighted non-coinciding points to coinciding points exceeds the predetermined percentage, then the transfiguration unduly distorts the stroke.
It should be appreciated that in other embodiments, other methods of assessing the distortion introduced by the transfiguration may be used.
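The distortion test can be sketched in code. The description leaves some room for interpretation, so the example below takes one possible reading: each point of the transfigured stroke that lies exactly on an original point counts as coinciding, each other point contributes its distance to the nearest original point, and the ratio of the two is compared with the percentage. The function names and the handling of the no-coincidence case are assumptions.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def distortion_ratio(original: List[Point], transfigured: List[Point]) -> float:
    """Ratio of distance-weighted non-coinciding points to coinciding points."""
    coinciding = 0
    weighted_off = 0.0
    for p in transfigured:
        # 'Off' distance: distance to the nearest point of the original stroke.
        nearest = min(math.dist(p, q) for q in original)
        if nearest == 0.0:
            coinciding += 1
        else:
            weighted_off += nearest
    if coinciding == 0:
        # Assumption: with no coinciding point the stroke is treated as distorted.
        return math.inf
    return weighted_off / coinciding


def unduly_distorts(original: List[Point], transfigured: List[Point],
                    max_percent: float) -> bool:
    """True if the ratio, expressed as a percentage, exceeds the limit."""
    return distortion_ratio(original, transfigured) * 100.0 > max_percent
```

Raising max_percent permits more aggressive simplification, and so higher compression at lower fidelity.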
Analyse the stroke density distribution
Next, the density of the points in the lines and curves used to describe the strokes in the character or the line drawing is analysed. This analysis is performed separately on each of the X and Y axes to determine the density and distribution of points along each axis.
In the embodiment, this analysis takes the form of a histogram analysis along each axis using fixed width segments. The histogram stores the number of points falling within each segment along each axis. For example, it divides the X-axis into 32 or 64 evenly-spaced segments and calculates the number of points falling in each segment. The number of segments used in the histogram may be varied depending on the target resolution of the handwriting or drawing. The same analysis is performed on the Y-axis.
Determine the best grid
The density and distribution of points along each axis is used to determine, in combination with the desired compression and resolution, the number of grid lines and the spacing of the grid lines in each axis.
The spacing of the grid lines is determined according to the density of the points along an axis. Thus, grid lines are closely spaced in areas of high point density, and are more widely spaced in areas of lower point density. The use of uneven grid spacing provides a better quality character or drawing than using a fixed grid spacing having the same number of grid lines. As a result, the character data provides for superior compression for a given quality compared with the use of fixed grid spacing.
The grid spacing is specified in terms of the proportion of each grid line spacing relative to the entire grid width. Thus, the resultant grid is easily scalable to whatever resolution is desired.
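The histogram analysis and the density-driven grid spacing can be illustrated together for a single axis. This is a sketch under stated assumptions rather than the embodiment's actual procedure: grid lines are placed at equal steps of the cumulative point density, a small smoothing constant keeps empty segments from collapsing to zero width, and the names histogram, grid_lines and snap are hypothetical.

```python
from typing import List


def histogram(values: List[float], extent: float, segments: int = 32) -> List[int]:
    """Count the values falling in each of `segments` equal-width bins on [0, extent]."""
    counts = [0] * segments
    for v in values:
        index = min(int(v / extent * segments), segments - 1)
        counts[index] += 1
    return counts


def grid_lines(counts: List[int], n_cells: int, smoothing: float = 1.0) -> List[float]:
    """Return n_cells + 1 grid line positions as proportions of the axis length.

    Lines sit at equal steps of the cumulative smoothed density, so dense
    bins receive closely spaced lines and sparse bins widely spaced ones.
    """
    weights = [c + smoothing for c in counts]
    total = sum(weights)
    cumulative = [0.0]
    for w in weights:
        cumulative.append(cumulative[-1] + w / total)
    lines = []
    for k in range(n_cells + 1):
        target = k / n_cells
        # Find the bin whose cumulative range contains the target.
        i = 0
        while i < len(weights) - 1 and cumulative[i + 1] < target:
            i += 1
        span = cumulative[i + 1] - cumulative[i]
        frac = min(1.0, max(0.0, (target - cumulative[i]) / span))
        lines.append((i + frac) / len(counts))
    return lines


def snap(position: float, lines: List[float]) -> int:
    """Index of the grid line nearest to a position given as a proportion."""
    return min(range(len(lines)), key=lambda i: abs(lines[i] - position))
```

Because the positions are proportions, the same list of lines can be multiplied by any target width to rebuild the grid at a different resolution.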
Figure 1A shows an example of a handwritten character 10 drawn on a high-resolution input device. Figure 1A also shows a low-resolution grid 12 with axes X and Y, used to store the character data, overlaid on the handwritten character 10. As shown, the grid spacing of the grid 12 is not even. The spacing of grid lines decreases in each axis where more points are positioned, and increases in each axis where fewer points occur. This provides an area 14 of increased resolution where fine detail occurs in the handwritten character 10, areas 16 of moderate resolution where some strokes occur in the handwritten character 10, and areas 18 of reduced resolution where few or no strokes appear.
Figure 1B is a graphical representation of the grid line density along the X axis of the grid 12 shown in Figure 1A.
As a result of an analysis of the point distribution and density of Chinese characters, the inventors have also discovered that many Chinese characters have similar point distribution and density. In the embodiment, therefore, a set of predefined grids is provided that correspond to commonly occurring point distribution and density.
If the point distribution and density of the handwritten character is similar to one of the predefined grids, that predefined grid is selected as the grid for the handwritten character and the identification number of the predefined grid is stored in the header data, avoiding the need to define the grid line spacings and thus reducing the information required to specify the character.
If the point density and distribution of the handwritten character does not match any of the predefined grids, the grid spacing information is stored in the header of the character data. This can be achieved by expressly specifying the grid line spacings for each axis.
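The choice between a predefined grid and explicitly stored spacings might look like the following sketch for one axis. The similarity measure (largest difference between corresponding grid line positions), the tolerance value, and the dictionary layout of the header are all assumptions made for illustration; the description does not specify how similarity is judged.

```python
from typing import Dict, List, Optional


def choose_predefined_grid(lines: List[float],
                           predefined: Dict[int, List[float]],
                           tolerance: float = 0.05) -> Optional[int]:
    """Return the id of the closest predefined grid, or None if none is close.

    `lines` and each predefined grid hold grid line positions as proportions.
    """
    best_id, best_error = None, tolerance
    for grid_id, reference in predefined.items():
        if len(reference) != len(lines):
            continue
        error = max(abs(a - b) for a, b in zip(lines, reference))
        if error <= best_error:
            best_id, best_error = grid_id, error
    return best_id


def grid_header(lines: List[float],
                predefined: Dict[int, List[float]],
                tolerance: float = 0.05) -> dict:
    """Store only the grid id when a predefined grid fits, else the spacings."""
    grid_id = choose_predefined_grid(lines, predefined, tolerance)
    if grid_id is not None:
        return {"predefined_grid_id": grid_id}
    spacings = [b - a for a, b in zip(lines, lines[1:])]
    return {"spacings": spacings}
```

Storing a single identification number in place of a list of spacings is what makes the predefined-grid case cheaper in the header.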
Alternatively, the grid line spacings of each axis can be defined parametrically. A first mathematical function with a first set of parameters is used to describe the grid line spacing along the X axis. A second mathematical function with a second set of parameters is used to describe the grid line spacing along the Y axis. By fixing the first and second mathematical functions, only the first and second sets of parameters need to be stored in the header information.
In the embodiment, the mathematical function used to reproduce the density histogram is sin(a) * b² / c. Values of a, b and c are stored in the header for each axis.
Determine if sub-grids are needed
If a custom grid spacing is selected for the handwritten character 10, the point distribution is further analysed to determine whether sub-grid(s) can be used in parts of the handwritten character 10 to further improve the quality and compression of the character data.
The density and distribution of points in the handwritten character or line drawing is sometimes very uneven. For example, where annotations are made in a drawing, the points defining the annotations will be very dense compared to the remainder of the drawing.
If an area of the handwritten character or line drawing has much higher density than the remaining areas, a sub-grid will be used for this high-density area. A sub-grid refers to one of the set of predefined grids mentioned above, but applied to only the high-density area of the handwritten character or line drawing and not the whole character or line drawing.
The sub-grid may have a different resolution (number of grid lines) than the selected grid in the high-density area. Thus, both quality and data size for the strokes falling in the sub-grid area will be improved.
Next, each stroke is stored in the character data.
Firstly, the points in each stroke are compared with the area covered by each sub-grid. If the points in a stroke are contained within one of the sub-grids, the identification number of the predefined grid used in that sub-grid, and its position on the original grid, are stored in the stroke data. The predefined grid is then used to store the points of the lines and curves that define the stroke. For each curve or line, the coordinates of each point on the predefined grid are stored in the stroke data. This provides an effective way of providing still further resolution when defining the character without needing to use a higher resolution grid.
If the points in a stroke are not contained within any of the sub-grids, the selected grid is then used to store the points of the lines and curves that define the stroke. For each curve or line, the coordinates of each point on the selected grid are stored in the stroke data.
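The two storage cases can be sketched as a single encoding routine. The example assumes coordinates normalised to the range 0 to 1, a rectangular sub-grid area given as (left, top, right, bottom), and a dictionary record for each stroke; these representations and the names encode_stroke and nearest_index are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def nearest_index(position: float, lines: List[float]) -> int:
    """Index of the grid line nearest to a position given as a proportion."""
    return min(range(len(lines)), key=lambda i: abs(lines[i] - position))


def encode_stroke(points: List[Point],
                  main_x: List[float], main_y: List[float],
                  sub_grids: List[Dict]) -> Dict:
    """Store a stroke on a sub-grid if it lies wholly inside one, else on the main grid."""
    for sub in sub_grids:
        left, top, right, bottom = sub["area"]
        if all(left <= x <= right and top <= y <= bottom for x, y in points):
            width, height = right - left, bottom - top
            # Rescale into the sub-grid's own 0..1 range before snapping.
            cells = [(nearest_index((x - left) / width, sub["lines_x"]),
                      nearest_index((y - top) / height, sub["lines_y"]))
                     for x, y in points]
            return {"grid_id": sub["grid_id"], "origin": (left, top), "points": cells}
    cells = [(nearest_index(x, main_x), nearest_index(y, main_y)) for x, y in points]
    return {"grid_id": None, "points": cells}
```

In this sketch a stroke confined to the sub-grid area is quantised against the finer predefined grid, while any stroke that leaves the area falls back to the selected grid for the whole character.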
Optimise the strokes
To further compress the character data, stroke optimisation is employed to remove redundant strokes and simplify stroke data. For example, to reduce the information required to describe a stroke that is currently defined as a series of connected lines, the stroke may alternatively be represented as a curve.
Similarly, a curved line having a small curvature may alternatively be represented by a straight line.
Further, if two connected strokes can be simplified by using one stroke without distortion exceeding the predetermined parameter, a single stroke will be used to replace the two strokes. The same technique is used to measure the distortion in this step as was described above in relation to transfiguring the strokes during the vectorising step.
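A simple instance of this merging step is sketched below for two connected straight segments. For brevity it swaps in a plain geometric test, the distance of the shared point from the candidate replacement segment, in place of the weighted-ratio distortion measure used by the embodiment; the names merge_if_close and distance_to_segment and the margin parameter are assumptions.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def distance_to_segment(p: Point, a: Point, b: Point) -> float:
    """Shortest distance from point p to the segment a-b."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the line and clamp the projection to the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def merge_if_close(a: Point, b: Point, c: Point, margin: float) -> List[Segment]:
    """Replace segments a-b and b-c with a-c when b lies within `margin` of a-c."""
    if distance_to_segment(b, a, c) <= margin:
        return [(a, c)]
    return [(a, b), (b, c)]
```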
The extent to which stroke optimisation is performed and the range of allowable simplification is determined according to the desired compression and resolution requirements.
Once complete, the character data can be stored or transmitted to another person. The character data is used to reproduce the handwritten character, either to a display or to a printer. The handwritten character is reconstructed from the character data using the following steps.
Firstly, the grid data is obtained from the header data. This data is used to construct a grid of a desired resolution. Figure 1C shows an example of a reconstructed character 20 from character data generated for the handwritten character in Figure 1A. The reconstructed grid 22 is overlaid on the reconstructed character 20.
Next, strokes 24 are drawn on the grid 22 from the stroke data, in the order that they are stored in the stroke data.
If a stroke is defined using a sub-grid, the sub-grid is also constructed in the portion of the grid 22 specified in the stroke data at the desired resolution and used to draw the stroke.
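Reconstruction of the grid at a chosen output resolution can be sketched as follows, for the main grid only. The example assumes the header yields grid line positions as proportions per axis and that stored points are pairs of grid line indices; the function names scale_grid and decode_points are hypothetical.

```python
from typing import List, Tuple


def scale_grid(lines: List[float], size: int) -> List[int]:
    """Map grid line proportions onto pixel positions 0 .. size - 1."""
    return [round(p * (size - 1)) for p in lines]


def decode_points(cells: List[Tuple[int, int]],
                  lines_x: List[float], lines_y: List[float],
                  width: int, height: int) -> List[Tuple[int, int]]:
    """Convert stored grid indices back to pixel coordinates for drawing."""
    xs = scale_grid(lines_x, width)
    ys = scale_grid(lines_y, height)
    return [(xs[i], ys[j]) for i, j in cells]
```

The same stored data can therefore be drawn on a small phone display or a high-resolution printer simply by changing width and height.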
In the embodiment, each curve is a segment of a circle. Other curve types are possible in other embodiments, such as spline curves. However, a segment of a circle is the simplest representation for low computing power devices such as mobile phones.
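For a curve stored as a start point, one intermediate point and an end point, the circle carrying the arc can be recovered with the standard circumcircle formula. The sketch below is a generic geometric routine and not code from the embodiment; the name circle_through and the collinearity threshold are assumptions.

```python
import math
from typing import Optional, Tuple

Point = Tuple[float, float]


def circle_through(p1: Point, p2: Point, p3: Point) -> Optional[Tuple[float, float, float]]:
    """Return (centre_x, centre_y, radius) of the circle through three points.

    Returns None when the points are collinear, in which case the curve
    degenerates to a straight line.
    """
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    d = 2 * (x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2))
    if abs(d) < 1e-12:
        return None
    s1, s2, s3 = x1 * x1 + y1 * y1, x2 * x2 + y2 * y2, x3 * x3 + y3 * y3
    cx = (s1 * (y2 - y3) + s2 * (y3 - y1) + s3 * (y1 - y2)) / d
    cy = (s1 * (x3 - x2) + s2 * (x1 - x3) + s3 * (x2 - x1)) / d
    return cx, cy, math.hypot(x1 - cx, y1 - cy)
```

The arithmetic involves only a handful of multiplications and one square root, which is consistent with the preference for circle segments on devices with little computing power.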
The following examples demonstrate applications where the above system and method can be used.
The first example is shown in Figure 2, in which the software 30 is provided on a computer or PDA (not shown). In this example, the software 30 is implemented as an encoder 32 that acts as an interface between application software 34 and a handwriting device 36. A handwriting device interface 38 is provided between the handwriting device 36 and the encoder 32, which receives signals from the handwriting device 36 and translates them into coordinates. The software 30 also includes a decoder 40 that acts as an interface between the application software 34 and an output device 42 such as a display or printer.
Since each character is separately compressed, the character data can be treated in the same manner as any other character stream. Thus, characters can be copied, pasted, cut, moved, and new characters inserted in the application software 34 as desired.
Further, the character data can be stored in any suitable storage device 44, or transmitted over a communications network 46, as desired.
The second example relates to a mobile phone 50 with a touch screen 52. In this example, the software is implemented as an encoder 54 and a decoder 56 that are provided between a system storage 58 of the mobile phone 50 and the touch screen 52.
A person can use a stylus 60 to hand-write characters in a short messaging system (SMS) function of the mobile phone 50. The handwritten characters are compressed by the encoder software 54 in the mobile phone 50 and the compressed character data is stored in the system storage 58. The compressed character data is then transmitted to other users' mobile phones 62 using a known SMS protocol 64 and transmission equipment 66. The decoder 56 of the embodiment on the destination mobile phone decodes the character data and displays the characters of the SMS message on the touch screen 52 of the mobile phone 50.
The embodiment is well suited for implementation in mobile phones, where the available computing power makes character recognition difficult in terms of accuracy and cost. Further, the limited network bandwidth available for the transmission of SMS messages restricts the direct transmission of graphics.
It should be noted that in the example relating to a computer or a PDA, the character data is suitable for transmission over a computer network such as the Internet. This enables, for example, handwritten e-mail to be sent and received, handwritten exchanges in server-mediated or peer-to-peer chat applications such as ICQ, as well as server processing of character data.
As will be apparent to a person skilled in the art, the above system and method of compressing handwritten stroke-based characters offers several benefits:
• superior resolution at a given compression ratio compared with current stroke-based systems.
• can be used with any language/character set
• easy to use and learn
• low processing overhead compared with character recognition systems
• avoids problems arising from inaccurate recognition
• retains personalised appearance of handwritten characters.
It should be appreciated that the scope of this invention is not limited to the particular embodiments described above.

Claims

THE CLAIMS DEFINING THE INVENTION ARE AS FOLLOWS
1. A system for compressing stroke-based handwritten characters and line drawings, comprising:
vectorising means arranged to represent each stroke as at least one line defined by coordinates;
grid determination means arranged to analyse the coordinates and determine a grid for storing the coordinates, the grid having a plurality of grid lines along each axis, the grid determination means arranged to select the spacing of the grid lines along at least one axis based on the analysis of the coordinates such that the grid lines are not evenly spaced along the one axis; and
output means arranged to produce data representing said grid and said lines, wherein the coordinates of each line are represented in the data by corresponding points on the grid.
2. The system of claim 1, wherein the lines include straight lines defined by start and end coordinates.
3. The system of claim 1 or 2, wherein the lines include curved lines defined by start coordinate, at least one intermediate coordinate and an end coordinate.
4. The system of any one of claims 1 to 3, wherein the grid determination means is arranged to analyse the coordinates to determine the coordinate density distribution.
5. The system of any of the preceding claims, wherein the grid determination means is further arranged to parametrically describe the grid line spacing of the grid.
6. The system of any of the preceding claims, wherein at least one predefined grid having predefined grid line spacing is provided, said grid determination means being arranged to compare the analysis of the coordinates with the predefined grid line spacing of each predefined grid and to determine therefrom whether to use that predefined grid.
7. The system of claim 6, wherein the grid determination means is arranged to define at least one sub-grid of the grid associated with a predefined grid, said output means being arranged to compare the coordinates defining the lines of each stroke with each sub-grid to determine whether each stroke is contained within the sub-grid, and if the coordinates of the lines of a stroke are contained within a sub-grid, to represent the coordinates of the lines of the stroke in the data by corresponding points on the corresponding predefined grid.
8. The system of any one of the preceding claims, further comprising an input device arranged to provide a digitised representation of the handwritten character or line drawing to said vectorising means.
9. The system of any one of the preceding claims, further comprising decompression means arranged to receive data representing a handwritten character or line drawing, reconstruct the grid represented in the data according to a desired resolution, and to draw the lines of each stroke on the grid or a sub-grid, according to the points defining each line in the data.
10. A method for compressing stroke-based handwritten characters and line drawings, comprising the steps of:
representing each stroke as at least one line defined by coordinates;
analysing the coordinates;
determining from the analysis a grid for storing the coordinates, the grid having a plurality of grid lines along each axis;
determining the spacing of the grid lines along at least one axis based on the analysis of the coordinates such that the grid lines are not evenly spaced along the one axis; and
representing the coordinates of each line by corresponding points on the grid.
11. The method of claim 10, wherein the lines include straight lines defined by start and end coordinates.
12. The method of claim 10 or 11, wherein the lines include curved lines defined by start coordinate, at least one intermediate coordinate and an end coordinate.
13. The method of any one of claims 10 to 12, wherein the step of analysing the coordinates comprises determining the coordinate density distribution.
14. The method of any one of claims 10 to 13, further comprising the step of parametrically describing the grid line spacing of the grid.
15. The method of any one of claims 10 to 14, further comprising the steps of:
providing at least one predefined grid having predefined grid line spacing, and
comparing the analysis of the coordinates with the predefined grid line spacing of each predefined grid and to determine therefrom whether to use that predefined grid.
16. The method of claim 15, further comprising the steps of:
defining at least one sub-grid of the grid associated with a predefined grid; and
comparing the coordinates defining the lines of each stroke with each sub-grid to determine whether each stroke is contained within the sub-grid, and if the coordinates of the lines of a stroke are contained within a sub-grid, to represent the coordinates of the lines of the stroke in the data by corresponding points on the corresponding predefined grid.
17. Computer software provided on a computer-readable medium, which, when executed on a computer, executes method steps for compressing stroke-based handwritten characters and line drawings, the steps comprising:
representing each stroke as at least one line defined by coordinates;
analysing the coordinates;
determining from the analysis a grid for storing the coordinates, the grid having a plurality of grid lines along each axis;
determining the spacing of the grid lines along at least one axis based on the analysis of the coordinates such that the grid lines are not evenly spaced along the one axis; and
representing the coordinates of each line by corresponding points on the grid.
18. The computer software of claim 17, wherein the lines include straight lines defined by start and end coordinates.
19. The computer software of claim 17 or 18, wherein the lines include curved lines defined by start coordinate, at least one intermediate coordinate and an end coordinate.
20. The computer software of any one of claims 17 to 19, wherein the step of analysing the coordinates comprises determining the coordinate density distribution.
21. The computer software of any one of claims 17 to 20, wherein the method steps further comprise the step of parametrically describing the grid line spacing of the grid.
22. The computer software of any one of claims 17 to 21, wherein the method steps further comprise:
providing at least one predefined grid having predefined grid line spacing, and
comparing the analysis of the coordinates with the predefined grid line spacing of each predefined grid and to determine therefrom whether to use that predefined grid.
23. The computer software of claim 22, wherein the method steps further comprise:
defining at least one sub-grid of the grid associated with a predefined grid; and
comparing the coordinates defining the lines of each stroke with each sub-grid to determine whether each stroke is contained within the sub-grid, and if the coordinates of the lines of a stroke are contained within a sub-grid, to represent the coordinates of the lines of the stroke in the data by corresponding points on the corresponding predefined grid.
PCT/SG2001/000088 2001-05-10 2001-05-10 System and method for compressing stroke-based handwriting and line drawing WO2002091288A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/SG2001/000088 WO2002091288A1 (en) 2001-05-10 2001-05-10 System and method for compressing stroke-based handwriting and line drawing

Publications (1)

Publication Number Publication Date
WO2002091288A1 true WO2002091288A1 (en) 2002-11-14

Family

ID=20428933

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/SG2001/000088 WO2002091288A1 (en) 2001-05-10 2001-05-10 System and method for compressing stroke-based handwriting and line drawing

Country Status (1)

Country Link
WO (1) WO2002091288A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4597101A (en) * 1982-06-30 1986-06-24 Nippon Telegraph & Telephone Public Corp. Method and an apparatus for coding/decoding telewriting signals
JPH08166999A (en) * 1994-12-12 1996-06-25 Ricoh Co Ltd Online handwritten character recognition device
JPH10214312A (en) * 1997-01-29 1998-08-11 Hitachi Ltd Online hand-written character recognition device
JPH1196302A (en) * 1997-09-22 1999-04-09 Sharp Corp Handwritten character recognizing device

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: JP