WO2002001488A1 - Use of image detail to select and classify variably-sized pixel blocks for motion estimation - Google Patents

Info

Publication number
WO2002001488A1
WO2002001488A1 (PCT/US2001/041183)
Authority
WO
WIPO (PCT)
Prior art keywords
detail
blocks
block
pyramid
image
Prior art date
Application number
PCT/US2001/041183
Other languages
French (fr)
Inventor
Alan S. Rojer
Original Assignee
Rojer Alan S
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Rojer Alan S filed Critical Rojer Alan S
Priority to AU2001279269A priority Critical patent/AU2001279269A1/en
Publication of WO2002001488A1 publication Critical patent/WO2002001488A1/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51 Motion estimation or motion compensation
    • H04N19/57 Motion estimation characterised by a search window with variable size or shape
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/207 Analysis of motion for motion estimation over a hierarchy of resolutions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/223 Analysis of motion using block-matching
    • G06T7/238 Analysis of motion using block-matching using non-full search, e.g. three-step search
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/119 Adaptive subdivision aspects, e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/136 Incoming video signal characteristics or properties
    • H04N19/14 Coding unit complexity, e.g. amount of activity or edge presence estimation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/176 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51 Motion estimation or motion compensation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/90 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
    • H04N19/96 Tree coding, e.g. quad-tree coding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence

Abstract

In a digital video motion estimation subsystem (1), pixel block-matching is preceded by an analysis of block detail (40) to dynamically select variably-sized pixel blocks for matching. A metric for quantification of block detail is provided. Given an externally specified level of required detail (22), blocks are recursively subdivided as long as at least one subdivision child retains sufficient detail (60). The variably-sized blocks are classified into 'sharp' (91) or high-detail blocks, and 'flat' (92) or low-detail blocks. Flat blocks are prone to spurious matches, while sharp blocks are more likely to match unambiguously or to fail to match due to occlusion or changes of scene content. Block-matching (search) resources may then be concentrated on sharp blocks.

Description

USE OF IMAGE DETAIL TO
SELECT AND CLASSIFY VARIABLY-SIZED PIXEL BLOCKS
FOR MOTION ESTIMATION
Field of the Invention
The present invention relates to computer-implemented processes and apparatus
for efficient matching of blocks of pixels between distinct frames in a video
sequence.
Background of the Invention and Description of the Prior Art
Pixel block-matching is a crucial component of many processes and apparatus in
video motion estimation. The primary application of video motion estimation is
video compression for efficient transmission over low-bandwidth channels. Other
applications include video frame-rate conversion, temporal interpolation, and noise
removal.
Motion estimation is a prerequisite to exploitation of temporal redundancy in a
digital video signal. Successive frames in a video sequence are likely to possess
substantially similar visual information content. The most widely used motion
estimation techniques in the prior art utilize matching between identically sized
blocks of pixels in temporally adjacent frames. Most matching techniques have in
common a method for evaluation of a possible match, typically using a sum of absolute pixel differences or a sum of squared pixel differences.
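The two match criteria just named can be stated concretely. A minimal sketch (the function names `sad`/`ssd` and the toy blocks are mine, not from the patent):

```python
def sad(block_a, block_b):
    """Sum of absolute pixel differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))

def ssd(block_a, block_b):
    """Sum of squared pixel differences between two equally sized blocks."""
    return sum((a - b) ** 2
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))

a = [[10, 12], [11, 9]]
b = [[9, 14], [11, 10]]
print(sad(a, b))  # 1 + 2 + 0 + 1 = 4
print(ssd(a, b))  # 1 + 4 + 0 + 1 = 6
```

Either criterion is evaluated once per candidate displacement, which is why the search strategies discussed next aim to limit the number of candidates.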
The most expensive technique for matching is brute force search of all possible
candidates, usually with some absolute limit on window size. Many block-matching
techniques utilize strategies to limit the number of possible matches to be
considered. Typical strategies include successive refinement of searches from a
wide, coarse grid of candidates to a narrow, fine grid (e.g., US. Pat. No.
5,706,059, incorporated herein by reference). Another popular strategy is to use
lower-resolution versions of the source image for coarse matching, with the results
of low-cost coarse matching serving as initialization of high-cost, high resolution
fine matching (e.g., US Pat. No. 5,801,014, incorporated herein by reference).
All block matching techniques suffer from several intrinsic shortcomings. Most
significantly, there may be many "target" image blocks (at a variety of locations
in the target image) which will match a "source" image block equally well. This
may be easily seen for a block which is of constant intensity throughout; such a
block will match with perfect fidelity anywhere throughout a region of said
intensity. Less obvious but in many ways more pernicious is that a block with only
a strictly linear feature (e.g., a straight edge or a constant intensity gradient) will
match ambiguously anywhere along the edge, allowing identification of only one of
two motion components. In the closely related context of optic flow estimation, it is
well-known in the prior art that only in regions of sufficient Gaussian curvature (i.e.,
image detail) can unambiguous motion estimation take place. This problem has been
known for many years as the aperture problem (cf. S. S. Beauchemin and J. L.
Barron, "The Computation of Optic Flow", ACM Computing Surveys, Vol. 27, No.
8, Sept. 1995, pp. 433-467, incorporated herein by reference). Of course, as the
block size is increased, the likelihood that a block will contain sufficient detail for
an unambiguous match also increases.
Another shortcoming of block search methods is their poor handling of natural
motion borders in real scenes. Rarely do such borders fall on an orderly block grid.
In a typical scene, many blocks will straddle regions of disparate motion. In these
cases, the best match may be a motion vector corresponding to either region, or
a compromise between the vectors. In this situation, the use of smaller blocks will
usually reduce the degree of overlap, permitting better resolution of the motion
edge.
In effect, conventional block matching techniques must select a block size for
matching which compromises between these incompatible constraints: larger blocks
for less matching ambiguity, smaller blocks for higher precision in detection and
representation of motion borders in the scene.
Several inventors have addressed aspects of this unpleasant tradeoff in the prior
art.
The importance of detail in block matching is acknowledged in several inventions. Astle (US. Pat. No. 6,020,926), incorporated herein by reference, teaches a
technique for restricting the error computation in a potential block match mainly to
pixels in regions of high luminance gradient, for the purpose of reducing the
expense of a block comparison. There is no concept of varying the block size to
accommodate the uneven distribution of detail around a scene, however. Reitmeier
(US Pat. No. 5,987,180), incorporated herein by reference, teaches the use of
chrominance to make up for missing luma detail. Kundu (US Pat. No. 5,974,192),
incorporated herein by reference, teaches a categorization of pixels according to
qualitative characteristics which recognizes the ambiguity of matching where
textural information is lacking. Closely related is Jung (US Pat. No. 5,808,685),
incorporated herein by reference, which weights error signals using local pixel
variance or gradient estimation.
The use of variably-sized search blocks may also be found in the prior art. Zhang
et al. (US Pat. No. 5,477,272), incorporated herein by reference, teaches the use
of a multi-resolution motion estimation with different block sizes at different
resolution levels, with the usual successive refinement strategy from coarse to fine.
Krause (US Pat. No. 5,235,419), incorporated herein by reference, teaches the use
of a plurality of block sizes, evaluated in parallel, with selection of a motion vector
from the best match thereby. Similarly, Knauer et al. (US Pat. No. 5,144,423),
incorporated herein by reference, also teaches the use of two different block
sizes, large and small, but the thrust of that invention is management of the bit budget.
In none of these inventions is the signal content of the block used to guide the selection of block size.
Jung (US Pat. No. 5,561,475), incorporated herein by reference, teaches the use
of a variably-sized search block where the block size variation is based on
incremental growth of fixed sized blocks to contain an edge. In this invention, the
selected block size is influenced by the content of the block, with the block size
increased at one-pixel increments until the variance of the pixels in the block
exceeds a predetermined threshold. The use of variance as a measure of detail is
unsatisfactory due to the large contribution of first-order (gradient and linear edge)
features that do not permit unambiguous matching.
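This objection to variance can be made concrete with a toy comparison (my own illustration, not from the patent): for a 2 x 2 window, a strictly linear feature scores high variance yet zero Hi-Hi (Haar) response, while a window with second-order structure scores the same variance but a large Hi-Hi energy. The Hi-Hi response is given up to a scale factor, since the patent's exact kernel normalization is shown only in its Fig. 4.

```python
def variance(block):
    """Pixel variance of a block (the measure criticized above)."""
    vals = [p for row in block for p in row]
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)

def haar_hihi(block):
    """Hi-Hi Haar response of a 2 x 2 block, up to a scale factor."""
    return block[0][0] - block[0][1] - block[1][0] + block[1][1]

ramp = [[0, 10], [0, 10]]     # strictly linear feature: a vertical edge
corner = [[0, 10], [10, 0]]   # second-order structure

print(variance(ramp), haar_hihi(ramp) ** 2)      # 25.0 0
print(variance(corner), haar_hihi(corner) ** 2)  # 25.0 400
```

Variance cannot distinguish the two windows, but the Hi-Hi energy correctly assigns zero detail to the edge, which can only match ambiguously along its own direction.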
Restriction of matching to a subset of blocks may also be found in the prior art. Liu
and Zaccarin (US Pat. No. 5,398,068, and US Pat. No. 5,210,605), both
incorporated herein by reference, teach a method whereby only a subset of blocks
are utilized for search purposes, but the subset of blocks is chosen using an
arbitrary pattern, with no consideration of the signal content of the blocks.
Thus, the prior art, while recognizing the importance of detail in block matching,
and also identifying a variety of uses for variably sized block matching, has not yet
learned to take advantage of variable block sizes to manage the compromise
between the desire for large block sizes to ensure sufficient detail to match without
ambiguity against the desire to use the smallest usable block size for the highest
precision in motion estimation, especially at motion borders.
Summary of the Invention
It is an object of the present invention to effectively mediate the competing
constraints of large block size for unambiguous matching versus small block size
for increased estimation precision.
It is another object of the present invention to conserve computational resources
by avoiding block searches that are unlikely to provide unambiguous matches.
Further objects and/or advantages of the invention will become apparent in
conjunction with the disclosure herein.
The input to the preferred embodiment of the present invention is a source image
21, which is to provide a source of pixel blocks for matching against subsequent
or preceding frames of video.
A bottom-up computation of detail 40 in the source image is used to populate an
image pyramid 50 (P. J. Burt, The Pyramid as a Structure for Efficient Computation,
in A. Rosenfeld, Editor, Multiresolution Image Processing and Analysis,
Springer-Verlag, Berlin, 1984, pp. 6-35, incorporated herein by reference), which
will be familiar to those skilled in the prior art. In the preferred embodiment, blocks
of pixels are required to align with the pyramid grid, so, except at the borders of the image, blocks are sized with power-of-two dimensions. For each cell in the pyramid,
the measure of detail is the energy in all the high-high terms in the Haar transform
for the pixels underlying the pyramid cell (E. J. Stollnitz, T. D. DeRose, and D. H.
Salesin, Wavelets for Computer Graphics, Morgan Kaufmann, San Francisco, 1996,
incorporated herein by reference).
The preferred embodiment assigns to each block of pixels in the partition a measure
of "detail", which is closely correlated to the likelihood of unambiguous matching
of the blocks. In the preferred embodiment, an externally provided threshold 22 for
comparison with the computed "detail" is utilized to subdivide output blocks into
"sharp" and "flat" blocks, where sharp blocks are considered to have sufficient
detail for unambiguous matching, while flat blocks are considered to lack sufficient
detail for unambiguous matching. Further processing stages may elect to ignore flat
blocks, or devote substantially reduced effort to match evaluation of flat blocks,
thus saving computational resources.
The externally-provided detail threshold 22 is next utilized to build a quad-tree 90
in registration with the image pyramid 50. This computation 60 proceeds by
top-down recursive subdivision of blocks in the quad-tree, starting from the root,
corresponding to the whole image. As the subdivision proceeds, terminal blocks are
accumulated into collections of "sharp" blocks 91, whose block detail exceeds
the detail threshold 22, and "flat" blocks 92, for which block detail does not
exceed the threshold 22. When all possible subdivisions have been performed, the undivided blocks form a variably-sized tiling of the original image.
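The subdivision and classification just summarized can be sketched in Python. This is a simplified reading, not the patent's stack-based algorithm of its Fig. 9: detail is recomputed recursively per window rather than read from a precomputed pyramid, the kernels are assumed to be averaging-normalized Haar kernels, and the image is assumed square with power-of-two size.

```python
def analyze(img, r, c, size):
    """Return (coalesced signal, recursive Hi-Hi detail energy) for the
    size x size window of img whose top-left corner is (r, c)."""
    if size == 1:
        return img[r][c], 0.0              # individual pixels carry zero detail
    h = size // 2
    a, da = analyze(img, r,     c,     h)
    b, db = analyze(img, r,     c + h, h)
    cc, dc = analyze(img, r + h, c,     h)
    d, dd = analyze(img, r + h, c + h, h)
    signal = (a + b + cc + d) / 4.0        # signal (Lo-Lo) kernel
    hihi = (a - b - cc + d) / 4.0          # detail (Hi-Hi) kernel
    return signal, hihi ** 2 + da + db + dc + dd

def subdivide(img, r, c, size, thresh, leaves):
    """Subdivide while at least one child window retains sufficient detail;
    terminal blocks are classified as 'sharp' or 'flat'."""
    _, det = analyze(img, r, c, size)
    h = size // 2
    kids = [(r, c), (r, c + h), (r + h, c), (r + h, c + h)]
    if h and any(analyze(img, kr, kc, h)[1] > thresh for kr, kc in kids):
        for kr, kc in kids:
            subdivide(img, kr, kc, h, thresh, leaves)
    else:
        leaves.append((r, c, size, 'sharp' if det > thresh else 'flat'))

img = [[0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 9, 0],
       [0, 0, 0, 9]]
leaves = []
subdivide(img, 0, 0, 4, 10.0, leaves)
print(leaves)   # only the detailed corner block comes out 'sharp'
```

The accumulated leaves form a non-overlapping, variably-sized tiling of the image, with block-matching effort reserved for the 'sharp' entries.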
Brief Description of the Drawings
A full understanding of the invention can be gained from the following description
of the preferred embodiments when read in conjunction with the accompanying
drawings in which:
FIG. 1 is a block-diagram of the computations of the preferred embodiment for the
present invention;
FIG. 2 provides the preferred embodiment of the image pyramid data structure
which is used internally for bottom-up computation of block detail;
FIG. 3 displays the preferred embodiment of the pyramid datum;
FIG. 4 displays the kernels used in construction of the image pyramid for the detail
computation;
FIG. 5 describes the preferred embodiment of the algorithm for detail computation
using the image pyramid;
FIG. 6 presents the quad-tree geometry utilized for block subdivision;
FIG. 7 displays the quad-tree datum;
FIG. 8 displays the quad-tree data structure in the preferred embodiment;
FIG. 9 provides the preferred embodiment of the algorithm for block subdivision
using the quad-tree; and
FIG. 10 is an example of the variable-sized block selection and leaf partition applied
to a real image.
Detailed Description of the Preferred Embodiments and the Drawings
The preferred embodiment of the algorithm proceeds in two main steps, with an
intermediate data structure, as shown in Fig. 1. The source image 21 is processed
in the detail computation 40 in a bottom-up computation to produce an image
pyramid 50. The intermediate image pyramid 50 is then processed top-down in the
quad-tree subdivision 60. The subdivision is controlled by the externally-supplied
detail threshold 22. The products of the subdivision are the block quad-tree 90, the
leaves of which are non-overlapping variably-sized blocks of pixels, and a
classification of those blocks into "sharp" blocks 91 whose detail is in excess of
the detail threshold 22, and "flat" blocks 92 whose detail does not exceed the
detail threshold 22.
The source image 21 is an intensity image, which is typically luma, but there is no
restriction in application of the invention to chroma or conventional red, blue, or
green channels. However, both video bandwidth and human perceptual sensitivity
are highest for luma, so luma is preferred as an input.
The detail threshold 22 is a scalar. The units of detail are pixel signal energy, and
as such they may be related to the square of the maximum intensity of the pixels
in the source image 21. In a typical case, with pixel values ranging over 0 - 255,
detail thresholds in the range 1000 - 3000 have been found to give satisfactory
block selections for use in downstream block matching.
The internal pyramid structure 50 will be examined in detail in advance of the detail
computation 40, since the detail computation 40 populates the pyramid 50.
An image pyramid will be familiar to those skilled in the prior art. In the simplest
usage, the image pyramid presents a collection of reduced resolution versions of a
source image, with each reduced resolution image derived from the image at the
next higher level of resolution. The pyramid construction proceeds bottom-up, with
the deepest (highest resolution) level of the pyramid in registration with a source
image.
In Fig. 2, an example of the preferred embodiment of the image pyramid 50 is
presented. The image pyramid 50 contains a scalar depth 501, which specifies the
number of layers 502, each of which is an image: 510, 511, 512, 513, and 514.
In Fig. 2, we have fixed a depth of 5 for purposes of illustration, but of course the
depth may take arbitrary positive integral values. Each image 510, etc., contains a
two-dimensional array, with each individual element in the array a pyramid datum:
5101, 5111, 5121, 5131, and 5141. The pyramid datum 5101, etc., will be
examined in further detail in Fig. 3.
The structure of the image pyramid is tightly coupled to the structure of the source
image 21 . The deepest layer of the pyramid 514 is in correspondence with the
source image 21. For illustrative purposes in Fig. 2, the source image has been
assumed to comprise an array of 32 x 24 pixels, but, as will be described herein,
there is no restriction placed on the dimensions of the source image 21 .
The pyramid 50 is constructed from the source image using a bottom-up process of
coalescence with optional augmentation. In the coalescence process, each
non-overlapping 2 x 2 datum window is associated with a single datum in the
succeeding layer. The coalesced elements are denoted children while the single
datum in the succeeding layer is denoted the parent. Children may be referenced
by offset from the parent using a compass notation {NE, NW, SW, SE}
corresponding to the quadrant of the parent occupied by each child. Any element
in the pyramid may be accessed at random in constant time by the use of three
indices h, i, and j, which specify the depth in the pyramid, and the row and column
in the layer, respectively. The three indices collectively will be denoted a "pyramid index" 9113 (Fig. 7).
Since coalescence requires a 2 x 2 datum window, an odd-sized image is
augmented by replication of first or last row or column, as necessary. This process
is illustrated in the construction of 511, the pyramid layer at depth 1 in Fig. 2, from
512, the pyramid layer at depth 2. In 512, we have 3 x 2 datum elements from the
coalescence of layer 3 (513). The 3 x 2 datum elements are augmented by
duplication of the top row of elements.
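The coalescence-with-augmentation step might be sketched as follows. Two simplifications are assumed: the signal kernel is taken as a plain 2 x 2 average, and the last row/column is the one replicated, whereas the text permits replication of either boundary (its 3 x 2 example duplicates the top row).

```python
def augment(layer):
    """Pad an odd-dimensioned layer to even dimensions by replicating a
    boundary row/column, so every datum falls in a 2 x 2 window."""
    if len(layer) % 2:
        layer = layer + [layer[-1]]                  # replicate last row
    if len(layer[0]) % 2:
        layer = [row + [row[-1]] for row in layer]   # replicate last column
    return layer

def coalesce(layer):
    """One bottom-up step: each non-overlapping 2 x 2 datum window is
    associated with a single parent datum in the succeeding layer."""
    layer = augment(layer)
    return [[(layer[2*i][2*j] + layer[2*i][2*j+1] +
              layer[2*i+1][2*j] + layer[2*i+1][2*j+1]) / 4.0
             for j in range(len(layer[0]) // 2)]
            for i in range(len(layer) // 2)]

layer3 = [[1, 2], [3, 4], [5, 6]]   # 3 x 2: odd row count needs augmentation
print(coalesce(layer3))             # row replicated, then coalesced to 2 x 1
```

Repeated application of `coalesce` yields the successively coarser layers of the pyramid, up to a single root datum.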
For maximum efficiency in computation, the preferred embodiment of the pyramid
50 utilizes the source image 21 as its deepest layer, unless the source image 21
is odd-sized and requires augmentation. The deepest layer, whether it be the
source image 21 or an augmented copy thereof, is defined to have zero detail and
zero recursive detail for the purposes of the detail
computation. Note also that elements in the deepest layer of the pyramid have no
children.
In addition to image data providing a series of reduced-resolution versions of the
source image, it is often convenient to associate other data with each element of
the pyramid. In Fig. 3, a pyramid datum 5101 in the preferred embodiment of the
present invention incorporates detail 51012 and recursive detail 51013 in addition
to the usual signal level 51011 (image intensity).
To populate the pyramid 50 in the preferred embodiment of the present invention,
two kernels are utilized. The signal kernel 521 and the detail kernel 522 are
depicted in Fig. 4. Those skilled in the prior art will recognize that these kernels
correspond to the Lo-Lo and Hi-Hi kernels of the Haar transform, the oldest and
simplest of wavelet transforms. The signal kernel 521 is the simplest low-pass
filter; it is applied here to the construction of reduced resolution copies of the
source image. The detail kernel 522 is utilized to identify image detail which is likely
to match unambiguously. It represents the remaining signal after constant and
first-order (gradient) signal has been removed by the Lo-Lo, Hi-Lo, and Lo-Hi kernels
in the decomposition. Since the constant and first-order signals are prone to
ambiguous matching, the restriction of the detail measure to Hi-Hi is a major
contribution to this invention.
The bottom-up computation of detail 40 using the image pyramid 50 is depicted in
Fig. 5. This computation proceeds from the lowest level of the pyramid to the
pyramid's root. At 401, the local variable level is initialized to the pyramid depth.
At 402, the main loop is controlled by the non-zero property of the level. Inside the
loop at 4021, the level is decremented. This ensures that the level inside the loop
will always lie between 0 and depth-1, inclusive. At 4022, each datum in the current
pyramid level is considered. The signal level (e.g., 51011) and detail (e.g., 51012)
for the current datum is computed in 40221 and 40223 by inner product of the
signal level of the children with the signal 521 and detail 522 kernels, respectively.
The recursive detail (e.g., 51013) is initialized with the detail 51012; then the
recursive detail of each child of the datum (if any) is added to the current datum's
recursive detail in 402241.
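The loop just described might be rendered as follows (step reference numerals from Fig. 5 are noted in comments; the averaging normalization of the two kernels and even dimensions at every layer are my assumptions, since the kernel weights appear only in the patent's Fig. 4):

```python
def build_pyramid(image, depth):
    """Bottom-up detail computation (40): compute signal, detail, and
    recursive detail for every datum, from the deepest layer to the root.
    Assumes every layer has even dimensions (no augmentation needed)."""
    # The deepest layer wraps the source pixels; it has zero detail and
    # zero recursive detail by definition.
    layers = {depth: [[{'signal': float(p), 'detail': 0.0, 'rdetail': 0.0}
                       for p in row] for row in image]}
    level = depth                               # step 401
    while level:                                # main loop (402)
        level -= 1                              # step 4021
        child = layers[level + 1]
        layer = []
        for i in range(len(child) // 2):        # step 4022: each datum
            row = []
            for j in range(len(child[0]) // 2):
                kids = [child[2*i][2*j], child[2*i][2*j+1],
                        child[2*i+1][2*j], child[2*i+1][2*j+1]]
                s = [k['signal'] for k in kids]
                signal = (s[0] + s[1] + s[2] + s[3]) / 4.0   # signal kernel 521
                hihi = (s[0] - s[1] - s[2] + s[3]) / 4.0     # detail kernel 522
                detail = hihi ** 2                           # pixel signal energy
                # recursive detail: own detail plus children's (402241)
                rdetail = detail + sum(k['rdetail'] for k in kids)
                row.append({'signal': signal, 'detail': detail,
                            'rdetail': rdetail})
            layer.append(row)
        layers[level] = layer
    return layers

layers = build_pyramid([[0, 0, 0, 0],
                        [0, 0, 0, 0],
                        [0, 0, 8, 0],
                        [0, 0, 0, 8]], depth=2)
print(layers[0][0][0])   # root datum: signal, detail, recursive detail
```

Note how the root's recursive detail aggregates the Hi-Hi energy of every coalescence step beneath it, so a small detailed region anywhere in the image remains visible from the top of the pyramid.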
At the conclusion of the detail computation 40, the pyramid will contain a measure
of signal, detail, and recursive detail for each datum from the base (5141, etc.) to
the root 5101. Each pyramid datum corresponds to a window of pixels (a candidate
block) as well as a geometric region in the image. The detail computation proceeded
from the bottom up, working from the pixels in the source image 21 up to the root
of the pyramid 50, layer by layer. The algorithm now proceeds from the top down,
beginning at the root 5101.
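The bottom-up pass can be sketched as follows. This is a minimal illustration, assuming a square grayscale image with power-of-two side; the function name and the array-per-level layout are illustrative, not taken from the patent:

```python
import numpy as np

def build_detail_pyramid(image):
    """Bottom-up pass (a sketch of computation 40): for each pyramid datum,
    compute signal (Lo-Lo average of its four children), detail (squared
    Hi-Hi response), and recursive detail (own detail plus the recursive
    detail of all children)."""
    signal = np.asarray(image, dtype=float)
    # Base level: pixels carry signal only; detail is taken as zero here.
    levels = [(signal, np.zeros_like(signal), np.zeros_like(signal))]
    while signal.shape[0] > 1:
        a = signal[0::2, 0::2]   # top-left child of each 2x2 block
        b = signal[0::2, 1::2]   # top-right
        c = signal[1::2, 0::2]   # bottom-left
        d = signal[1::2, 1::2]   # bottom-right
        signal = (a + b + c + d) / 4.0           # Lo-Lo (signal kernel)
        det = ((a - b - c + d) / 4.0) ** 2       # squared Hi-Hi response
        prev_rdet = levels[-1][2]
        # Recursive detail: own detail plus the children's recursive detail.
        rdet = det + (prev_rdet[0::2, 0::2] + prev_rdet[0::2, 1::2]
                      + prev_rdet[1::2, 0::2] + prev_rdet[1::2, 1::2])
        levels.append((signal, det, rdet))
    levels.reverse()  # levels[0] is now the root
    return levels
```

On a pure gradient image every Hi-Hi response is zero, so the recursive detail at the root is zero, consistent with the rejection of first-order signal described above.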
The crucial supporting process in the selection of blocks is subdivision of a block
or window into four equally sized, non-overlapping children occupying the same
area as the original block. The subdivision of a block is illustrated in Fig. 6. The
parent block 9100, corresponding to a pyramid datum at level h, row i, column j,
with geometric bounding box ( u , v , u + delta , v + delta ), is subdivided into
four children, 9101, 9102, 9103, and 9104, with pyramid indices and geometry as shown.
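The index and geometry bookkeeping of a split can be sketched as below. The doubling of row/column indices one level down, the mapping of orientations to quadrants, and the convention that h grows away from the root are assumptions for illustration only:

```python
def subdivide_block(h, i, j, u, v, delta):
    """Split the block at pyramid level h, row i, column j, with bounding
    box (u, v, u + delta, v + delta), into four equal non-overlapping
    children one level down. Returns (level, row, col, u, v, size) tuples."""
    half = delta / 2.0
    return [
        (h + 1, 2 * i,     2 * j,     u,        v,        half),  # NW quadrant
        (h + 1, 2 * i,     2 * j + 1, u,        v + half, half),  # NE quadrant
        (h + 1, 2 * i + 1, 2 * j,     u + half, v,        half),  # SW quadrant
        (h + 1, 2 * i + 1, 2 * j + 1, u + half, v + half, half),  # SE quadrant
    ]
```

The four children tile the parent exactly: each has side delta/2, and their bounding boxes are disjoint and cover the parent's area.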
Initially only the root is available for subdivision. When a parent is subdivided, its
children become candidates for subdivision. The quad-tree is a convenient data
structure for the representation and management of this process. The quad-tree will
be familiar to those skilled in the art; it provides at a minimum a link from a
parent quad to its children and typically a link back from child to parent. In the preferred embodiment of the present invention, the quad-tree datum is also
provided with a pyramid index to refer to detail and provide geometry information.
The quad-tree datum 911 is illustrated in Fig. 7. The datum 911 provides a link
9111 to its parent, which is 0 in case the datum 911 is the root of the quad-tree.
Also, the orientation 9112 of the child amongst the parent's children is retained in
the datum. The orientation 9112 takes one of the values NE, NW, SW, SE, except
in the case of the root, where the orientation is undefined. The quad-tree datum
911 contains a pyramid index 9113 which identifies the associated pyramid datum,
and hence provides a source for detail information as well as geometric information.
The pyramid index 9113 in turn contains individual indices for depth (h) 91131, row
(i) 91132, and column (j) 91133. Finally, the quad-tree datum contains links to its
children 9114, if any. There are four children, 91141, 91142, 91143, and 91144,
corresponding to the orientations NE, NW, SW, SE. If the quad-tree datum is a leaf,
the child links will be 0. Otherwise, each child link refers to a distinct quad-tree
datum. As an illustration of the quad-tree linkages, a quad-tree data structure after
subdivision of an arbitrary quad-tree datum associated with pyramid index h, i, j
is shown in Fig. 8. The parent node 9120 has been subdivided to provide children
9121, 9122, 9123, and 9124. The parent and orientation of the parent node 9120
are not shown as they refer to elements outside of the figure.
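A sketch of the quad-tree datum 911 as a data type is given below. The class name and the use of a dictionary for the child links are illustrative choices, not the patent's representation:

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple

@dataclass
class QuadTreeNode:
    """Sketch of quad-tree datum 911: parent link 9111, orientation 9112,
    pyramid index 9113 (depth h, row i, column j), and child links 9114."""
    parent: Optional["QuadTreeNode"]      # None stands in for the 0 link at the root
    orientation: Optional[str]            # "NE", "NW", "SW", or "SE"; None at the root
    pyramid_index: Tuple[int, int, int]   # (h, i, j)
    children: dict = field(default_factory=dict)  # orientation -> QuadTreeNode

    def is_leaf(self):
        # A leaf has all child links empty (the patent's 0 links).
        return not self.children
```

The pyramid index ties each node back to the detail pyramid, so detail and geometry never need to be duplicated in the quad-tree itself.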
With the specification of the quad-tree, the detailed subdivision algorithm 60 is
presented in Fig. 9. This algorithm will provide the block-selection quad-tree 91,
and the collections of sharp and flat blocks, 92 and 93, respectively. The algorithm operates on the detail pyramid 50, with an externally supplied detail threshold 22
to control the subdivision process. The algorithm makes use of an internal stack of
quad-tree nodes, a last-in, first-out collection which will be familiar to
those skilled in the art. The stack provides push and pop operations to insert
and remove elements. An alternative embodiment could make use of a recursive
algorithm to obviate the direct use of the stack, possibly with a slight loss of
efficiency. The algorithm also assumes a constructor for quad-tree nodes, indicated
by new quad-tree node, which requires as arguments the parent quad-tree node and
the pyramid index which is to be associated with the new node.
Initially, the sharp_nodes and flat_nodes collections are empty (601, 602). The quad-tree is
initialized to a single node, corresponding to the root of the pyramid (603, 604).
The detail associated with the root is then consulted in 605. If the root node's
detail is in excess of the detail threshold 22, as will typically, but not always, be
the case, the root is pushed onto the stack (6051). Otherwise (606), the root is
added to the collection of flat nodes (6061).
The main loop of the algorithm 60 (607) is based on the presence of sharp-node
candidates on the stack. No node is placed on the stack unless the detail associated
with its pyramid datum exceeds the threshold. Hence, a node on the stack is either
sharp, or will be subdivided to yield one or more descendent sharp nodes. Thus,
while there are candidates on the stack (607), the algorithm takes a candidate node
(6071). Initially, the algorithm assumes the node will not be subdivided (6072). The children of the candidate node are examined in a loop at 6073. The detail
associated with each child is compared to the detail threshold 22 (60731). If a child
is found with detail in excess of the threshold (607311), the subdivision flag is
raised (607311) and the scan of the children is aborted (607312).
If the subdivision flag was raised (6074), the children are scanned again (60741),
and a new quad-tree node is created for each child (607411). The detail
associated with the child is compared to the detail threshold 22 (607412). If
detail in excess of the threshold is found, the child is placed on the stack as a
subdivision candidate (6074121). Otherwise (607413), the child is added to the
collection of flat nodes (6074131).
If the subdivision flag was not raised (6075), the node is added to the collection of
sharp nodes (60751).
The algorithm 60 continues until there are no subdivision candidates remaining on
the stack.
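The stack-driven loop above can be sketched in a few lines. Here `detail` is assumed to be a callable mapping a pyramid index (h, i, j) to that datum's detail measure; a `depth` bound is added so the sketch terminates at the pyramid base, and the child-index convention is the same illustrative one used earlier:

```python
class Node:
    """Minimal quad-tree node: parent link, orientation, pyramid index (h, i, j)."""
    def __init__(self, parent, orientation, index):
        self.parent, self.orientation, self.index = parent, orientation, index
        self.children = {}

def select_blocks(detail, threshold, depth):
    """Sketch of subdivision algorithm 60. Returns (quad-tree root 91,
    sharp blocks 92, flat blocks 93). Reference numerals in comments
    correspond to the steps described in the text."""
    sharp_nodes, flat_nodes = [], []                    # 601, 602
    root = Node(None, None, (0, 0, 0))                  # 603, 604
    stack = []
    if detail(root.index) > threshold:                  # 605
        stack.append(root)                              # 6051
    else:
        flat_nodes.append(root)                         # 6061
    while stack:                                        # 607: main loop
        node = stack.pop()                              # 6071
        h, i, j = node.index
        kids = [(o, (h + 1, 2 * i + di, 2 * j + dj))
                for o, (di, dj) in (("NW", (0, 0)), ("NE", (0, 1)),
                                    ("SW", (1, 0)), ("SE", (1, 1)))]
        # 6073: scan children; any() aborts early once one exceeds the threshold
        split = h + 1 <= depth and any(detail(idx) > threshold for _, idx in kids)
        if split:                                       # 6074
            for o, idx in kids:                         # 60741
                child = Node(node, o, idx)              # 607411
                node.children[o] = child
                if detail(idx) > threshold:             # 607412
                    stack.append(child)                 # 6074121
                else:
                    flat_nodes.append(child)            # 6074131
        else:                                           # 6075
            sharp_nodes.append(node)                    # 60751
    return root, sharp_nodes, flat_nodes
```

Because a node is only pushed when its detail exceeds the threshold, every node popped in the main loop either becomes a sharp block or is split toward descendants that will.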
Fig. 10 is a demonstration of the algorithm on the famous "Lena" image, here in
512x512 luma. The detail threshold used here was 10,000, which is larger than
usual (1000-3000), but makes for a better illustration. The tiling shown illustrates
the selected blocks for matching; sharp nodes are indicated with an x, flat nodes
are left empty.

Having described this invention with regard to specific embodiments, it is to be understood that the description is not meant as a limitation, since further variations or modifications may be apparent or may suggest themselves to those skilled in the art. It is intended that the present application cover such variations and modifications as fall within the scope of the appended claims.
In addition to the disclosure of the invention provided herein, several additional
references may be of interest to those of ordinary skill and useful for additional
background and context. These references include:
1. S. S. Beauchemin and J. L. Barron, "The Computation of Optic Flow", ACM Computing Surveys, Vol. 27, No. 8 (Sept. 1995), pp. 433-467.
2. P. J. Burt, "The Pyramid as a Structure for Efficient Computation", in A. Rosenfeld, editor, Multiresolution Image Processing and Analysis, Springer-Verlag, Berlin, 1984, pp. 6-35.
3. E. J. Stollnitz, T. D. DeRose, and D. H. Salesin, Wavelets for Computer Graphics, Morgan Kaufmann, San Francisco, 1996.

Claims

What is claimed is:
1. A method for selection and classification of variably-sized pixel blocks in a
source image which balances the competing constraints of increasing block size for
unambiguous matching against decreasing block size for accuracy in computation
of motion fields, the method comprising the steps of:
bottom-up computing of a measure of detail for each candidate pixel block from a
source image;
under control of an externally provided detail threshold, top-down splitting of
candidate pixel blocks, said top-down splitting being performed as long as at least one of
the pixel blocks resulting from the split has detail in excess of the detail threshold;
and
classifying the pixel blocks split in the preceding step according to whether said
measure of detail in each of said blocks exceeds said externally provided detail
threshold.
2. The method of claim 1, wherein said bottom-up computing of said measure of
detail utilizes a recursive sum of squared Hi-Hi Haar coefficients.
3. The method of claim 1, wherein said step of bottom-up computing is performed
using an image pyramid.
4. The method of claim 3, wherein said top-down splitting is performed using a
quad-tree associated with said image pyramid from claim 3, such that leaves of said
quad-tree correspond to said split pixel blocks.
5. The method of claim 1, wherein said classifying of said split pixel blocks
according to said measure of detail is embodied in a sharp blocks collection and a
flat blocks collection, wherein said sharp blocks collection contains blocks with said
measure of detail in excess of said detail threshold, and said flat blocks collection
contains blocks with said measure of detail not in excess of said detail threshold.
PCT/US2001/041183 2000-06-26 2001-06-26 Use of image detail to select and classify variably-sized pixel blocks for motion estimation WO2002001488A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2001279269A AU2001279269A1 (en) 2000-06-26 2001-06-26 Use of image detail to select and classify variably-sized pixel blocks for motion estimation

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US60322000A 2000-06-26 2000-06-26
US09/603,220 2000-06-26

Publications (1)

Publication Number Publication Date
WO2002001488A1 true WO2002001488A1 (en) 2002-01-03

Family

ID=24414530

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2001/041183 WO2002001488A1 (en) 2000-06-26 2001-06-26 Use of image detail to select and classify variably-sized pixel blocks for motion estimation

Country Status (2)

Country Link
AU (1) AU2001279269A1 (en)
WO (1) WO2002001488A1 (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5396567A (en) * 1990-11-16 1995-03-07 Siemens Aktiengesellschaft Process for adaptive quantization for the purpose of data reduction in the transmission of digital images
US5446806A (en) * 1993-11-15 1995-08-29 National Semiconductor Corporation Quadtree-structured Walsh transform video/image coding
US5666475A (en) * 1995-01-03 1997-09-09 University Of Washington Method and system for editing multiresolution images at fractional-levels of resolution using a wavelet representation
US6236761B1 (en) * 1998-07-31 2001-05-22 Xerox Corporation Method and apparatus for high speed Haar transforms

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
STROBACH P.: "Quadtree-structured linear prediction models for image sequence processing", IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, vol. 11, no. 7, July 1989 (1989-07-01), pages 742 - 748, XP002947626 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2011056601A3 (en) * 2009-11-06 2011-06-30 Qualcomm Incorporated Control of video encoding based on image capture parameters
CN102598665A (en) * 2009-11-06 2012-07-18 高通股份有限公司 Control of video encoding based on image capture parameters
US8837576B2 (en) 2009-11-06 2014-09-16 Qualcomm Incorporated Camera parameter-assisted video encoding
US10178406B2 (en) 2009-11-06 2019-01-08 Qualcomm Incorporated Control of video encoding based on one or more video capture parameters
US20120176536A1 (en) * 2011-01-12 2012-07-12 Avi Levy Adaptive Frame Rate Conversion
CN113283442A (en) * 2020-02-19 2021-08-20 北京四维图新科技股份有限公司 Feature point extraction method and device

Also Published As

Publication number Publication date
AU2001279269A1 (en) 2002-01-08


Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
NENP Non-entry into the national phase

Ref country code: JP