WO2000055757A1 - A fast multiplierless transform - Google Patents


Info

Publication number
WO2000055757A1
Authority
WO
WIPO (PCT)
Prior art keywords
lifting
transform
coefficients
processing
computer readable
Prior art date
Application number
PCT/US2000/006941
Other languages
French (fr)
Inventor
Trac D. Tran
Original Assignee
The Johns Hopkins University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by The Johns Hopkins University filed Critical The Johns Hopkins University
Priority to AU37519/00A priority Critical patent/AU3751900A/en
Priority to EP00916411A priority patent/EP1311975A1/en
Publication of WO2000055757A1 publication Critical patent/WO2000055757A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/14Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
    • G06F17/147Discrete orthonormal transforms, e.g. discrete cosine transform, discrete sine transform, and variations therefrom, e.g. modified discrete cosine transform, integer transforms approximating the discrete cosine transform

Abstract

A preferred embodiment of the invention provides an M x M multiplierless perfect reconstruction block transform (i.e., perfect reconstruction is realized if the invention is utilized for both analysis and synthesis; otherwise, if the invention is used on only one end, near-perfect reconstruction results) with symmetric/antisymmetric basis functions, i.e., linear phase filters. In a preferred embodiment, a cascade of ±1 butterfly stages (12) is provided followed by two invertible matrices. Each invertible matrix is made up of a cascade of lifting steps (16) and scaling factors (αi). The scaling factors can be folded into the quantization step sizes. A forward binDCT (e.g., for signal analysis) can be implemented using a cascade of ±1 butterflies (12) and dyadic lifting steps (16). Furthermore, an inverse binDCT (e.g., for signal synthesis) can be easily realized using lifting steps (16) of reverse order and inverted polarity, cascaded with ±1 butterflies (12). In accordance with an embodiment of the invention, all of the lifting coefficients have been chosen to be dyadic rational numbers. Each dyadic lifting step can be constructed by a simple combination of shift-and-add operations.

Description

A FAST MULTIPLIERLESS TRANSFORM
This application claims the benefit of U.S. Provisional Patent Application No.
60/124,746, filed on March 17, 1999, which is incorporated herein by reference in
its entirety.
BACKGROUND OF THE INVENTION
Mathematical transforms (e.g., the Fourier transform, the discrete cosine
transform, the wavelet transform, etc.) have found application in signal processing
computer algorithms, particularly the compression and coding of speech, audio,
image and video data. Such transforms are used in the currently popular image
compression standard referred to as "JPEG" developed by the Joint Photographic
Experts Group, as well as in high-performance video coding standards such as MPEG
and H.263. These operate by dividing up an image or video into sections or blocks,
performing transforms on the data in the blocks, and compressing or coding the
coefficients resulting from the transform.
The JPEG standard, for example, operates by dividing an image into 8x8
blocks of pixels (i.e., each block is an array of 64 pixels: 8 pixels in height and 8 pixels
in width). A mathematical transform known as the "discrete cosine transform
(DCT)" is calculated for each block in the image to produce a data set of coefficients. A variable length code-based compression technique is then used to compress the
data set and store the compressed data stream in an output file. The output file can
then be decompressed to recover the DCT coefficients, the inverse DCT transform
calculated, and the resulting blocks of pixels displayed to represent the original form
of the image.
Mathematically, the DCT transforms the blocks of pixels from the spatial
representation to the frequency representation. In the frequency representation, the
image or video information can be divided into frequency components, some of
which are more important than others. The compression algorithm selectively
quantizes or discards the frequency components that do not adversely affect the
reconstructed picture contents. In this manner, compression is achieved.
From a statistical signal processing standpoint, the DCT is a robust
approximation to the optimal discrete-time Karhunen-Loeve transform (KLT) of a
first-order Gauss-Markov process with a positive correlation coefficient ρ when ρ →
1. The KLT is optimal in the sense that it provides the most accurate representation
of a signal: its basis functions are the eigenvectors of the input's
autocorrelation matrix, so the variances of the resulting transform coefficients are
that matrix's eigenvalues, which, as known in the art, are the optimum
achievable result. In other words, the KLT is optimal in the energy compaction
sense, i.e., among unitary transforms, the KLT packs signal energy into the fewest
coefficients. However, the KLT is signal-dependent and is therefore computationally complex and expensive to implement. The DCT has proven to be a
much better alternative in practice in that it is signal independent, has linear phase,
has real coefficients, and can be implemented using fast algorithms.
The DCT coefficients, X[0] through X[M-1], of an M x M transform (i.e., a
transform having M^2 individual signal samples at its input to be processed row by row
as M basis functions) and the input image pixels, x[0] through x[M-1], are related by
the following equations:
$$X[m] = \alpha_m \sum_{n=0}^{M-1} x[n] \cos\left(\frac{\pi m}{2M}(2n+1)\right), \quad 0 \le m \le M-1,$$
with
Figure imgf000005_0001
It has a closed-form expression and it is orthogonal. Orthogonality is desirable in a
transform inasmuch as there is no redundancy among the M transform bases (e.g., they are orthogonal to each other). In the matrix format, we can represent the
transformation of an input, x, to a transform coefficient, X, as X = Px, and the
reconstruction of the transform coefficient back into the input x as x = P^T X, where P
is the DCT transform matrix and P^T is its inverse transform matrix. Exploiting the
symmetry of the coefficients cos [(πm/2M) (2n + 1)] in the equation above, the
functions performed by the DCT transform matrix P can be represented by a series of
±1 butterflies 12 and rotation angles 14, as illustrated in Fig. 1. The ±1 butterflies
are a series of crossings which process one pair of inputs at a time (i.e., x[0] through
x[7]) by adding and subtracting the inputs in pairs. For example, the first butterfly
encountered by the data set x[0] through x[7] shown in Fig. 1 operates on the data
pair x[0], x[7]. In the top branch of the first butterfly, the term x[0] is added to the
term x[7], while in the bottom branch, the term x[7] is subtracted from x[0].
Similarly, in the second butterfly, the term x[1] is added to the term x[6], while in
the bottom branch the term x[6] is subtracted from x[1]. In an 8 x 8 transform, the
DCT initially performs four such butterflies (i.e., crossings) for a total of four discrete
addition operations and four discrete subtraction operations. The DCT rotation
angles are a series of multiplication operations carried out on the results achieved by
the ±1 butterflies. As depicted in Fig. 1, "C π/4" signifies a multiplication operation
(i.e., multiplying an input by cosine π/4). Similarly, "S π/4" signifies a multiplication
operation (i.e., multiplying an input by sine π/4). This process results in a fast DCT implementation. Using this method, eight DCT coefficients X[0] - X[7] can be
computed using 13 multiplication operations and 29 addition operations.
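The relations X = Px and x = P^T X can be illustrated with a direct (not fast) evaluation of the DCT equation above. This is only a sketch: the scaling factors are assumed to be the usual orthonormal choice (sqrt(1/M) for m = 0 and sqrt(2/M) otherwise), since the defining figure is not reproduced in this text, and the function names and the sample vector are illustrative rather than taken from the specification.

```python
import math

def dct_matrix(M=8):
    """Build the M x M DCT matrix P with P[m][n] = a_m * cos(pi*m*(2n+1) / (2M)).

    Assumption: orthonormal scaling, a_0 = sqrt(1/M) and a_m = sqrt(2/M) for m > 0.
    """
    P = []
    for m in range(M):
        a = math.sqrt(1.0 / M) if m == 0 else math.sqrt(2.0 / M)
        P.append([a * math.cos(math.pi * m * (2 * n + 1) / (2 * M)) for n in range(M)])
    return P

def apply_matrix(P, x):
    """Matrix-vector product: one output value per row of P."""
    return [sum(p * v for p, v in zip(row, x)) for row in P]

def transpose(P):
    return [list(col) for col in zip(*P)]

x = [52, 55, 61, 66, 70, 61, 64, 73]      # hypothetical 8-sample input
P = dct_matrix(8)
X = apply_matrix(P, x)                    # forward transform, X = P x
x_rec = apply_matrix(transpose(P), X)     # inverse transform, x = P^T X
```

Even though the input samples are integers, the coefficients in `X` are generally not, which is the floating-point limitation discussed in the background.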
Despite all of the above advantages of the DCT, there is still room for
improvement. For example, the DCT is a floating-point transform, i.e., the cos and
sin multipliers in the 5 rotation angles {π/4, 3π/8, π/4, 7π/16, 3π/16} are real
numbers with infinite precision. Hence, the DCT cannot map integers to integers
losslessly inasmuch as the product of the inputs, as processed by the ±1 butterflies,
and the DCT rotation angles result in floating-point outputs that cannot be
represented in binary form without truncation, thereby resulting in the loss of input
data. More importantly, floating-point implementations are relatively slow, require
too much hardware space, and consume too much power.
To combat these problems, most implementations of the DCT are based on
integer arithmetic by scaling up the floating-point multipliers by very large factors
(e.g., multiplying by a factor of 1000 or greater to rid the calculation of floating
decimal points). Unfortunately, however, this ad-hoc approximation method still has
problems; for instance, it does not reduce the complexity of the circuit and it still has
truncation errors. Moreover, if more computational accuracy is needed, the scaling
factors must be very large, thereby adding more complexity to the implementation.
SUMMARY OF THE INVENTION
A preferred embodiment of the invention provides an M x M multiplierless
perfect reconstruction block transform (i.e., perfect reconstruction is realized if the
invention is utilized for both analysis and synthesis, otherwise, if the invention is used
on only one end, near-perfect reconstruction results) with symmetric/antisymmetric
basis functions, i.e., linear phase filters. In a preferred embodiment, a cascade of ±1
butterfly stages is provided followed by two invertible matrices. Each invertible
matrix is made up of a cascade of lifting steps and scaling factors (αi). The scaling
factors can be folded into the quantization stepsizes. A forward binDCT (e.g., for
signal analysis), can be implemented using a cascade of ±1 butterflies and dyadic
lifting steps. Furthermore, an inverse binDCT (e.g., for signal synthesis) can be easily
realized using lifting steps of reverse order and inverted polarity, cascaded with ±1
butterflies.
In accordance with an embodiment of the invention, all of the lifting
coefficients have been chosen to be dyadic rational numbers. Each dyadic lifting step
can be constructed by a simple combination of shift-and-add operations.
BRIEF DESCRIPTION OF THE DRAWINGS
These and other advantages, features, and applications of the invention will be
apparent from the following detailed description of the invention which is provided
in connection with the accompanying drawings in which:
Fig. 1 is a signal flow graph which depicts a conventional fast implementation
of an 8 x 8 floating-point DCT;
Figs. 2(a)-2(c) are signal flow graphs which depict variations on lifting steps
used in the analysis and synthesis of data samples;
Fig. 3 is a signal flow graph which depicts a general solution in accordance
with the invention for a perfect reconstruction block transform whose basis functions
have linear phase;
Fig. 4 is a signal flow graph which depicts how an invertible matrix U can be
decomposed into a cascade of lifting steps and scaling factors (αi) in accordance with
a preferred embodiment of the invention;
Figs. 5-11 depict a forward binDCT in accordance with preferred
embodiments of the invention;
Fig. 12 depicts frequency responses of a binDCT in accordance with a
preferred embodiment of the invention;
Fig. 13 depicts a transform and inverse transform matrix of the first
embodiment of a binDCT;
Fig. 14 depicts peak signal-to-noise ratios (PSNR) realized for two
embodiments of the binDCT; and
Fig. 15 depicts a simplified block diagram of a transmission system
incorporating analysis and synthesis filters which employ a binDCT block transform
in accordance with preferred embodiments of the invention.
DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
Preferred embodiments and applications of the invention will now be
described with reference to Figs. 2-15. Other embodiments may be realized and
structural or logical changes may be made to the disclosed embodiments without
departing from the spirit or scope of the invention. Although the invention is
particularly described as applied to the analysis and synthesis of image and video data
samples, it should be readily apparent that the invention may be applied to any other
data samples or sets having the same or similar problems.
Instead of factoring the DCT into a cascade of rotation angles, as in
conventional practice in the art, the invention utilizes a series of lifting steps (also
known as "shear" operations or "ladder" structures). The basic principle behind a
lifting step is demonstrated in the signal flow graphs depicted in Figs. 2(a)-2(c). In
accordance with a preferred embodiment of the invention, the lifting
coefficient l_i is deliberately chosen to be dyadic in that its denominator is a power of 2 (i.e., a
rational number that can be written in the form k/2^m, with k, m ∈ ℤ) for easy binary
implementation. Fig. 2(b) shows that the lifting step can achieve perfect
reconstruction of a signal even with the presence of a nonlinear truncation operator
such as a "floor" (which rounds down to the nearest integer), a "round" (which
rounds either up or down to the nearest integer), or a "ceiling" (which rounds up to the nearest integer). Fig. 2(c) illustrates how this property is exploited to implement
the lifting step very efficiently using a combination of addition operations and
bit-shift operations. The method is always feasible as long as the lifting coefficient is
dyadic. For example, a lifting coefficient of 3/8 signifies "multiplying" an input
value by 3 followed by a "division" by 8 implemented by a right-shift of 3 binary
places (i.e., 2^3 = 8 in the denominator). It should be noted that the term
"multiplying" is actually inaccurate since a series of bit-shift and addition operations
are performed upon the signal to achieve the same result without performing the
actual multiplication operation.
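The perfect-reconstruction property of a lifting step under truncation can be sketched as follows for the 3/8 coefficient mentioned above. The function names are hypothetical, and the "floor" operator is assumed as the truncation; Python's `>>` on integers rounds toward negative infinity, which matches that choice.

```python
def lift_3_8_forward(a, b):
    """Forward lifting step with coefficient 3/8: b is updated by floor(3*a / 8)."""
    three_a = (a << 1) + a            # 3*a using one left-shift and one addition
    return a, b + (three_a >> 3)      # right-shift by 3 places divides by 8 (floor)

def lift_3_8_inverse(a, b):
    """Inverse step: recompute the same truncated term and subtract it."""
    three_a = (a << 1) + a
    return a, b - (three_a >> 3)
```

Because the branch `a` passes through unchanged, the inverse can recompute exactly the same truncated value, so the rounding error cancels and the original integers are recovered.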
As known in the art, binary multiplication may be simplified through a series
of bit-shift and addition operations. Generally, in order to implement a
multiplication operation, a multiplicand (i.e., a number that is to be multiplied by
another) is multiplied by a multiplier (i.e., a number by which a multiplicand is to be
multiplied). When multiplying binary numbers, bit-shift and addition operations
may be implemented to effectively perform a multiplication without the same level of
complexity required for multiplication.
A general description of binary bit-shift and addition operations, as substitutes
for multiplication is found in U.S. Patent No. 3,691,359 to Dell et al., which is
incorporated herein by reference. For example, the numerator 3 of the lifting coefficient 3/8 signifies
performing the following steps: a) a left-shift of 1 binary place upon the input signal
to the lifting step; and b) after a left-shift of one place is performed (e.g., the decimal
value 5 is represented as 101 in binary; after a left-shift, 101 becomes 1010), the
result is then added to the original value of the input signal (e.g., 101), resulting in a
lifting step output of 1111 binary (or decimal value 15). A multiplication by 3 has
just been performed with just one bit-shift and one addition operation. In
accordance with a preferred embodiment of the invention, each of the DCT's
rotation angles (e.g., 14 of Fig. 1) is replaced with a minimal number of dyadic
lifting steps, yet the resulting transform still retains most of the DCT's superior
performance.
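The shift-and-add substitution for multiplication described above can be sketched generically: each set bit of the numerator contributes one shifted copy of the input. The helper names are illustrative and are not drawn from the specification or from the referenced patent.

```python
def shift_add_multiply(x, k):
    """Multiply integer x by a non-negative integer k using only shifts and adds."""
    result = 0
    shift = 0
    while k:
        if k & 1:                 # this bit of k is set: add x shifted into place
            result += x << shift
        k >>= 1
        shift += 1
    return result

def dyadic_scale(x, k, m):
    """Apply the dyadic coefficient k / 2**m to x, truncating with a right-shift."""
    return shift_add_multiply(x, k) >> m

times_three = shift_add_multiply(5, 3)    # binary 101 -> 1010 + 101 = 1111
three_eighths = dyadic_scale(16, 3, 3)    # 16 * 3/8
```

For the numerator 3 this reduces to the single shift and single addition walked through in the text.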
The DCT and the novel transformation referenced herein as "binDCT" are
both M x M block transforms. For purposes of illustration only, an 8 x 8 block will
be described herein. At the forward transform in the encoder, 8 input samples x[0] -
x[7] are processed at a time, yielding 8 transform coefficients X[0] - X[7] that can
then be quantized and encoded (e.g., entropy encoded) in a manner known in the
art. In the decoder, the 8 encoded transform coefficients are entropy decoded,
inverse quantized, and then fed to the inverse transform module. For 1D signals
such as speech and audio, the input stream can be divided into 8-sample segments,
each of which is then processed independently. For 2D and 3D signals (images and videos
respectively), the input can be segmented into non-overlapped 8 x 8 blocks or 8 x 8 x 8 cubes. Each block or cube can be processed independently in separable fashion.
For example, the 1D transformation can be applied to every row, then to every
column, to obtain a 2D transformation.
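The separable row-then-column processing can be sketched with any 1D transform passed in as a function. The 2-point sum/difference transform used for the demonstration is a stand-in chosen for brevity, not the binDCT itself, and all names here are hypothetical.

```python
def transform_2d(block, transform_1d):
    """Apply a 1D transform to every row, then to every column, of a square block."""
    rows = [transform_1d(list(r)) for r in block]           # row pass
    cols = [transform_1d(list(c)) for c in zip(*rows)]      # column pass
    return [list(r) for r in zip(*cols)]                    # back to row-major order

def sum_diff(pair):
    """Toy 2-point transform (sum and difference), used only for illustration."""
    a, b = pair
    return [a + b, a - b]

result = transform_2d([[1, 2], [3, 4]], sum_diff)
```

An 8-point forward transform would be supplied in place of `sum_diff` to process an 8 x 8 block.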
As depicted in Figs. 5-8, each of which depicts a preferred embodiment of the
invention, the 8 input samples labeled from x[0] to x[7] are first processed in four
pairs of two by the ±1 butterflies 12. Each of the four ±1 butterflies 12 takes a pair of
data samples and computes their sum as well as their difference.
As depicted in Fig. 3, each of the resulting sums may be arranged in the 4
upper branches U1, U2, U3, U4, which will be processed by invertible matrix U to
produce the 4 even-indexed transform coefficients X[0], X[2], X[4], X[6] associated
with the transform's 4 symmetric basis functions. Likewise, all of the resulting
differences are fed into the 4 lower branches V1, V2, V3, V4, that will be further
transformed by invertible matrix V to produce the 4 odd-indexed transform
coefficients X[1], X[3], X[5], X[7] associated with the transform's 4 antisymmetric
basis functions.
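The first butterfly stage, which pairs x[i] with x[M-1-i], can be sketched as below. The function names are hypothetical. The inverse halves each recombined value, since a ±1 butterfly applied twice returns the input scaled by two.

```python
def butterfly_stage(x):
    """Pair x[i] with x[M-1-i]; return the sums (upper) and differences (lower)."""
    half = len(x) // 2
    sums = [x[i] + x[-1 - i] for i in range(half)]
    diffs = [x[i] - x[-1 - i] for i in range(half)]
    return sums, diffs

def inverse_butterfly_stage(sums, diffs):
    """Undo the stage: s + d = 2*x[i] and s - d = 2*x[M-1-i], so both halve exactly."""
    first = [(s + d) >> 1 for s, d in zip(sums, diffs)]
    last = [(s - d) >> 1 for s, d in zip(sums, diffs)]
    return first + last[::-1]

sums, diffs = butterfly_stage([1, 2, 3, 4, 5, 6, 7, 8])
```

The `sums` list would feed the upper branches (matrix U) and the `diffs` list the lower branches (matrix V).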
In accordance with an embodiment of the invention, the outputs of the
butterflies are processed significantly differently by the binDCT, as compared with the
DCT. For instance, whereas the DCT relies on rotation angles (e.g., 14 of Fig. 1)
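which are depicted below in matrix form as
$$\begin{bmatrix} \cos\theta_i & -\sin\theta_i \\ \sin\theta_i & \cos\theta_i \end{bmatrix} \quad \text{or} \quad \begin{bmatrix} \cos\theta_i & \sin\theta_i \\ \sin\theta_i & -\cos\theta_i \end{bmatrix},$$
the binDCT, in accordance with a preferred embodiment of the invention, relies on
lifting steps, or cascades of shears (e.g., 16 of Figs. 5-8), a general matrix
representation of which (to be described more fully below) is depicted below as
$$\begin{bmatrix} 1 & l_i \\ 0 & 1 \end{bmatrix} \quad \begin{bmatrix} 1 & 0 \\ l_{i+1} & 1 \end{bmatrix}.$$
Since a rotation angle is orthogonal, its inverse is simply its transpose, e.g.,
$$\begin{bmatrix} \cos\theta_i & -\sin\theta_i \\ \sin\theta_i & \cos\theta_i \end{bmatrix}^{T} \begin{bmatrix} \cos\theta_i & -\sin\theta_i \\ \sin\theta_i & \cos\theta_i \end{bmatrix} = \begin{bmatrix} \cos\theta_i & \sin\theta_i \\ -\sin\theta_i & \cos\theta_i \end{bmatrix} \begin{bmatrix} \cos\theta_i & -\sin\theta_i \\ \sin\theta_i & \cos\theta_i \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.$$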
Inverting a lifting step is achieved simply by inverting its polarity, i.e.
Figure imgf000015_0002
Figure imgf000015_0001
This is always true as long as the diagonal elements of the matrix are unity with the
same polarity (both +1 or -1). In the case that the diagonal elements have opposite
polarity, the inverse of the lifting step is simply itself, i.e.,
Figure imgf000016_0001
Therefore, transforms constructed from series of lifting steps are still perfectly
invertible. Since the multiplication operations associated with the DCT rotation
angles require floating-point arithmetic, and since it is very difficult to choose
rotation angles that yield integer multiplications, in accordance with a preferred
embodiment of the invention, the orthogonality associated with the DCT may be
sacrificed in the binDCT in favor of lower complexity. Specifically, lifting steps are
used to replace the DCT's rotation angles. With the denominators of the
lifting coefficients l_i preferably chosen to be dyadic, both the forward and the inverse transform
can be implemented without using any multiplication (or division) arithmetic
operations, particularly floating-point multiplication. Fig. 9 depicts an embodiment of an inverse binDCT which employs inverted lifting steps having dyadic lifting
coefficients.
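The two inversion rules for lifting steps stated above (invert the polarity when the diagonal entries share a sign; the step is its own inverse when they differ) can be checked with exact rational arithmetic. This is a minimal sketch; the coefficient 3/8 and the helper `matmul2` are chosen here for illustration only.

```python
from fractions import Fraction

def matmul2(A, B):
    """Product of two 2 x 2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

IDENTITY = [[1, 0], [0, 1]]
l = Fraction(3, 8)                       # an example dyadic lifting coefficient

upper = [[1, l], [0, 1]]                 # unit diagonal with the same polarity
upper_inverse = [[1, -l], [0, 1]]        # same step with inverted polarity

mixed = [[1, 0], [l, -1]]                # diagonal entries of opposite polarity

same_polarity_ok = matmul2(upper, upper_inverse) == IDENTITY
self_inverse_ok = matmul2(mixed, mixed) == IDENTITY
```

Both flags evaluate to true for any value of `l`, which is why a cascade of such steps remains exactly invertible regardless of the coefficients chosen.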
In accordance with another aspect of the invention, specific choices of lifting
coefficients l_i were made to result in high-performance transforms.
In each of the 4 upper branches of the binDCTs depicted within Figs. 5-8, the
DCT's 2 butterflies (12a as in Fig. 1) are retained; however, the two upper branch
DCT rotation angles of π/4 and 3π/8 can be replaced with one of the following sets
of lifting steps depicted below in matrix form.
Figure imgf000017_0001
For example, the upper branch DCT rotation angle of π/4 (14a as seen in
Fig. 1) is replaced with a pair of lifting steps 16a (of Figs. 5-8). For illustrative
purposes, consider an input A on the upper branch of the first lifting step of 16a and
an input of B on the lower branch of the first lifting step of 16a. Consequently, the
output on the upper branch of the first lifting step is A+B and the output on the
lower branch of the first lifting step is B. Similarly, the input on the lower branch of
the second lifting step is B and the input for the upper branch of the second lifting
step is A+B. Consequently, the output for the upper branch of the second lifting
step is A+B and the output for the lower branch of the second lifting step is ½(A+B)-B.
In accordance with a preferred embodiment of the invention, these lifting steps can be implemented in software and/or hardware using a series of binary shifts and
adds on the input data to produce the appropriate transform coefficients (e.g.,
X[0]...X[7]) representative of the original image data samples.
While any one of the above 3 matrix sets (or other equivalent sets apparent to
those of ordinary skill in the art) may be employed to obtain these output results
(albeit, with differing levels of accuracy), the first matrix set set forth above will be
derived, by way of example, as a replacement for the upper branch π/4 rotation angle
in Fig. 1.
One way in which respective outputs of A+B and B for the upper and lower
branches of the first lifting step may be obtained is to multiply the input vector
$$\begin{bmatrix} A \\ B \end{bmatrix}$$
by a matrix which represents the first lifting step of 16a:
$$\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$$
Performing matrix multiplication results in the respective upper and lower branch
outputs of A+B and B in the first lifting step.
One way in which respective outputs of A+B and ½(A+B)-B for the upper and
lower branches of the second lifting step of 16a may be obtained is to multiply the
vector input to the second lifting step
$$\begin{bmatrix} A + B \\ B \end{bmatrix}$$
by a matrix which represents the second lifting step of 16a:
$$\begin{bmatrix} 1 & 0 \\ 1/2 & -1 \end{bmatrix}$$
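The pair of lifting steps described for 16a can be sketched on integers as follows. The function names are hypothetical, and a floor truncation is assumed for the halving, implemented as a right-shift by one place.

```python
def lift_pair_forward(a, b):
    """Two lifting steps on inputs A (upper branch) and B (lower branch)."""
    upper = a + b                  # first step:  [[1, 1], [0, 1]]
    lower = (upper >> 1) - b       # second step: [[1, 0], [1/2, -1]], floor of the half
    return upper, lower

def lift_pair_inverse(upper, lower):
    """Undo the steps in reverse order."""
    b = (upper >> 1) - lower       # the second step is its own inverse
    a = upper - b                  # the first step is undone by inverting its polarity
    return a, b

outputs = lift_pair_forward(7, 2)
```

The truncated half of `upper` is recomputed identically on the inverse side, so the integer inputs are recovered exactly.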
Using a method similar to that described above for the upper branch DCT
rotation angle of π/4 in the graph of Fig. 1, the upper branch DCT rotation angle of
3π/8 (14b in Fig. 1) may be replaced with the following sets of lifting steps (or
equivalent lifting steps apparent to those skilled in the art) depicted below in matrix
form:
3π/8:
Figure imgf000020_0002
In each of the 4 lower branches of the binDCTs, depicted within Figs. 5-8,
which yield the odd-indexed transform coefficients (i.e., X[1], X[3], X[5], X[7]),
three DCT rotation angles (i.e., π/4 (14d), 7π/16 (14e), and 3π/16 (14f), as shown
in Fig. 1) are also replaced with lifting steps, particularly those lifting steps having
dyadic denominators in their lifting coefficients. It should be noted that special care
must be taken in choosing a lifting coefficient to replace rotation angle π/4 (14d) in
the lower branches since it precedes the remaining angles in the structure. The
following choices of lifting steps (depicted in matrix form) are preferable to yield
better coding performances although other equivalent lifting steps may be apparent
to those skilled in the art:
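$$\frac{\pi}{4}:\quad \begin{bmatrix} -1 & 5/8 \\ 0 & 1 \end{bmatrix},\ \begin{bmatrix} 1 & 0 \\ 3/8 & 1 \end{bmatrix},\ \begin{bmatrix} -1 & 3/8 \\ 0 & 1 \end{bmatrix},\ \begin{bmatrix} 1 & 0 \\ 11/16 & 1 \end{bmatrix},\ \begin{bmatrix} 1 & -7/16 \\ 0 & 1 \end{bmatrix},\ \begin{bmatrix} -1 & 7/16 \\ 0 & 1 \end{bmatrix},\ \begin{bmatrix} 1 & 0 \\ 3/4 & 1 \end{bmatrix},\ \begin{bmatrix} 1 & -7/16 \\ 0 & 1 \end{bmatrix}$$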
Figure imgf000020_0001
Next, the DCT rotation angle 7π/16 (14e shown in Fig. 1) may be replaced with
one of the following sets of lifting steps (or equivalent steps) depicted below in
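matrix form:
$$\frac{7\pi}{16}:\quad \begin{bmatrix} 1 & -1/8 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 3/16 & 1 \end{bmatrix},\quad \begin{bmatrix} 1 & -3/16 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 13/16 & 1 \end{bmatrix},\quad \text{or} \quad \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 13/16 & 1 \end{bmatrix}$$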
Finally, the DCT rotation angle 3π/16 (14f shown in Fig. 1) may be replaced with
one of the following sets of lifting steps (or equivalent steps) depicted below in
matrix form:
3π/16:
Figure imgf000021_0001
Note that each embodiment of the binDCT still employs the following ±1 butterfly
(of Fig. 1) which may be represented in matrix form as:
$$\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$$
and whose inverse is itself scaled down by a factor of two. To recover x[i] exactly on
the synthesis end, the outputs of the inverse binDCTs have to be attenuated by ¼,
which is preferably implemented in software and/or hardware as a right-shift of the
data two binary places. It should be noted that in lossless coding, the ±1 butterfly is
not very efficient since it amplifies the signal by 2. To compensate for this effect, in
accordance with another aspect of the invention one or more ±1 butterflies (e.g., 12
in Figs. 5-8) may be replaced with either of the following sets of lifting steps (or
equivalent steps) as depicted below in matrix form:
Figure imgf000022_0001
As an example, Figs. 10 and 11 are respectively provided to illustrate lossless
versions of the embodiment of the binDCT shown in Fig. 5 and its inverse binDCT
in which the ±1 butterflies (e.g., 12 of Fig. 5-8) are replaced with lifting steps having
dyadic denominators in their lifting coefficients. Similar modifications may be made
to each embodiment (e.g., as respectively depicted in Figs. 5-8) without departing
from the spirit or scope of the invention.
Fig. 12 depicts a graphical representation of frequency response for a forward
and inverse binDCT, respectively, in accordance with a preferred embodiment of the
invention.
Fig. 13 illustrates a table of coefficients for a transform matrix which may be
used to produce the same or substantially the same coefficients produced by binDCT,
in accordance with embodiments of the invention (e.g., as in Fig. 5). The top
portion of the chart 130 depicts transform coefficients used for signal analysis. The
bottom portion of the chart 132 depicts transform coefficients used for signal
synthesis. Each portion of the table 130, 132 is obtained by first converting each
±1 butterfly and each lifting step within binDCT, e.g., the preferred embodiment of
the invention shown in Fig. 5, into matrix form. Next, the
matrices within a basis function (e.g., x[0]) are multiplied together to produce the 8
rows representative of the 8 output coefficients produced using binDCT. That is, the
Fig. 13 table provides a matrix of transform coefficients, h0 through h7, which are M
basis functions of an M x M transform consistent with a preferred embodiment of the
invention.
Fig. 14 depicts a table showing empirical data of peak signal-to-noise ratios
(PSNR) between reconstructed and original images. The table depicts results for
three separate test images represented in 512 x 512 8-bit grayscale: "Lena,"
"Goldhill" and "Barbara." The table depicts the PSNR for the DCT in the left column of each test image and offers a comparison with two preferred embodiments
of the invention (of Figs. 5 and 7, respectively). As can be seen in Fig. 14, the
performance of the binDCT is very similar to that of the DCT.
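For reference, PSNR between an original and a reconstructed 8-bit image is conventionally computed from the mean squared error as sketched below. The function name and the tiny test arrays are illustrative; they are not the data of Fig. 14.

```python
import math

def psnr(original, reconstructed, peak=255):
    """Peak signal-to-noise ratio in dB between two equally sized 2D pixel arrays."""
    diffs = [
        (o - r) ** 2
        for row_o, row_r in zip(original, reconstructed)
        for o, r in zip(row_o, row_r)
    ]
    mse = sum(diffs) / len(diffs)             # mean squared error over all pixels
    if mse == 0:
        return math.inf                       # identical images
    return 10 * math.log10(peak ** 2 / mse)   # peak = 255 for 8-bit grayscale

example = psnr([[0, 0], [0, 0]], [[0, 0], [0, 1]])
```

A higher value indicates a reconstruction closer to the original.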
Referring now to Fig. 15, a transmission system is depicted in which each
embodiment of the binDCT may be employed within a respective analysis block 150,
152, and each inverted binDCT may be employed within a respective synthesis block
154, 156. At the analysis side, an input signal Y(z) is processed by a pair of analysis
blocks 150, 152, each employing an embodiment of the binDCT block transform in
accordance with the invention. The resulting sub-band signals are then decimated by
a factor of two. At the synthesis side, the signals Y0(z), Y1(z) are interpolated, and
transformed by synthesis blocks 154, 156, each of which employs an embodiment of
the inverse binDCT block transform in accordance with the invention. The
synthesized signals are then added to produce the output signal Y(z)'. As should be
readily apparent, the application of the invention is not limited to a transmission
system and may be employed for other applications including, but not limited to,
storing the coded data and reproduction of the data using any given medium.
Furthermore, although an 8 x 8 transform has been depicted, the implementation of
the invention is not limited to 8 x 8 transforms and, accordingly, may be used to
process matrices of any dimension without departing from the spirit or scope of the
invention.
The invention provides a simplified block transform that can be implemented
using only bit-shift and add operations. The transform replaces DCT rotation angles
with a series of lifting steps preferably having dyadic denominators in their lifting
coefficients specifically chosen to allow the binDCT to operate with substantially the
same degree of accuracy as the DCT, but without the DCT's level of complexity.
The result is a simpler, faster, transform process which retains all of the practical
benefits of the DCT.
The novel block transform of the invention, "binDCT," offers at least the
following advantages:
• The binDCT has a fast, elegant implementation utilizing only shift-and-add
operations. No multiplication is needed. For example, eight
transform coefficients can be computed using only 14 bit-shift operations
and 31 addition operations (13 floating-point multiplication operations
and 29 addition operations are required for the DCT);
• The binDCT can map integers to integers with exact reconstruction. It
also allows a unifying lossy/lossless coding framework;
• In software implementations, the binDCT is at least 3 - 4 times faster than
the floating-point DCT. In hardware implementations, the binDCT may
be approximately 10 times or more faster than the floating-point DCT;
• The multiplierless property of the binDCT allows efficient very large scale
integrated circuit (VLSI) implementations in terms of both chip area and
power consumption;
• The binDCT approximates the DCT very closely. Perceptual quantization
matrices and coding strategies designed specifically for the DCT can be
applied to the binDCT immediately without any complicated
modification; and
• The cascade of a well-tuned forward binDCT followed by an inverse DCT
(or a forward DCT followed by an inverse binDCT) produces a reasonable
near-perfect-reconstruction system.
The binDCT provides very high coding performance. For instance, the
coding gain (a popular measure of a transform's energy compaction property) of
several binDCT versions may range in certain examples from 8.77 to 8.82 dB,
whereas the DCT has a coding gain of 8.83 dB. In image coding experiments, the
binDCT may be approximately 0.1 - 1.0 dB below the DCT in the signal-to-noise ratios of the reconstructed images in certain examples. The binDCT may be found
to offer higher visual reconstructed quality than the DCT.
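The coding gain figure quoted for the DCT can be reproduced with a short calculation under stated assumptions: a first-order Gauss-Markov input with unit variance and correlation coefficient 0.95 (a value commonly used for this measure but not stated in this text), an 8-point orthonormal DCT, and the ratio of arithmetic to geometric mean of the coefficient variances. The function names are hypothetical, and this simple form applies to orthonormal transforms only, so it is not the computation for the binDCT itself.

```python
import math

def dct_matrix(M=8):
    """Orthonormal M x M DCT matrix (scaling is an assumption, see the lead-in)."""
    P = []
    for m in range(M):
        a = math.sqrt(1.0 / M) if m == 0 else math.sqrt(2.0 / M)
        P.append([a * math.cos(math.pi * m * (2 * n + 1) / (2 * M)) for n in range(M)])
    return P

def coding_gain_db(P, rho):
    """Coding gain of an orthonormal transform P for an AR(1) input with correlation rho."""
    M = len(P)
    variances = []
    for row in P:
        # Variance of one coefficient: row * R * row^T with R[i][j] = rho**|i - j|.
        var = sum(row[i] * row[j] * rho ** abs(i - j)
                  for i in range(M) for j in range(M))
        variances.append(var)
    arithmetic_mean = sum(variances) / M
    geometric_mean = math.exp(sum(math.log(v) for v in variances) / M)
    return 10 * math.log10(arithmetic_mean / geometric_mean)

gain = coding_gain_db(dct_matrix(8), 0.95)
```

Under these assumptions `gain` comes out at about 8.83 dB, in line with the DCT value cited above.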
While preferred embodiments of the invention have been described and
illustrated, it should be apparent that many modifications can be made to the
invention and the invention's application without departing from its spirit or scope.
Accordingly, the invention is not limited by the foregoing description or drawings,
but is only limited by the scope of the appended claims.

Claims

What is claimed is:
1. A method of processing digital samples, the method comprising:
processing the digital samples with a plurality of lifting steps, said act of
processing being multiplierless.
2. The method as in claim 1 further comprising:
receiving at an input of a transform a plurality of sample segments of discrete
time signals; and
generating a plurality of transform coefficients.
3. The method as in claim 2 further comprising processing the plurality of
sample segments in pairs of two, said act of processing generating a plurality of sums
and differences of said sample segments.
4. The method as in claim 1, wherein a lifting coefficient of the lifting
steps is dyadic.
5. The method as in claim 3, wherein said act of processing is achieved
with at least one ±1 butterfly.
6. The method as in claim 5, wherein said act of processing is achieved
with a lifting step which, in matrix form, can be expressed as either:
Figure imgf000029_0001
7. The method as in claim 1 further comprising:
receiving at an input of a transform a plurality of generated transform
coefficients; and
generating a plurality of sample segments of discrete time signals.
8. The method as in claim 7 further comprising processing a plurality of
sums and differences generated from said act of processing the digital samples with a
plurality of ±1 butterflies.
9. The method as in claim 8, wherein said act of processing a plurality of
sums and differences is achieved with a plurality of lifting steps having dyadic lifting
coefficients.
10. The method as in claim 2, wherein said act of receiving further
comprises receiving at an input of an M x M transform M sample segments of
discrete time signals.
11. The method as in claim 10, wherein said act of generating further
comprises generating M transform coefficients.
12. The method as in claim 7, wherein said act of receiving further
comprises receiving at an input of an inverse M x M transform M generated
transform coefficients.
13. The method as in claim 12, wherein said act of generating further
comprises generating M sample segments of discrete time signals.
14. The method as in claim 4, wherein one of the following sets of dyadic
lifting coefficients, as represented in matrix form, is employed:
Figure imgf000030_0001
15. The method as in claim 4, wherein one of the following sets of dyadic
lifting coefficients, as represented in matrix form, is employed:
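Figure imgf000031_0003 (the entries that survive text extraction include the lifting matrix $\begin{bmatrix} 1 & 0 \\ -5/8 & 1 \end{bmatrix}$)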
16. The method as in claim 4, wherein one of the following sets of dyadic
lifting coefficients, as represented in matrix form, is employed:
Figure imgf000031_0001 (matrix sets; the entries that survive text extraction include lifting coefficients of magnitude 1/2, 3/8, 5/8, 3/4, 7/16 and 11/16)
17. The method as in claim 4, wherein one of the following sets of dyadic
lifting coefficients, as represented in matrix form, is employed:
Figure imgf000031_0002
18. The method as in claim 4, wherein one of the following sets of dyadic
lifting coefficients, as represented in matrix form, is employed:
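$$\begin{bmatrix} 1 & 0 \\ -1/2 & 1 \end{bmatrix} \quad \begin{bmatrix} 1 & 7/8 \\ 0 & 1 \end{bmatrix} \quad \begin{bmatrix} 1 & 0 \\ -1/8 & 1 \end{bmatrix} \quad \begin{bmatrix} 1 & 1/2 \\ 0 & 1 \end{bmatrix} \quad \begin{bmatrix} 1 & 0 \\ -3/4 & 1 \end{bmatrix} \quad \begin{bmatrix} 1 & 0 \\ -1/4 & 1 \end{bmatrix} \quad \begin{bmatrix} 1 & 1/2 \\ 0 & 1 \end{bmatrix} \quad \begin{bmatrix} 1 & 0 \\ -1/4 & 1 \end{bmatrix} \quad \text{or}$$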
19. An analysis filter, the filter comprising:
a receiver for receiving at an input a plurality of sample segments of discrete
time signals; and
means for processing the plurality of sample segments of discrete time signals
with a plurality of lifting steps, the processing being multiplierless, each of the
plurality of lifting steps having a dyadic lifting coefficient, and the means for
processing also being capable of generating a plurality of transform coefficients.
20. A synthesis filter, the filter comprising:
a receiver for receiving at an input a plurality of generated transform
coefficients; and
means for processing the plurality of generated transform coefficients through
a plurality of lifting steps, the processing being multiplierless, each of the plurality of
lifting steps having a dyadic lifting coefficient, and said means for processing also
being capable of generating a plurality of sample segments of discrete time signals.
21. A computer readable storage medium containing a computer readable
code for operating a computer to process digital samples, said code causing said
computer to:
process the digital samples with a plurality of lifting steps, said act of
processing being multiplierless.
22. The computer readable storage medium of claim 21, wherein said code
further causes said computer to:
receive at an input of a transform a plurality of sample segments of discrete
time signals; and
generate a plurality of transform coefficients.
23. The computer readable storage medium of claim 22, wherein
said code further causes said computer to:
process the plurality of sample segments in pairs of two, thereby generating a
plurality of sums and differences of said sample segments.
24. The computer readable storage medium of claim 21, wherein a lifting
coefficient of the lifting steps is dyadic.
25. The computer readable storage medium of claim 23, wherein said code
causes said act of processing to be achieved with at least one ±1 butterfly.
26. The computer readable storage medium of claim 25, wherein said act
of processing is achieved with a lifting step which, in matrix form, can be expressed as
either:
Figure imgf000035_0001
27. The computer readable storage medium of claim 21, wherein said code
further causes said computer to:
receive at an input of a transform a plurality of generated transform
coefficients; and
generate a plurality of sample segments of discrete time signals.
28. The computer readable storage medium of claim 27, wherein said code
further causes said computer to process a plurality of sums and differences generated
from said act of processing the digital samples with a plurality of ±1 butterflies.
29. The computer readable storage medium of claim 28, wherein said code
causes said act of processing a plurality of sums and differences to be achieved with a
plurality of lifting steps having dyadic lifting coefficients.
30. The computer readable storage medium of claim 22, wherein said code
further causes said computer to receive at an input of an M x M transform M sample
segments of discrete time signals.
31. The computer readable storage medium of claim 30, wherein said code
further causes said computer to generate M transform coefficients.
32. The computer readable storage medium of claim 27, wherein said code
further causes said computer to receive at an input of an inverse M x M transform M
generated transform coefficients.
33. The computer readable storage medium of claim 32, wherein said code
further causes said computer to generate M sample segments of discrete time signals.
34. The computer readable storage medium of claim 24, wherein said code causes
one of the following lifting coefficients, as represented in matrix form, to be
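employed:
$$\begin{bmatrix} 1 & -1/2 \\ 0 & 1 \end{bmatrix} \quad \begin{bmatrix} 1 & 0 \\ 1 & -1 \end{bmatrix} \quad \begin{bmatrix} 1 & 0 \\ 1/2 & -1 \end{bmatrix} \quad \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$$
Figure imgf000037_0001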
35. The computer readable storage medium of claim 24, wherein said code
causes one of the following lifting coefficients, as represented in matrix form, to be
employed:
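[Matrix sets garbled in text extraction; the surviving entries are the lifting coefficients 3/8 and -5/8, including the lifting matrix $\begin{bmatrix} 1 & 0 \\ -5/8 & 1 \end{bmatrix}$]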
36. The computer readable storage medium of claim 24, wherein said code
causes one of the following lifting coefficients, as represented in matrix form, to be
employed:
Figure imgf000037_0002 (matrix sets; the entries that survive text extraction include lifting coefficients of magnitude 3/8, 5/8, 3/4, 7/16 and 11/16)
37. The computer readable storage medium of claim 24, wherein said code
causes one of the following lifting coefficients, as represented in matrix form, to be
employed:
Figure imgf000038_0001
38. The computer readable storage medium of claim 24, wherein said code
causes one of the following lifting coefficients, as represented in matrix form, to be
employed:
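$$\begin{bmatrix} 1 & 0 \\ -1/2 & 1 \end{bmatrix} \quad \begin{bmatrix} 1 & 7/8 \\ 0 & 1 \end{bmatrix} \quad \begin{bmatrix} 1 & 0 \\ -1/8 & 1 \end{bmatrix} \quad \begin{bmatrix} 1 & 1/2 \\ 0 & 1 \end{bmatrix} \quad \begin{bmatrix} 1 & 0 \\ -3/4 & 1 \end{bmatrix} \quad \begin{bmatrix} 1 & 0 \\ -1/4 & 1 \end{bmatrix} \quad \begin{bmatrix} 1 & 1/2 \\ 0 & 1 \end{bmatrix} \quad \begin{bmatrix} 1 & 0 \\ -1/4 & 1 \end{bmatrix} \quad \text{or}$$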
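As an illustration of the analysis and synthesis structure recited in the claims (±1 butterflies producing sums and differences, followed by lifting steps with dyadic coefficients), the sketch below builds a hypothetical 4-point transform pair. The particular flow graph, the coefficients 3/8 and 1/2, and the names `analysis` and `synthesis` are assumptions chosen for brevity; they are not one of the claimed coefficient sets and do not reproduce the 8-point binDCT.

```python
from itertools import product


def times_3_8(x: int) -> int:
    """Dyadic coefficient 3/8 realised as x/4 + x/8 (two shifts, one add)."""
    return (x >> 2) + (x >> 3)


def analysis(x: list[int]) -> list[int]:
    """Hypothetical multiplierless 4-point forward transform."""
    x0, x1, x2, x3 = x
    # Stage 1: +/-1 butterflies on sample pairs give sums and differences.
    s0, d0 = x0 + x3, x0 - x3
    s1, d1 = x1 + x2, x1 - x2
    # Stage 2: another butterfly on the sums.
    t0, t1 = s0 + s1, s0 - s1
    # Stage 3: two lifting steps on the differences (coefficients 3/8, 1/2).
    d1 = d1 - times_3_8(d0)
    d0 = d0 + (d1 >> 1)
    return [t0, d0, t1, d1]


def synthesis(c: list[int]) -> list[int]:
    """Inverse of analysis: lifting steps undone first, then the butterflies."""
    t0, d0, t1, d1 = c
    d0 = d0 - (d1 >> 1)
    d1 = d1 + times_3_8(d0)
    # Each butterfly output pair sums to twice an input, so halving is exact.
    s0, s1 = (t0 + t1) >> 1, (t0 - t1) >> 1
    return [(s0 + d0) >> 1, (s1 + d1) >> 1, (s1 - d1) >> 1, (s0 - d0) >> 1]


def round_trip_ok(limit: int = 4) -> bool:
    """Check exact reconstruction for every integer block in a small range."""
    values = range(-limit, limit + 1)
    return all(synthesis(analysis(list(x))) == list(x)
               for x in product(values, repeat=4))


if __name__ == "__main__":
    block = [10, 12, 11, 9]
    print(analysis(block), synthesis(analysis(block)), round_trip_ok())
```

The butterflies in this sketch are left unnormalized, so the dynamic range of the coefficients grows by one bit per butterfly stage; a practical design would account for that in the quantization step.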
PCT/US2000/006941 1999-03-17 2000-03-17 A fast multiplierless transform WO2000055757A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
AU37519/00A AU3751900A (en) 1999-03-17 2000-03-17 A fast multiplierless transform
EP00916411A EP1311975A1 (en) 1999-03-17 2000-03-17 A fast multiplierless transformation

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US12474699P 1999-03-17 1999-03-17
US60/124,746 1999-03-17

Publications (1)

Publication Number Publication Date
WO2000055757A1 (en) 2000-09-21

Family

ID=22416628

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2000/006941 WO2000055757A1 (en) 1999-03-17 2000-03-17 A fast multiplierless transform

Country Status (3)

Country Link
EP (1) EP1311975A1 (en)
AU (1) AU3751900A (en)
WO (1) WO2000055757A1 (en)

Also Published As

Publication number Publication date
EP1311975A4 (en) 2003-05-21
EP1311975A1 (en) 2003-05-21
AU3751900A (en) 2000-10-04

Similar Documents

Publication Publication Date Title
US5859788A (en) Modulated lapped transform method
US7471850B2 (en) Reversible transform for lossy and lossless 2-D data compression
Tran The BinDCT: Fast multiplierless approximation of the DCT
CN101796506B (en) Transform design with scaled and non-scaled interfaces
US6219458B1 (en) Overlapped reversible transforms for unified lossless/lossy compression
US20020006229A1 (en) System and method for image compression and decompression
KR100331136B1 (en) A computer system performing an inverse cosine transfer function for use with multimedia information
US7613761B2 (en) Haar wavelet transform embedded lossless type II discrete cosine transform
US6876704B2 (en) Apparatus and method for encoding and computing a discrete cosine transform using a butterfly processor
US7634525B2 (en) Haar wavelet transform embedded lossless type IV discrete cosine transform
US7720299B2 (en) Compressed data multiple description transmission and resolution conversion system
US6317767B2 (en) Methods and systems for performing short integer chen IDCT algorithm with fused multiply/add
US7649939B2 (en) Apparatus and method for decoding and computing a discrete cosine transform using a butterfly processor
US6259820B1 (en) Progressive JPEG decoding
US7805476B2 (en) Extended Haar transform
EP1311975A1 (en) A fast multiplierless transformation
Shawahna et al. JPEG image compression using the discrete cosine transform: an overview, applications, and hardware implementation
Wahid et al. Error-free computation of 8/spl times/8 2D DCT and IDCT using two-dimensional algebraic integer quantization
Walters III et al. Using truncated multipliers in DCT and IDCT hardware accelerators
Weeks Precision for 2-D discrete wavelet transform processors
Bracamonte et al. A multiplierless implementation scheme for the JPEG image coding algorithm
JP3225614B2 (en) Data compression device
Dang et al. Discrete wavelet transform for image compression: a hardware approach
Wahid Error-free Algorighms and Architectures of Discrete Cosine Transforms Using Multidimensional Algebraic Interger Quantization
Huluta Discrete wavelet transform architecture for image coding and decoding

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AL AM AT AU AZ BA BB BG BR BY CA CH CN CR CU CZ DE DK DM EE ES FI GB GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
WWE Wipo information: entry into national phase

Ref document number: 2000916411

Country of ref document: EP

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

WWW Wipo information: withdrawn in national office

Ref document number: 2000916411

Country of ref document: EP

WWP Wipo information: published in national office

Ref document number: 2000916411

Country of ref document: EP