|Publication number||WO1998053613 A1|
|Publication date||26 Nov 1998|
|Filing date||21 Apr 1998|
|Priority date||20 May 1997|
|Publication number||PCT/1998/8193, PCT/US/1998/008193, PCT/US/1998/08193, PCT/US/98/008193, PCT/US/98/08193, PCT/US1998/008193, PCT/US1998/08193, PCT/US1998008193, PCT/US199808193, PCT/US98/008193, PCT/US98/08193, PCT/US98008193, PCT/US9808193, WO 1998/053613 A1, WO 1998053613 A1, WO 1998053613A1, WO 9853613 A1, WO 9853613A1, WO-A1-1998053613, WO-A1-9853613, WO1998/053613A1, WO1998053613 A1, WO1998053613A1, WO9853613 A1, WO9853613A1|
|Inventors||Marshall A. Robers, Mark R. Banham, Aggelos K. Katsaggelos|
APPARATUS, METHOD AND COMPUTER READABLE MEDIUM FOR SCALABLE CODING OF VIDEO INFORMATION
Field of the Invention
This invention relates to video compression and coding techniques, and more specifically, to an apparatus, method and computer readable medium for scalable coding of video information.
Background of the Invention
Many applications requiring the transmission and/or storage of digital video information are limited by the available bandwidth of the system. A variety of applications such as surveillance, public safety, and video database browsing can thus benefit from the ability to transmit or decode a low resolution rendition of a high quality video scene. This low resolution rendition, however, is not always sufficient to meet the needs of end users. Often a high quality video sequence is needed to gain more information from the source. The ability to create both a low resolution video sequence and a higher resolution sequence from a single bitstream can be very useful for the applications mentioned.
Rendering multiple levels of quality from a single bitstream addresses the needs of limited encoding complexity and reduced overall disk storage space, and permits novel functionalities such as streaming video at different levels of quality depending on available network bandwidth.
Currently, there does not exist a very efficient coding method for digital video data with multiple qualities extractable from a single encoded bitstream, which can leverage the technology in existing standardized video codecs. An apparatus, method, and computer readable medium designed to efficiently perform scalability utilizing the platform of existing standardized video codecs would solve many problems for applications needing scalable video.
Brief Description of the Drawings
FIG. 1 is a flow chart illustrating one preferred embodiment of steps of a method in accordance with the present invention.
FIG. 2 is a diagram illustrating spectral scan parameters and quantization scan parameters of one preferred embodiment of a method in accordance with the present invention.
FIG. 3 is a block diagram of one preferred embodiment of an apparatus for scalable coding of a plurality of video frames in accordance with the present invention.
FIG. 4 is a diagrammatic representation of one preferred embodiment of a computer readable medium for scalable coding of video information in accordance with the present invention.
FIG. 5 is another preferred embodiment of a flow chart for a method for scalable coding of video information, the video information having a plurality of video frames, in accordance with the present invention.
Detailed Description of a Preferred Embodiment
This invention involves scalable encoding and decoding of 8 x 8 blocks of discrete cosine transform (DCT) coefficients for both INTRA and INTER coded blocks. INTRA coded blocks are those blocks of video data which do not utilize any temporal prediction from prior frames in the video sequence. INTER coded blocks have a prediction from a prior frame, and a prediction error which is coded with the DCT. This method can be applied within the structure of the ITU-T H.263 standard for video coding at low bitrates. The present invention uses a type of scalability known as SNR (signal-to-noise-ratio) scalability (to differentiate it from spatial and temporal scalabilities which involve changes in spatial and temporal resolution). The novelty of the present invention is found at the block level of the H.263 syntax, where it defines multiple scans, or layers, of refinement for the DCT coefficients of the displaced frame difference (DFD) INTER block, or INTRA block being coded. This scalable method allows flexibility in defining the scans, and both the number of scans and the content of each scan can be varied.
Video coding at low bitrates requires a compression technique which utilizes the temporal redundancy of a video sequence (i.e., the strong correlation of consecutive frames). Most video coding schemes include a block matching technique for motion estimation and compensation. The task of block matching becomes more difficult within the context of a scalable video coder because motion compensation requires the use of the previous reconstructed frame. An encoder using this methodology explicitly has a decoder in its coding loop. A decoder may or may not decode all layers of quality of a scalably encoded previous reconstructed frame. It is, thus, necessary to guarantee that the previous reconstructed frame used for prediction in the encoder is the same for all possible subsets of the overall compressed stream. For this reason, motion compensation within the encoder (i.e., determination of the DFD) of the present invention is based on the previous reconstructed frame found in the minimum subset of the compressed scalable bitstream. This minimum subset is called the base-layer, and it is determined by the expected minimum bandwidth channel for a specific application. Using the base-layer for the encoder's motion compensation guarantees that the motion compensation process can be exactly duplicated in the decoder.
FIG. 1, numeral 100, is an overall block diagram of a preferred embodiment of a method for scalable encoding. The encoding process includes a determination of a target number of bits to spend on a macroblock which will be scalably encoded (102). The parameters specifying how the data in that block shall be partitioned are computed in step (104). These parameters include a spectral scan parameter and a quantization scan parameter for each scan. Multiple scans of coefficients are generated in step (106), and encoded using variable length codes in step (108). Finally, the lowest resolution scan, or base-layer, is extracted in the encoder for use in prediction of the next frame (110).
This invention defines a partitioning approach for DCT coefficients of video frames. Still image compression using the "progressive" mode of the JPEG standard is related to this partitioning approach. In JPEG, blocks of "still images" are compressed by breaking up the DCT data into predetermined groups of coefficients. In this invention, however, the partitioning approach is applied adaptively to DCT coefficients represented by the block layer of the syntax of a video bitstream. The partitioning approach involves specifying a set of scans, which are subsets of the set of DCT coefficients associated with a block of video data. These scans are then encoded separately, permitting a decoder to extract one, some, or all of the scans associated with the DCT data to produce video of varying qualities. The application and design of this method for video compression requires significant departure from the application of scalable DCT coding to still images. The methods for defining the DCT coefficient scans in this invention are given next, and can be seen graphically in FIG. 2, numeral 200.
Spectral scan selection involves transmitting a subset of an 8 x 8 block of DCT coefficients in a particular scan. In spectral scan selection, some of the 64 DCT coefficients are sent in their entirety (i.e., all bits of magnitude precision), and no information is sent about the other DCT coefficients. The DCT tends to decorrelate a block of values so that the majority of the data required for perceptually lossless compression is contained in the low frequency coefficients. Therefore, appropriate use of spectral scan selection for video involves transmitting low frequency DCT coefficients in the first scans and higher frequency DCT coefficients in subsequent scans. A graphical representation of a typical scan definition for a single 8 x 8 block of DCT coefficients using spectral scan selection can be found in FIG. 2, numeral 202. In this figure, the 64 coefficients are ordered from top to bottom, and the significant bits of each coefficient (Most Significant Bit (MSB) to Least Significant Bit (LSB)) are ordered from left to right.
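Spectral scan selection as described above can be illustrated with a minimal sketch. It assumes the 64 coefficients are already arranged from low to high frequency (for example in zigzag order), and the names `spectral_scans`, `merge_scans` and the `boundaries` argument are hypothetical, chosen only for this illustration.

```python
from typing import List, Sequence


def spectral_scans(coeffs: Sequence[int], boundaries: Sequence[int]) -> List[List[int]]:
    """Split 64 frequency-ordered coefficients into scans at the given indices.

    Each scan carries its own coefficients at full precision and zeros elsewhere.
    """
    if len(coeffs) != 64:
        raise ValueError("expected 64 coefficients")
    edges = [0, *boundaries, 64]
    if any(lo >= hi for lo, hi in zip(edges, edges[1:])):
        raise ValueError("boundaries must be strictly increasing and inside (0, 64)")
    scans = []
    for lo, hi in zip(edges, edges[1:]):
        scan = [0] * 64
        scan[lo:hi] = coeffs[lo:hi]  # only this frequency band is sent in this scan
        scans.append(scan)
    return scans


def merge_scans(scans: Sequence[Sequence[int]]) -> List[int]:
    """Combine whichever scans were decoded; the bands never overlap, so summing works."""
    return [sum(values) for values in zip(*scans)]


block = list(range(64, 0, -1))            # stand-in coefficient values
scans = spectral_scans(block, [6, 21])    # three scans: 0-5, 6-20, 21-63
base_only = merge_scans(scans[:1])        # what a base-layer-only decoder sees
```

A decoder that receives only the first scan reconstructs the low frequency band and treats the remaining coefficients as zero; receiving all scans restores the block exactly.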
A second method for partitioning a block of DCT coefficients is bit plane coding. In this scheme, the coefficients are refined in precision (i.e., magnitude) in the various scans. Thus, a base-layer constructed using bit plane coding would contain the most significant bits for all 64 DCT coefficients. Subsequent scans, which contain less significant bits than the base-layer, would then refine the magnitudes of the DCT coefficients. The enhancement scans only contain useful information if accompanied by all previous scans; i.e., the LSB contains useful information only if all other bits are known. The adjustment of the precision of these coefficients is equivalent to varying the quantization of each coefficient. The bit plane coding of coefficients is controlled by a scan quantization parameter. A graphical representation of a typical scan definition for a single 8 x 8 block of DCT coefficients using bit plane coding is seen in FIG. 2, numeral 204.
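The refinement of magnitude precision across scans can be sketched as follows. This is an illustration only: it assumes a sign-magnitude representation with the signs carried as side information alongside the first scan, and the `cuts` list of bit positions (for example `[2, 1, 0]`) is a hypothetical way of expressing the scan quantization boundaries.

```python
from typing import List, Sequence, Tuple


def split_bit_planes(coeffs: Sequence[int], cuts: Sequence[int]) -> Tuple[List[int], List[List[int]]]:
    """Split magnitudes into scans at descending bit positions ending in 0.

    With cuts = [2, 1, 0]: scan 0 holds magnitude >> 2, scan 1 holds bit 1,
    scan 2 holds bit 0 (the LSB).
    """
    signs = [-1 if c < 0 else 1 for c in coeffs]
    mags = [abs(c) for c in coeffs]
    scans = [[m >> cuts[0] for m in mags]]           # base layer: most significant bits
    for prev, cur in zip(cuts, cuts[1:]):
        mask = (1 << (prev - cur)) - 1
        scans.append([(m >> cur) & mask for m in mags])  # refinement bits only
    return signs, scans


def rebuild(signs: Sequence[int], scans: Sequence[Sequence[int]], cuts: Sequence[int]) -> List[int]:
    """Rebuild coefficients from the first len(scans) scans; missing LSBs become 0."""
    mags = list(scans[0])
    for k in range(1, len(scans)):
        width = cuts[k - 1] - cuts[k]
        mags = [(m << width) | chunk for m, chunk in zip(mags, scans[k])]
    shift = cuts[len(scans) - 1]                     # bits not yet received
    return [s * (m << shift) for s, m in zip(signs, mags)]


coeffs = [45, -13, 6, 0, -1] + [0] * 59
cuts = [2, 1, 0]
signs, scans = split_bit_planes(coeffs, cuts)
```

Dropping the last scans is equivalent to quantizing with a coarser step: with `cuts = [2, 1, 0]`, a base-layer-only decoder sees each magnitude rounded down to a multiple of 4.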
A third and final approach for the present scan definition involves combining spectral scan selection and bit plane coding. This scheme offers the user increased control over exactly which coefficient information is contained in each scan. With this hybrid of both approaches, one can define the base-layer as the most significant bits of the lower frequency DCT coefficients. Subsequent scans would refine those coefficients included in the base-layer and begin to include the higher frequency coefficients. The final scan would transmit the least significant bits of the high frequency coefficients. A graphical representation of a typical scan definition for a single 8 x 8 block of DCT coefficients using the combined mode of both spectral scan selection and bit plane coding can be found in FIG. 2, numeral 206.
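One possible three-scan hybrid layout is sketched below. The layout is an assumption made for illustration, not a definition taken from this text: scan 1 holds the upper bits of the first `x` coefficients (dropping `a` LSBs), scan 2 holds the upper bits of the remaining coefficients (dropping `b` LSBs), and scan 3 holds all LSBs left uncoded. The function names, and the choice to carry signs as separate side information, are likewise hypothetical.

```python
from typing import List, Sequence, Tuple


def split_hybrid(coeffs: Sequence[int], a: int, b: int, x: int) -> Tuple[List[int], List[int], List[int], List[int]]:
    """Partition a 64-coefficient block by frequency (x) and by precision (a, b)."""
    if len(coeffs) != 64:
        raise ValueError("expected 64 coefficients")
    signs = [-1 if c < 0 else 1 for c in coeffs]
    mags = [abs(c) for c in coeffs]
    scan1 = [m >> a for m in mags[:x]]    # low frequencies, a LSBs withheld
    scan2 = [m >> b for m in mags[x:]]    # high frequencies, b LSBs withheld
    scan3 = [m & ((1 << (a if i < x else b)) - 1) for i, m in enumerate(mags)]
    return signs, scan1, scan2, scan3


def rebuild_hybrid(signs, scan1, scan2, scan3, a: int, b: int, x: int, layers: int) -> List[int]:
    """Rebuild from the first `layers` scans (1, 2 or 3); later scans need earlier ones."""
    mags = [0] * 64
    if layers >= 1:
        for i, m in enumerate(scan1):
            mags[i] = m << a
    if layers >= 2:
        for i, m in enumerate(scan2):
            mags[x + i] = m << b
    if layers >= 3:
        mags = [m | low for m, low in zip(mags, scan3)]
    return [s * m for s, m in zip(signs, mags)]


coeffs = [37, -21, 9, -5, 3, 2, -1, 1] + [0] * 56
parts = split_hybrid(coeffs, a=2, b=1, x=5)
```

Moving `x` shifts coefficients between the first two scans, while changing `a` or `b` shifts bits between those scans and the final one, which is the kind of control the hybrid mode is described as offering.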
The flexibility incorporated into the scan definition permits the use of efficient VLCs. Within the H.263 standard, for example, each significant (i.e., nonzero) DCT coefficient is coded using a 3-D VLC determined by the relative frequency of occurrence of each symbol. Each 3-D code corresponds to a specific combination of three different parameters: (1) the run: number of preceding non-significant coefficients, (2) the level: the quantized index corresponding to the value of the significant coefficient, and (3) a binary value called 'last' which tells if the current coefficient is the last significant coefficient in the block. This invention uses this 3-D VLC coding method within the context of scalable video coding. In order to improve the compression efficiency, scan-dependent VLC tables may be used. More specifically, the relative frequency of each symbol in the 3-D VLC is dependent on the scan definition. Scan-dependent VLC tables take advantage of the dependency between each symbol's rate of occurrence and the scan used. The importance of scan-dependent VLC tables can be understood by considering a scan which contains only the LSB for a group of DCT coefficients. For this scan, the allowed values for the level can be reduced to a binary value instead of a range of values, thus improving the efficiency of that code.
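The (run, level, last) symbols described above can be derived from a scan as in the following sketch. It only forms the symbols; the mapping of each symbol to an actual variable length codeword is not shown, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Event = Tuple[int, int, int]  # (run, level, last)


def to_events(scan: Sequence[int]) -> List[Event]:
    """Convert a scan of coefficients into (run, level, last) symbols."""
    events = []
    run = 0
    for value in scan:
        if value == 0:
            run += 1                      # count non-significant coefficients
        else:
            events.append([run, value, 0])
            run = 0
    if events:
        events[-1][2] = 1                 # flag the last significant coefficient
    return [tuple(e) for e in events]


def from_events(events: Sequence[Event], length: int = 64) -> List[int]:
    """Inverse mapping: expand the symbols back into a coefficient list."""
    out: List[int] = []
    for run, level, _last in events:
        out.extend([0] * run)
        out.append(level)
    out.extend([0] * (length - len(out)))
    return out


scan = [5, 0, 0, -2, 1] + [0] * 59
events = to_events(scan)

# A scan carrying only one magnitude LSB per coefficient has a single possible level.
lsb_scan = [1, 0, 1, 1] + [0] * 60
lsb_levels = {level for _run, level, _last in to_events(lsb_scan)}
```

The last two lines illustrate the point about scan-dependent tables: when a scan contains only LSBs, the level field takes a single nonzero value, so a table built for that scan can spend its codewords on run and last alone.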
When designing a video transmission scheme for real-time communication channels, practical limits are set on the allowable bandwidth of the encoded video subsets. Thus, the partitioning of the DFD and INTRA block data using both spectral scan selection and bit plane coding must be adaptive so the bitrate constraints can be met. This invention provides a method for defining the scan parameters in order to obtain the desired bitrates, given a predetermined rate control system to adjust the overall DCT quantization stepsize and the coded framerate.
The overall DCT quantization stepsize and the coded framerate are adjusted based on the desired bitrate for all scans combined. The approach for selecting and modifying both the overall DCT quantization stepsize and the coded framerate can be any standard procedure based on buffer management. The adjustments to the frame rate and the quantization step sizes assume the existence of a channel which can transmit at a constant rate. In other words, the input buffer is assumed to empty at a constant rate. The coded framerate is regulated by a procedure which is executed every time that a frame is coded. This type of rate control is a common part of most existing motion compensated block-DCT based video codecs.
In order to partition a block of DCT coefficients after selection of the coded frame and quantization of those coefficients, this invention divides the total incoming bits into subsets of specified sizes. The basic idea of the method is to change the boundaries of the scans based on the target bitrates for each of the scans. This method uses maximum predetermined bitrates for each scan. The modification of the scan parameters can be executed at any macroblock boundary, or any time the overall DCT quantization stepsize can be adjusted within the syntax of the video bitstream.
In order to dynamically modify the scan parameters, they must first be explicitly specified. The dynamic approach of this invention parameterizes the boundaries between each scan. This method can be used for any number of scans; here, an example is provided based on a video sequence with three scans per block of DCT coefficients (see Table 1). Note that Scan 3 contains the uncoded LSBs from all DCT coefficients. This division into three subsets yields three parameters (A, B, and X) which the method dynamically adjusts.
Table 1: Example Parameterized Coefficient Scan Definitions
This partitioning scheme changes the scan parameters based on the number of bits spent on each scan during the previous frame. In other words, buffers are maintained for each scan which hold the bits used for representing the previous frame. As each macroblock line in the new frame is coded, bits are added to the appropriate buffers and the bits spent on that macroblock line in the previous frame are removed. The number of bits in these scan buffers at the end of each macroblock line can be used to calculate the error from the target bits for each scan. This is defined as Target Bit Error (TBE):
TBE(j) = Bits_In_Buffer(j) - Target_Bits_Per_Frame(j),
where the argument j is used to indicate the current scan number. The target number of bits per frame depends on the coded framerate, and is set by the predetermined rate control common to existing motion compensated block-DCT based video codecs.
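The per-scan buffer bookkeeping behind the Target Bit Error can be sketched as below. The class name `ScanBitBuffer` and its fields are hypothetical; the sketch simply keeps one bit count per macroblock line, so that coding a line in the new frame replaces the count left there by the previous frame.

```python
from typing import List


class ScanBitBuffer:
    """Sliding one-frame window of bits spent on a single scan, per macroblock line."""

    def __init__(self, macroblock_lines: int, target_bits_per_frame: int) -> None:
        self.line_bits: List[int] = [0] * macroblock_lines
        self.target_bits_per_frame = target_bits_per_frame

    def code_line(self, line: int, bits: int) -> None:
        # Adding the new line's bits and removing the previous frame's bits
        # for the same line amounts to overwriting the stored count.
        self.line_bits[line] = bits

    @property
    def bits_in_buffer(self) -> int:
        return sum(self.line_bits)

    def tbe(self) -> int:
        """Target Bit Error: Bits_In_Buffer - Target_Bits_Per_Frame."""
        return self.bits_in_buffer - self.target_bits_per_frame


buf = ScanBitBuffer(macroblock_lines=3, target_bits_per_frame=900)
for line, bits in enumerate([300, 320, 310]):   # previous frame
    buf.code_line(line, bits)
error_after_frame = buf.tbe()
buf.code_line(0, 250)                           # first line of the new frame
error_after_line = buf.tbe()
```

One such buffer would be kept for each scan j, so that TBE(j) is available at the end of every macroblock line.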
Each TBE is normalized based on the assumption that exceeding the target bitrate by a fixed number of bits requires more significant and immediate correction for a scan with a smaller target bitrate. This normalization produces a Normalized Target Bit Error (NTBE) for each scan. Here,
Finally, the NTBE's are compared to determine if the scan parameters need to be adjusted. This is done by calculating three scan differences (Δ(i,j)) by comparing the NTBE's for each scan. The definition of the scan differences for the example case with 3 scans is:
Δ(1,2) = NTBE(1) - NTBE(2);
Δ(1,3) = NTBE(1) - NTBE(3);
Δ(2,3) = NTBE(2) - NTBE(3).
These Δ(i,j) values are compared to predetermined thresholds (T(i,j)) which depend on the maximum allowable deviation from the desired scan bitrates. If the threshold is exceeded, the appropriate scan parameter is adjusted (see Table 2). These scan adjustments must result in a feasible solution for bitstream encoding, and one preferred embodiment is described next. The amount by which A, B, and X are incremented or decremented is chosen to be proportional, by a predetermined proportionality constant, to the integer division of Δ(i,j) by T(i,j). The magnitude of the scan adjustments is also limited. These limitations prevent the scan parameters from oscillating rapidly and do not pose difficulty for meeting imposed bitrate constraints.
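The comparison and step-size logic can be sketched as follows, under stated assumptions. The normalization formula is not reproduced in this text; the sketch assumes NTBE(j) = TBE(j) / Target_Bits_Per_Frame(j), which is one form consistent with the description that a fixed bit overshoot matters more for a scan with a smaller target. The function names, the proportionality constant `k` and the limit `max_step` are hypothetical, and the mapping from each Δ(i,j) to a particular parameter (Table 2) is not modeled.

```python
from itertools import combinations
from typing import Dict, Sequence, Tuple


def ntbe(tbe: float, target_bits_per_frame: float) -> float:
    """Assumed normalization: error relative to the scan's own target."""
    return tbe / target_bits_per_frame


def scan_differences(ntbes: Sequence[float]) -> Dict[Tuple[int, int], float]:
    """Delta(i, j) = NTBE(i) - NTBE(j) for every pair of scans, 1-indexed."""
    return {
        (i + 1, j + 1): ntbes[i] - ntbes[j]
        for i, j in combinations(range(len(ntbes)), 2)
    }


def adjustment_step(delta: float, threshold: float, k: int = 1, max_step: int = 1) -> int:
    """Step proportional to the integer division of delta by threshold, then limited."""
    raw = k * int(delta / threshold)   # int() truncates toward zero: 0 below threshold
    return max(-max_step, min(max_step, raw))


tbes = [30, -50, 10]
targets = [1000, 2000, 4000]
ntbes = [ntbe(t, g) for t, g in zip(tbes, targets)]
deltas = scan_differences(ntbes)
```

For example, with a threshold of 0.15, a difference of 0.40 yields a raw step of 2, which `max_step` may then limit further; a difference of 0.10 yields no adjustment at all.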
Table 2: Dynamic Adjustment of Scan Parameters
The decoder must know of any adjustments to the scan parameters. One preferred embodiment of the coding of the scan parameters is to encode changes in these parameters only within the bit field of a Group of Blocks (GOB) header, which is part of the syntax of H.263 within which this preferred embodiment is implemented. The number of bits required for these parameters is minimal since the magnitude of the scan adjustments has been limited. The values of the thresholds, T(i,j), seen in Table 2, are set to 0.15 for all cases. A, B, and X are changed proportionally to the amount that Δ(i,j) exceeds T(i,j) for each case.
The scan bit precision parameters, referred to here as the quantization scan parameters, A and B, are limited to take on the values 0, 1, and 2, and each is permitted to change only by -1, 0, or +1 at each valid change point. A field of 2 bits is needed to transmit the absolute value of each of these parameters at each GOB header. The spectral scan parameter, X, is permitted to change by any of the values -7, -6, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, and is limited to lie within the range [5,35]. A field of 5 bits is coded at each GOB header to transmit the absolute value of the spectral scan parameter. The scan parameters are limited in terms of possible values in order to prevent rapid changes in bitrate within a video frame, and to reduce the number of bits needing to be transmitted in each encoded frame. A decoder can read the values of the scan parameters at each GOB header, and adjust the scan definitions before decoding the plurality of scans associated with each block of DCT coefficients. The scan parameters, along with the motion vectors and all administrative information, are transmitted with the base layer.
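The field widths given above (2 bits each for A and B, 5 bits for X) can be illustrated with a small packing sketch. The bit order within the header and the storage of X as an offset from 5 (so that the 31 values of [5,35] fit in 5 bits) are assumptions made for this illustration; the function names are hypothetical.

```python
from typing import Tuple


def pack_scan_params(a: int, b: int, x: int) -> int:
    """Pack A (2 bits), B (2 bits) and X (5 bits, stored as X - 5) into 9 bits."""
    if a not in (0, 1, 2) or b not in (0, 1, 2):
        raise ValueError("A and B must be 0, 1 or 2")
    if not 5 <= x <= 35:
        raise ValueError("X must lie within [5, 35]")
    return (a << 7) | (b << 5) | (x - 5)


def unpack_scan_params(word: int) -> Tuple[int, int, int]:
    """Recover (A, B, X) from the 9-bit field read by the decoder."""
    a = (word >> 7) & 0b11
    b = (word >> 5) & 0b11
    x = (word & 0b11111) + 5
    return a, b, x


word = pack_scan_params(2, 1, 20)
params = unpack_scan_params(word)
```

Under these assumptions the scan parameters cost 9 bits per GOB header, which a decoder reads before decoding the scans of the blocks that follow.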
FIG. 3, numeral 300, is a block diagram of one preferred embodiment of an apparatus for scalable coding of a plurality of video frames. The apparatus comprises a memory unit (302), and a scalable partitioning video processor/ASIC (application specific integrated circuit) (304) coupled to the memory. The scalable partitioning video processor/ASIC (304) initiates a program by sending a control signal (306) to the memory unit (302). The scalable partitioning video processor/ASIC (304) is responsive to a set of program instructions stored in the memory unit (302), which, when operably coupled to the memory unit (302), determines a plurality of scan parameters (312) for a corresponding plurality of bit rates. The scalable partitioning video processor/ASIC (304) is used to transform a video frame of the plurality of video frames into blocks, typically 8x8, of DCT coefficients (308). The scalable partitioning video processor/ASIC (304) is further responsive to partition the DCT coefficients of each block into a plurality of scans (310), each scan of the plurality of scans having a spectral scan parameter and a quantization scan parameter of the plurality of scan parameters; and the scalable partitioning video processor/ASIC is further responsive to encode each scan of the plurality of scans using predetermined variable length codewords (314) and to output coded scan coefficients (318), and, where selected, to further change the scan parameters at predetermined locations in a video frame according to a predetermined rate control scheme (316) in order to effectively reach a target coded bitrate associated with each scan.
FIG. 4, numeral 400, is a diagram of one preferred embodiment of executable instructions and output parameters of a computer readable medium for scalable coding of a plurality of video frames. The computer readable medium (401) stores the plurality of executable instructions (402), the plurality of executable program instructions responsive, when executed, to determine a plurality of scan parameters (404) for a corresponding plurality of bit rates. The executable program instructions also transform a video frame of the plurality of video frames into blocks, typically 8x8, of DCT coefficients (406). The executable program instructions partition the DCT coefficients into a plurality of scans, each scan of the plurality of scans having a spectral scan parameter (408) and a quantization scan parameter (410) of the plurality of scan parameters, and encode each scan of the plurality of scans by selecting predetermined variable length codewords (412), which are typically stored in the medium. The plurality of executable instructions signal a change (414) in the spectral scan parameter and the quantization scan parameter of each of the plurality of scan parameters at predetermined locations in a video frame in order to effectively reach a target coded bitrate associated with each scan.
FIG. 5, numeral 500, is another preferred embodiment of a flow chart for a method for scalable coding of video information, the video information having a plurality of video frames, in accordance with the present invention. The method includes: (a) determining a plurality of scan parameters for a corresponding plurality of bit rates (502); (b) transforming a video frame of the plurality of video frames into transform information (504); (c) partitioning the transform information into a plurality of scans, each scan of the plurality of scans having a spectral scan parameter and a quantization scan parameter of the plurality of scan parameters (506); and (d) encoding each scan of the plurality of scans (508). Typically, the transform information is a discrete cosine transform value. In one embodiment, encoding step (d) utilizes a plurality of variable length codes.
Where selected, each spectral scan parameter and each quantization scan parameter of the plurality of scan parameters is altered according to a predetermined adjustment scheme at a plurality of predetermined points in a video frame of the plurality of video frames to achieve each bit rate of the plurality of bit rates (510). The plurality of scans generally includes a first scan having a first spectral scan parameter and a first quantization scan parameter of the plurality of scan parameters, the first spectral scan parameter and the first quantization scan parameter corresponding to a lowest bit rate of the plurality of bit rates. In one embodiment, the first scan of the plurality of scans is used as a basis for motion compensation (512).
From the foregoing, it will be observed that numerous variations and modifications may be effected without departing from the spirit and scope of the novel concept of the invention. It is to be understood that no limitation with respect to the specific methods and apparatus illustrated herein is intended or should be inferred. It is, of course, intended to cover by the appended claims all such modifications as fall within the scope of the claims.
What is claimed is:
|Cited Patent||Filing date||Publication date||Applicant||Title|
|US4821119 *||4 May 1988||11 Apr 1989||Bell Communications Research, Inc.||Method and apparatus for low bit-rate interframe video coding|
|US5014134 *||11 Sep 1989||7 May 1991||Aware, Inc.||Image compression method and apparatus|
|US5063608 *||3 Nov 1989||5 Nov 1991||Datacube Inc.||Adaptive zonal coder|
|US5107345 *||28 May 1991||21 Apr 1992||Qualcomm Incorporated||Adaptive block size image compression method and system|
|US5109451 *||28 Jun 1991||28 Apr 1992||Sharp Kabushiki Kaisha||Orthogonal transform coding system for image data|
|US5196933 *||19 Mar 1991||23 Mar 1993||Etat Francais, Ministere Des Ptt||Encoding and transmission method with at least two levels of quality of digital pictures belonging to a sequence of pictures, and corresponding devices|
|US5321776 *||26 Feb 1992||14 Jun 1994||General Electric Company||Data compression system including successive approximation quantizer|
|Citing Patent||Filing date||Publication date||Applicant||Title|
|WO2001003442A1 *||4 Jul 2000||11 Jan 2001||Koninklijke Philips Electronics N.V.||System and method for scalable video coding|
|WO2001047274A1 *||5 Dec 2000||28 Jun 2001||Koninklijke Philips Electronics N.V.||Fine granular scalable video with embedded dct coding of the enhancement layer|
|WO2001062009A1 *||14 Feb 2001||23 Aug 2001||Siemens Aktiengesellschaft||Method and device for coding or coding and decoding a sequence of numbers|
|WO2001086958A1 *||3 May 2001||15 Nov 2001||Siemens Aktiengesellschaft||Method and an arrangement for the coding or decoding of a series of images|
|WO2002025925A2 *||24 Sep 2001||28 Mar 2002||Koninklijke Philips Electronics Nv||Hybrid temporal-snr fine granular scalability video coding|
|WO2002025925A3 *||24 Sep 2001||6 Sep 2002||Koninkl Philips Electronics Nv||Hybrid temporal-snr fine granular scalability video coding|
|WO2006136885A1 *||13 Apr 2006||28 Dec 2006||Nokia Corporation||Fine granularity scalability (fgs) coding efficiency enhancements|
|US6826232||20 Dec 1999||30 Nov 2004||Koninklijke Philips Electronics N.V.||Fine granular scalable video with embedded DCT coding of the enhancement layer|
|US7245663||21 Jun 2001||17 Jul 2007||Koninklijke Philips Electronics N.V.||Method and apparatus for improved efficiency in transmission of fine granular scalable selective enhanced images|
|US7245773||3 May 2001||17 Jul 2007||Siemens Aktiengesellschaft||Method and system for coding or decoding of a series of images|
|International Classification||H04N7/50, H04N7/26, H04N7/30|
|Cooperative Classification||H04N19/129, H04N19/91, H04N19/13, H04N19/146, H04N19/152, H04N19/124, H04N19/60, H04N19/137, H04N19/149, H04N19/115, H04N19/61, H04N19/30, H04N19/176, H04N19/18, H04N19/126|
|European Classification||H04N7/50, H04N7/30, H04N7/30E5, H04N7/26E2, H04N7/30E2, H04N7/30E4, H04N7/26A4S, H04N7/50E2, H04N7/50E4, H04N7/50E5, H04N7/26A4V, H04N7/30H, H04N7/26A4Q2, H04N7/26A8B, H04N7/26A6E4E, H04N7/26A6C4, H04N7/26A4E, H04N7/26A8C, H04N7/26A6E6|
|26 Nov 1998||AK||Designated states|
Kind code of ref document: A1
Designated state(s): CA
|26 Nov 1998||AL||Designated countries for regional patents|
Kind code of ref document: A1
Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE
|14 Apr 1999||121||Ep: the epo has been informed by wipo that ep was designated in this application|
|20 Jan 2000||NENP||Non-entry into the national phase in:|
Ref country code: CA
|2 Feb 2000||122||Ep: pct application non-entry in european phase|