WO2000046998A1 - Method and arrangement for transforming an image area

Info

Publication number: WO2000046998A1
Application number: PCT/DE2000/000278
Authority: WO
Inventors: Andre Kaup
Original assignee: Siemens Aktiengesellschaft
Priority date: 1999-02-01
Filing date: 2000-02-01
Publication date: 2000-08-10
Also published as: KR20010101916A; EP1157557A1; CN1339225A; DE19903859A1; JP2002536926A

Abstract

The invention relates to a method for transforming an image area. A deciding unit first carries out a vertical transformation of the image area and then carries out a horizontal transformation of the image area or vice versa.

Description

Method and arrangement for transforming an image area

The invention relates to a method and an arrangement for transforming an image area.

Such a method with an associated arrangement is known from [1]. The known method serves as a coding method in the MPEG standard and is essentially based on the hybrid DCT (Discrete Cosine Transform) with motion compensation. A similar procedure is used for video telephony at n x 64 kbit/s (CCITT Recommendation H.261), for TV contribution (CCIR Recommendation 723) at 34 or 45 Mbit/s, and for multimedia applications at 1.2 Mbit/s (ISO MPEG-1). The hybrid DCT consists of a temporal processing stage, which takes advantage of the relationship between successive images, and a local (spatial) processing stage, which uses the correlation within an image.

The local processing (intraframe coding) essentially corresponds to the classic DCT coding. The image is broken down into blocks of 8x8 pixels, each of which is transformed into the frequency domain using DCT. The result is a matrix of 8x8 coefficients, which approximately reflect the two-dimensional spatial frequencies in the transformed image block. A coefficient with frequency 0 (DC component) represents an average gray value of the image block.
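The block transform described above can be sketched as follows. This is a minimal illustration, not the coder's actual implementation: it assumes an orthonormal DCT-II applied first to the rows and then to the columns, and the names dct_1d and dct_2d are hypothetical. With this scaling, the DC coefficient of an 8x8 block equals eight times the mean gray value.

```python
import math

def dct_1d(v):
    """Orthonormal DCT-II of a sequence of arbitrary length."""
    n = len(v)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(x * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                               for i, x in enumerate(v)))
    return out

def dct_2d(block):
    """Separable 2-D DCT: transform every row, then every column."""
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d(list(c)) for c in zip(*rows)]   # cols[j] = transformed column j
    return [list(r) for r in zip(*cols)]           # back to row-major layout

# Example 8x8 block with a smooth gradient (illustrative values only).
block = [[100 + x + 2 * y for x in range(8)] for y in range(8)]
coeffs = dct_2d(block)
mean_gray = sum(map(sum, block)) / 64.0
# coeffs[0][0] is the DC component; with orthonormal scaling it is 8 * mean_gray.
```

For a completely flat block, all coefficients other than the DC component come out as zero, which is the energy concentration the text refers to.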

After the transformation, data expansion takes place. However, a concentration of energy around the DC component will take place in natural images, while the highest-frequency coefficients are usually zero. In a next step, the coefficients are spectrally weighted, so that the amplitude accuracy of the high-frequency coefficients is reduced. Here one takes advantage of the properties of the human eye, which resolves high spatial frequencies less accurately than low ones.

A second step of data reduction takes the form of an adaptive quantization, by means of which the amplitude accuracy of the coefficients is further reduced or by which the small amplitudes are set to zero. The degree of quantization depends on the fill level of the output buffer: if the buffer is empty, fine quantization takes place, so that more data is generated; if the buffer is full, quantization is coarser, which reduces the amount of data.
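The buffer-controlled quantization can be illustrated with a short sketch. The linear mapping from buffer fill level to step size, the step range of 2 to 62, and the function names are assumptions chosen for illustration; the text does not specify the actual control rule.

```python
def quantizer_step(buffer_fill, min_step=2, max_step=62):
    """Map a buffer fill level in [0, 1] to a quantizer step size.

    The linear rule and the step range are illustrative assumptions.
    """
    fill = min(max(buffer_fill, 0.0), 1.0)
    return min_step + fill * (max_step - min_step)

def quantize(coeffs, step):
    """Uniform quantization; amplitudes below step/2 become zero."""
    return [[int(round(c / step)) for c in row] for row in coeffs]

def dequantize(levels, step):
    """Reconstruct approximate coefficient values from quantizer levels."""
    return [[level * step for level in row] for row in levels]

coeffs = [[100.0, 3.2, -3.2]]
fine = quantize(coeffs, quantizer_step(0.0))    # empty buffer: fine quantization
coarse = quantize(coeffs, quantizer_step(1.0))  # full buffer: coarse quantization
```

With the empty buffer all three coefficients survive; with the full buffer the two small amplitudes are set to zero, so fewer values remain to be coded.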

After quantization, the block is scanned diagonally ("zigzag" scanning), followed by entropy coding, which causes the actual data reduction. Two effects are used for this:

1.) The statistics of the amplitude values: high amplitude values occur less frequently than small ones, so that long code words are assigned to the rare events and short code words to the frequent events (variable-length coding, VLC). This results in a lower data rate on average than with fixed word length coding. The variable rate of the VLC is then smoothed in the buffer memory.

2.) One takes advantage of the fact that, in most cases, only zeros follow after a certain value. Instead of all these zeros, only an EOB (End Of Block) code is transmitted, which leads to a significant coding gain in the compression of the image data. Instead of the initial 512 bits, only β bits are transmitted for this block in the example given, which corresponds to a compression factor of over 11.
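The zigzag scan and the EOB mechanism can be sketched as follows. The scan order shown is the conventional zigzag pattern for square blocks; the EOB marker string and the function names are illustrative assumptions, and the variable-length coding of the remaining values is omitted.

```python
EOB = "EOB"  # placeholder symbol for the End Of Block code (assumed representation)

def zigzag_order(n=8):
    """Return (row, col) index pairs of an n x n block in zigzag order."""
    idx = [(r, c) for r in range(n) for c in range(n)]
    # Sort by anti-diagonal; alternate the direction of travel on each diagonal.
    return sorted(idx, key=lambda rc: (rc[0] + rc[1],
                                       rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]))

def scan_with_eob(levels):
    """Zigzag-scan quantized levels and replace the trailing zeros by one EOB."""
    seq = [levels[r][c] for r, c in zigzag_order(len(levels))]
    last = max((i for i, v in enumerate(seq) if v != 0), default=-1)
    return seq[:last + 1] + [EOB]

levels = [[0] * 8 for _ in range(8)]
levels[0][0], levels[0][1], levels[1][0] = 12, -3, 1
symbols = scan_with_eob(levels)
```

In this example the 64 quantized values collapse to three amplitudes followed by a single EOB symbol.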

Another gain in compression is obtained from the temporal processing (interframe coding). To encode differential images, less data rate is required than for the original images, because the amplitude values are much lower.

However, the temporal differences are small only if the movements in the picture are also small. If, on the other hand, the movements in the picture are large, large differences arise, which in turn are difficult to code. For this reason, the picture-to-picture movement is measured (motion estimation) and compensated before the difference is formed (motion compensation).

The motion information is transmitted together with the image information; usually only one motion vector per macroblock (e.g. four 8x8 image blocks) is used.

Even smaller amplitude values of the difference images are obtained if a motion-compensated bidirectional prediction is used instead of the prediction used.

In the case of a motion-compensated hybrid coder, it is not the image signal itself that is transformed, but the temporal difference signal. For this reason, the coder also has a temporal recursion loop, because the predictor must calculate the prediction value from the values of the (coded) images already transmitted. An identical temporal recursion loop is located in the decoder, so that the encoder and decoder are completely synchronized.

In the MPEG-2 encoding process there are three main methods with which images can be processed:

I-pictures: No temporal prediction is used for the I-pictures, i.e. the picture values are transformed and encoded directly, as shown in picture 1. I-pictures are used so that the decoding process can be started afresh without knowledge of the temporal past, or to achieve resynchronization in the event of transmission errors.

P-pictures: A temporal prediction is made on the basis of the P-pictures; the DCT is applied to the temporal prediction error.

B-pictures: With the B-pictures the temporal bidirectional prediction error is calculated and then transformed. The bidirectional prediction works basically adaptively, i.e. forward prediction, backward prediction or interpolation is permitted.

In MPEG-2 coding, a picture sequence is divided into so-called GOPs (Groups Of Pictures); n pictures between two I-pictures form a GOP. The distance between the P-pictures is denoted by m, with m-1 B-pictures between the P-pictures. The MPEG syntax leaves it up to the user how m and n are chosen. m = 1 means that no B-pictures are used, and n = 1 means that only I-pictures are encoded.
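The role of the parameters n and m can be made concrete with a small sketch that lists the picture types of one GOP. It assumes display order (not transmission order) and a GOP that starts with its I-picture; the function name gop_pattern is hypothetical.

```python
def gop_pattern(n, m):
    """Picture types of one GOP in display order.

    n: number of pictures per GOP, m: distance between P-pictures.
    Assumes the GOP starts with its I-picture (illustrative convention).
    """
    if n < 1 or m < 1:
        raise ValueError("n and m must be positive")
    types = []
    for i in range(n):
        if i == 0:
            types.append("I")
        elif i % m == 0:
            types.append("P")
        else:
            types.append("B")
    return "".join(types)

print(gop_pattern(12, 3))  # IBBPBBPBBPBB
print(gop_pattern(4, 1))   # IPPP
print(gop_pattern(1, 1))   # I
```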

In the context of the DCT transformation, the encoder or column-by-line transformation is preferred ... The type of transformation is the same for all image data, which is disadvantageous for certain image data.

The object of the invention is to transform an image area in such a way that the order of vertical and horizontal transformation depends on predetermined conditions that are specifically taken into account.

A clear improvement in image quality can thereby be achieved.

This object is achieved in accordance with the features of the independent claims. Further developments of the invention also result from the dependent claims.

To solve the problem, a method for transforming an image area is specified in which a decision unit first carries out a vertical transformation of the image area and then a horizontal transformation of the image area or vice versa, first the horizontal transformation and then the vertical transformation.

A further development consists in the fact that the image area has an irregular structure.

It is particularly advantageous that the order of the transformations can be determined depending on a value that is predetermined or determined in the decision unit or applied to the decision unit. Depending on the image area to be transformed and the special features that characterize it, the order of horizontal and vertical transformation can be predetermined by the decision unit in such a way that the best possible result is achieved with regard to the compression of the image area.

In the case of an irregular structure of the image area in particular, the order of the transformations is crucial, since after each vertical or horizontal transformation the pixels of the irregular image area are re-sorted, and a correlation of the pixels in the local (spatial) domain can thereby be lost. Such a rearrangement can in particular be an alignment along a horizontal or a vertical axis (line).

The decision unit preferably determines the sequence of the transformations on the basis of special features or a special feature of the image area, its type of transmission or a feature characteristic of it.

One embodiment consists in that the image area is aligned along a horizontal line or that the alignment takes place along a vertical line. Pixels of the rows of the image area are aligned on the vertical line or pixels of the columns of the image area are aligned on the horizontal line. In particular, there is a corresponding alignment after each transformation (vertical or horizontal). Through alignment, i.e. the shifting of rows or columns of the image area, a correlation in the local (spatial) domain may be lost (in the case of an irregular structure of the image area), since pixels that lie next to one another will no longer necessarily lie next to one another after alignment. This information is used in particular to make the decision about the order of the transformations within the decision unit in such a way that the correlation of pixels lying next to one another in the spatial or time domain is optimally used.
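The effect of alignment on an irregular image area can be sketched as follows. This is an illustrative model under several assumptions: pixels outside the area are marked with None, rows are aligned on the left edge (a vertical line) and columns on the top edge (a horizontal line), and an orthonormal DCT-II of matching length is applied to each aligned row or column. All names are hypothetical.

```python
import math

def dct_1d(v):
    """Orthonormal DCT-II of a sequence of arbitrary length (may be empty)."""
    n = len(v)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(x * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                               for i, x in enumerate(v)))
    return out

def transpose(grid):
    return [list(col) for col in zip(*grid)]

def transform_rows(grid):
    """Horizontal step: align each row's pixels on the left edge, then transform."""
    width = len(grid[0])
    out = []
    for row in grid:
        vals = dct_1d([p for p in row if p is not None])
        out.append(vals + [None] * (width - len(vals)))
    return out

def transform_cols(grid):
    """Vertical step: align each column's pixels on the top edge, then transform."""
    return transpose(transform_rows(transpose(grid)))

def transform_area(grid, horizontal_first):
    if horizontal_first:
        return transform_cols(transform_rows(grid))
    return transform_rows(transform_cols(grid))

# Irregular image area; None marks positions outside the area.
area = [[None, 10, 12, None],
        [9, 11, 13, 15],
        [None, None, 14, 16]]
h_first = transform_area(area, horizontal_first=True)
v_first = transform_area(area, horizontal_first=False)
```

Both orders produce as many coefficients as there are pixels, but the coefficients end up in different positions and with different values, which is why the order matters for irregular areas.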

One embodiment also consists in the decision unit taking into account at least one of the following mechanisms for determining the sequence of vertical and horizontal transformation:

a) With interlaced transmission, only every second line of an image is displayed (and transmitted) at a time. By alternating between the two sets of lines, images are created with a time offset, which represent moving images, with the lines of two temporally successive images complementing one another to form a full image. In the decision unit, for example, the image header is used to determine whether such an interlaced transmission is present. If the interlaced method is used, the horizontal transformation is carried out first, followed by the vertical transformation. This takes advantage of the fact that in the interlaced method only every second line is transmitted and the correlation of pixels within a line is therefore higher than along a column.

b) Another mechanism is that, as described above, the transformation along whose direction the correlation of the pixels of the image area to be transformed is greater is carried out first.
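The two mechanisms a) and b) can be combined in a small decision routine. The sketch below rests on assumptions not stated in the text: the interlace flag is passed in directly (rather than parsed from an image header), and the mean absolute difference between neighbouring pixels serves as a simple stand-in for correlation (a smaller difference is taken to mean a higher correlation). The names are hypothetical.

```python
def mean_abs_diff(pairs):
    """Mean absolute difference over an iterable of pixel pairs."""
    pairs = list(pairs)
    return sum(abs(a - b) for a, b in pairs) / len(pairs) if pairs else 0.0

def decide_order(block, interlaced=False):
    """Return 'horizontal_first' or 'vertical_first' for a rectangular block."""
    if interlaced:
        # Mechanism a): with interlaced material, transform along the lines first.
        return "horizontal_first"
    # Mechanism b): transform first along the direction of higher correlation.
    horiz = mean_abs_diff((row[i], row[i + 1])
                          for row in block for i in range(len(row) - 1))
    vert = mean_abs_diff((block[j][i], block[j + 1][i])
                         for j in range(len(block) - 1)
                         for i in range(len(block[0])))
    return "horizontal_first" if horiz <= vert else "vertical_first"

striped_rows = [[1, 1, 1], [9, 9, 9], [2, 2, 2]]   # constant along each line
striped_cols = [[1, 9, 2], [1, 9, 2], [1, 9, 2]]   # constant along each column
```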

Another development is that an additional dimension is taken into account in the transformation, this additional dimension being examined with regard to the correlation of the pixels in the additional dimension. An example is that the additional dimension is a time axis (3D transformation).

Another embodiment is that the decision unit generates side information in which the order of the transformations is contained. The side information corresponds to a signal which is preferably transmitted to a receiver (decoder) and on the basis of which this receiver is able to extract the information about the sequence of the transformations. This sequence must be taken into account accordingly in the inverse operation of the decoding.

In another further development, the vertical transformation is obtained from the horizontal transformation by mirroring the image area about a 45° axis before the transformation. Correspondingly, the horizontal transformation can be obtained from the vertical transformation. Due to the mirroring, the transformation order is (virtually) exchanged.

The method is suitable for use in a coder for compressing image data, e.g. an MPEG picture encoder. A corresponding decoder is preferably expanded to include an evaluation option for the side information signal in order to be able to carry out the correct sequence of vertical and horizontal transformation (or the inverse operation in each case) when decoding the image area.
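How a decoder might evaluate the side information can be sketched as follows. The one-byte encoding of the order and all names are assumptions for illustration only; the text does not define a bitstream syntax. The point shown is that the decoder has to undo the transformations in the reverse of the order signalled by the encoder.

```python
from enum import Enum

class Order(Enum):
    HORIZONTAL_FIRST = 0
    VERTICAL_FIRST = 1

def pack_side_info(order):
    """Encoder side: serialize the transformation order (assumed 1-byte format)."""
    return bytes([order.value])

def unpack_side_info(data):
    """Decoder side: recover the transformation order from the side information."""
    return Order(data[0])

def inverse_steps(order):
    """Inverse operations in the sequence the decoder has to apply them."""
    forward = (["horizontal", "vertical"] if order is Order.HORIZONTAL_FIRST
               else ["vertical", "horizontal"])
    return ["inverse_" + step for step in reversed(forward)]

received = unpack_side_info(pack_side_info(Order.HORIZONTAL_FIRST))
steps = inverse_steps(received)
```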

Coders and decoders preferably operate according to an MPEG standard or an H.26x standard.

A further development is that the transformation is a DCT or an inverse DCT (IDCT).

Furthermore, to solve the problem, an arrangement for transforming an image area is specified with a decision unit by means of which a vertical transformation of the image area and then a horizontal transformation of the image area or vice versa, first the horizontal transformation and then the vertical transformation of the image area can be carried out.

This arrangement is particularly suitable for carrying out the method according to the invention or one of its developments explained above. Exemplary embodiments of the invention are illustrated and explained below with reference to the drawings.

The figures show:

Fig. 1 a sketch showing steps of transforming an image area;

Fig. 2 a sketch which represents a decision unit and the signals / values generated therefrom;

Fig. 3 a sketch illustrating a transmitter and receiver for image compression;

Fig. 4 a sketch with an image encoder and an image decoder in greater detail;

Fig. 5 a possible form of the decision unit in the form of a processor unit.

Fig. 1 shows steps of a transformation, in particular a DCT transformation, for a predetermined image area which has an irregular structure. Step 101 shows the irregular structure of the image area in an interlaced method, indicated by every other line being occupied. The image area is composed of lines 105, 106, 107 and 108. Step 102 shows the image that is actually displayed in the interlaced method, which in turn has lines 105 to 108. The correlation of this image area with an irregular structure is particularly high along the lines. Accordingly, in the interlaced method, the lines are transformed first, after they have previously been aligned along a vertical line 109. The alignment results in a column-related shift of adjacent pixels. The vertical transformation takes place in step 103. A horizontal alignment along a horizontal line 110 is carried out beforehand.

It would also be possible to (additionally) consider a transformation along a time axis. Step 101 can thus also be interpreted as a representation of a plurality of lines 105 to 108 or a plurality of image areas 105 to 108 which are each scanned at different times along a time axis 111. The spatial correlation within the respective lines 105 to 108 or the respective image areas 105 to 108 is high, whereas the correlation between the individual lines 105 to 108 or image areas 105 to 108 is lower due to the scanning along the time axis 111 in the direction of the time dimension.

FIG. 2 shows a sketch which represents a decision unit and the signals / values generated therefrom. An input signal or a plurality of input signals 200 are used by the decision unit 201 to determine which of several transformations (horizontal, vertical, temporal) are to be carried out in which order so as to make the best possible use of the correlations in the spatial or time domain, i.e. to take high correlations into account by performing the associated transformation first. The interlaced method discussed in FIG. 1 serves as an example, in which the decision unit 201 causes the horizontal transformation to be carried out before the vertical transformation. The actual transformations are carried out in a unit 202, in which the image areas are also aligned. The resulting coefficients 203 are the output of the transformation unit 202 (see also the illustration in step 104). Furthermore, the decision unit 201 generates side information 203 which contains the sequence of the transformations to be carried out.

The arrangement shown in FIG. 2 is in particular part of a transmitter (encoder) 301, as shown in FIG. 3. Image data 303, preferably in compressed form, is transmitted from the transmitter 301 to a receiver (decoder) 302. The side information 203 described in FIG. 2 is also transmitted (here identified by a connection 304) from the transmitter 301 to the receiver 302. The side information 304 is decoded there and the information about the order of the transformations is obtained therefrom.

It should also be pointed out that there are basically two options for performing the transformations. Either the two transformations (horizontal and vertical) are actually interchanged, which involves a not inconsiderable programming effort. Alternatively, the order of the transformations can be determined (using the decision unit 201) such that the vertical transformation emerges from the horizontal transformation by mirroring the image area about a 45° axis (top left to bottom right). Due to the mirroring, the transformation order is (virtually) exchanged. Accordingly, the mirroring operation must be taken into account on the part of the receiver 302.
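The mirroring option can be checked with a few lines of code. Mirroring about the axis from top left to bottom right corresponds to a matrix transpose, so a vertical transformation can be obtained by mirroring, applying the horizontal transformation, and mirroring back. The sketch assumes a rectangular block and an orthonormal DCT-II; the function names are hypothetical.

```python
import math

def dct_1d(v):
    """Orthonormal DCT-II of a sequence of arbitrary length."""
    n = len(v)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(x * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                               for i, x in enumerate(v)))
    return out

def horizontal(block):
    """Transform every row."""
    return [dct_1d(row) for row in block]

def vertical(block):
    """Transform every column, implemented directly for comparison."""
    h, w = len(block), len(block[0])
    cols = [dct_1d([block[r][c] for r in range(h)]) for c in range(w)]
    return [[cols[c][r] for c in range(w)] for r in range(h)]

def mirror(block):
    """Mirror about the top-left to bottom-right axis (matrix transpose)."""
    return [list(col) for col in zip(*block)]

def vertical_via_mirror(block):
    """Vertical transformation built from the horizontal one plus mirroring."""
    return mirror(horizontal(mirror(block)))

block = [[3, 7, 1, 9], [4, 4, 8, 2], [6, 0, 5, 5]]
direct = vertical(block)
mirrored = vertical_via_mirror(block)
```

Only one transformation routine then needs to be implemented; the order is exchanged by choosing where the mirroring takes place.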

FIG. 4 shows an image encoder with an associated image decoder in greater detail (block-based image coding method according to the H.263 standard).

A video data stream to be encoded with chronologically successive digitized images is fed to an image coding unit 201. The digitized images are divided into macroblocks 202, each macroblock having 16x16 pixels. The macroblock 202 comprises four picture blocks 203, 204, 205 and 206, each picture block containing 8x8 pixels to which luminance values (brightness values) are assigned. Furthermore, each macroblock 202 comprises two chrominance blocks 207 and 208 with chrominance values (color information, color saturation) assigned to the pixels.
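The division of a 16x16 luminance macroblock into four 8x8 picture blocks can be sketched as follows. The ordering of the returned blocks (top left, top right, bottom left, bottom right) is an assumption for illustration, and the function name is hypothetical.

```python
def split_macroblock(mb):
    """Split a 16x16 macroblock into four 8x8 blocks.

    Returned order (assumed): top left, top right, bottom left, bottom right.
    """
    if len(mb) != 16 or any(len(row) != 16 for row in mb):
        raise ValueError("expected a 16x16 macroblock")
    return [[row[x:x + 8] for row in mb[y:y + 8]]
            for y in (0, 8) for x in (0, 8)]

# Macroblock whose pixel value encodes its own position (illustrative data).
macroblock = [[16 * r + c for c in range(16)] for r in range(16)]
blocks = split_macroblock(macroblock)
```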

The block of an image contains a luminance value (= brightness), a first chrominance value (= hue) and a second chrominance value (= color saturation). The luminance value, first chrominance value and second chrominance value are referred to as color values.

The image blocks are fed to a transformation coding unit 209. In the case of differential image coding, values to be coded from image blocks of temporally preceding images are subtracted from the image blocks to be currently coded; only the difference formation information 210 is supplied to the transformation coding unit (Discrete Cosine Transformation, DCT) 209. For this purpose, the current macroblock 202 is communicated to a motion estimation unit 229 via a connection 234. Spectral coefficients 211 are formed in the transformation coding unit 209 for the picture blocks or difference picture blocks to be coded and fed to a quantization unit 212. This quantization unit 212 corresponds to the quantization device according to the invention.

Quantized spectral coefficients 213 are supplied to both a scan unit 214 and an inverse quantization unit 215 in a reverse path. After a scanning method, for example a "zigzag" scanning method, entropy coding is carried out on the scanned spectral coefficients 232 in an entropy coding unit 216 provided for this purpose. The entropy-coded spectral coefficients are transmitted as coded image data 217 to a decoder via a channel, preferably a line or a radio link. An inverse quantization of the quantized spectral coefficients 213 takes place in the inverse quantization unit 215. Spectral coefficients 218 obtained in this way are fed to an inverse transformation coding unit 219 (inverse discrete cosine transformation, IDCT).

Reconstructed coding values (also differential coding values) 220 are fed to an adder 221 in the differential image mode. The adder 221 also receives coding values of an image block, which result from a temporally preceding image after motion compensation has already been carried out. Reconstructed image blocks 222 are formed with the adder 221 and stored in an image memory 223.

Chrominance values 224 of the reconstructed image blocks 222 are fed from the image memory 223 to a motion compensation unit 225. For brightness values 226, an interpolation takes place in an interpolation unit 227 provided for this purpose. The number of brightness values contained in the respective image block is preferably doubled on the basis of the interpolation. All brightness values 228 are supplied to both the motion compensation unit 225 and the motion estimation unit 229. The motion estimation unit 229 also receives the image blocks of the macroblock to be coded in each case (16x16 pixels) via the connection 234. In the motion estimation unit 229, the motion is estimated taking into account the interpolated brightness values ("motion estimation on a half-pixel basis"). Preferably, during motion estimation, absolute differences between the individual brightness values in the macroblock 202 currently to be coded and the reconstructed macroblock from the temporally preceding image are determined.

The result of the motion estimation is a motion vector 230, which expresses a spatial displacement of the selected macroblock from the temporally preceding picture relative to the macroblock 202 to be encoded.

Both brightness information and chrominance information relating to the macroblock determined by the motion estimation unit 229 are shifted by the motion vector 230 and subtracted from the coding values of the macroblock 202 (see data path 231).
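Motion estimation based on absolute differences can be sketched as a full search over a small window. This is a simplified illustration: it works on whole pixels only (the half-pixel interpolation is omitted), uses a small block size and search range, and defines the motion vector as the offset from the current block position to the best-matching position in the previous picture. These conventions and all names are assumptions.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a current block and a shifted reference block."""
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(size) for x in range(size))

def estimate_motion(cur, ref, bx, by, size=4, search=2):
    """Full search; returns ((dx, dy), cost) of the best match inside the picture."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            inside = (0 <= by + dy and by + dy + size <= h and
                      0 <= bx + dx and bx + dx + size <= w)
            if not inside:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return (best[1], best[2]), best[0]

# Previous picture with a non-repeating pattern; the current picture is a shifted copy.
ref = [[10 * y + x * x for x in range(8)] for y in range(8)]
cur = [[ref[(y + 1) % 8][(x + 2) % 8] for x in range(8)] for y in range(8)]
vector, cost = estimate_motion(cur, ref, bx=2, by=2)
```

Subtracting the reference block found at the returned offset from the current block then leaves a zero difference block in this synthetic example.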

Fig. 5 shows a processor unit PRZE which is suitable for carrying out transformation and/or compression/decompression. The processor unit PRZE comprises a processor CPU, a memory SPE and an input/output interface IOS, which is used in different ways via an interface IFC: via a graphics interface, an output is made visible on a monitor MON and/or output on a printer PRT. An input is made via a mouse MAS or a keyboard TAST. The processor unit PRZE also has a data bus BUS, which ensures the connection of a memory MEM, the processor CPU and the input/output interface IOS. Furthermore, additional components can be connected to the data bus BUS, for example additional memory, data storage (hard disk) or a scanner.

Bibliography:

[1] J. De Lameillieure, R. Schäfer: "MPEG-2 image coding for digital television", television and cinema technology, 48th year, No. 3/1994, pages 99-107.

Claims

1. Method for transforming an image area, in which, depending on a decision unit, first a vertical transformation of the image area and then a horizontal transformation of the image area or vice versa, first the horizontal transformation and then the vertical transformation are carried out.

2. The method of claim 1, wherein the image area has an irregular structure.

3. The method according to claim 1 or 2, a) in which, before or after the vertical transformation, the image area is aligned along a horizontal line; b) in which the image area is aligned along a vertical line before or after the horizontal transformation.

4. The method as claimed in one of claims 1 to 3, in which the decision unit carries out at least one of the following mechanisms: a) if the image area is interlaced, first the horizontal and then the vertical transformation is carried out; b) the (horizontal or vertical) transformation along which a correlation of pixels of the image area is stronger is carried out first.

5. The method according to any one of the preceding claims, in which an additional dimension is taken into account in the transformation.

6. The method of claim 5, wherein the additional transformation is performed along a time dimension.

7. The method according to any one of the preceding claims, in which the decision unit generates side information in which the order of the transformations is contained.

8. The method according to any one of the preceding claims, wherein the horizontal transformation results from the vertical transformation by performing a mirroring on a 45-degree axis before the transformation.

9. The method according to any one of the preceding claims, wherein the vertical transformation emerges from the horizontal transformation by performing a mirroring on a 45-degree axis before the transformation.

10. The method according to any one of the preceding claims for use in a coder for compressing image data.

11. The method according to any one of claims 7 to 10, wherein the side information is used in a decoder to decompress the image area.

12. The method according to claim 10 or 11, wherein the operations of the encoder and / or the decoder are determined according to an MPEG standard or according to an H.26x standard.

13. The method according to any one of the preceding claims, wherein the transformation is a DCT or an inverse DCT (IDCT).

14. Arrangement for transforming an image area, with a decision unit which is set up in such a way that, depending on a value determined by the decision unit, first a vertical transformation of the image area and then a horizontal transformation of the image area or vice versa, first the horizontal transformation and then the vertical transformation can be carried out.