EP0910927A1 - Process for coding and decoding stereophonic spectral values

Info

Publication number: EP0910927A1
Application number: EP97925036A
Authority: EP
Inventors: Uwe Gbur; Martin Dietz; Bodo Teichmann; Karlheinz Brandenburg; Heinz GERHÄUSER; Jürgen HERRE; James Johnston
Original assignee: Fraunhofer Gesellschaft zur Forderung der Angewandten Forschung eV; Lucent Technologies Inc; AT&T Labs Inc
Current assignee: Fraunhofer Gesellschaft zur Forderung der Angewandten Forschung eV; AT&T Labs Inc; Nokia of America Corp
Priority date: 1996-07-12
Filing date: 1997-06-03
Publication date: 1999-04-28
Anticipated expiration: 2017-06-03
Also published as: ES2143868T3; NO990106L; DK0910927T3; AU712196B2; AU3031897A; EP0910927B1; PT910927E; KR20000022435A; NO990106D0; NO317570B1; ATE188832T1; US6771777B1; WO1998003036A1; CA2260090C; DE19628292B4; GR3032444T3; JP3622982B2; KR100316582B1; JP2000505266A; DE19628292A1

Abstract

A method of coding stereo audio spectral values first carries out grouping of those values in scale factor bands, with which scale factors are associated. Sections are formed next, each comprising at least one scale factor band. The spectral values are coded within at least one section with a code book assigned to the section, out of a plurality of code books each with a code book number assigned to it, the number of the code book used being transmitted as side information to the coded stereo audio spectral values. At least one additional code book number is provided, which does not refer to a code book but shows information relevant to the section to which it is assigned. A method of decoding stereo audio spectral values which are partly coded by the intensity stereo process and which have side information uses the relevant information, showing the additional code book numbers, to cancel the existing coding of the stereo audio spectral values.

Description

Method for coding and decoding stereo audio spectral values

The present invention relates to encoding and decoding stereo audio spectral values, and more particularly to indicating the fact that stereo intensity encoding is active.

Modern audio coding and decoding methods, for example those operating according to the MPEG Layer 3 standard, are able to compress the data rate of digital audio signals by a factor of twelve, for example, without noticeably degrading their quality.

In addition to a high coding gain in the individual channels, e.g. the left channel L and the right channel R, the redundancy and irrelevance of the two channels among one another is also used in the stereo case. Known and already used methods are the so-called MS stereo method (MS = middle side) and the intensity stereo method (IS method).

The MS stereo method, known to those skilled in the art, essentially exploits the redundancy between the two channels: a sum of the two channels and a difference between the two channels are calculated, and these are then transmitted as modified channel data for the left and right channels, respectively. The redundancy between the two channels that is removed in the encoder is added again in the decoder. The MS stereo method is therefore exactly reconstructing.
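The exact reconstruction property of MS coding can be illustrated with a minimal Python sketch. The function names and the factor 1/2 in the encoder are assumptions made for this illustration; the text only states that a sum and a difference of the two channels are formed.

```python
def ms_encode(left, right):
    """Form the mid (sum) and side (difference) spectra.

    The 1/2 scaling is an assumption of this sketch, not taken from the text.
    """
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Recover the left and right spectra from the mid and side spectra."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


left = [4.0, -2.0, 1.0]
right = [2.0, -2.0, -3.0]
mid, side = ms_encode(left, right)
decoded_left, decoded_right = ms_decode(mid, side)
```

Apart from quantization, which this sketch leaves out, the decoder output equals the encoder input.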

In contrast, the intensity stereo method primarily exploits stereo irrelevance. The spatial perception of the human auditory system depends on the frequency of the perceived audio signals. At lower frequencies, the auditory system evaluates both the magnitude and the phase information of the two stereo signals, whereas the perception of high-frequency components is based primarily on an analysis of the energy-time envelopes of both channels. The exact phase information of the signals in both channels is therefore not relevant for spatial perception at high frequencies. The intensity stereo method uses this property of human hearing to exploit stereo irrelevance for further data reduction of audio signals.

Since the intensity stereo method does not resolve precise location information at high frequencies, it is possible to transmit, from an intensity cutoff frequency determined in the encoder upward, a common energy envelope for both channels instead of the two stereo channels L, R. In addition to this common energy envelope, coarsely quantized direction information is transmitted as side information.

Since a channel is only partially transmitted when using intensity stereo coding, the bit savings can be up to 50%. However, it should be noted that the IS method in the decoder is not exactly reconstructive.

In the IS method as previously used in the MPEG Layer 3 standard, a so-called mode extension bit (mode_extension_bit) indicates whether the IS method is active at all in a block of stereo audio spectral values, each block having its own associated mode_extension_bit.

FIG. 1 shows a basic illustration of the known IS method. Stereo audio spectral values for a channel L 10 and for a channel R 12 are summed at a summation point 14 to obtain an energy envelope $I_i = L_i + R_i$ of the two channels. $L_i$ and $R_i$ here represent the stereo audio spectral values of channel L and channel R in any scale factor band. As already mentioned, the use of the IS method is only permitted above a certain IS cutoff frequency, in order to avoid introducing coding errors into the coded stereo audio spectral values. The left and right channels must therefore be coded separately in the range from 0 Hz to the IS cutoff frequency. The IS cutoff frequency itself is determined by a separate algorithm which does not form part of this invention. From this cutoff frequency upward, the encoder encodes the sum signal of the left channel 10 and the right channel 12, which is formed at the summation point 14.

In addition to the energy envelope, i.e. the sum signal from the left and right channel, which can be transmitted, for example, in the coded left channel, scaling information 16 for channel L and scaling information 18 for channel R are also necessary for decoding. In the intensity stereo method, as implemented in MPEG Layer 2, for example, scale factors for the left and right channels are transmitted. At this point, however, it should be noted that with the IS method in MPEG Layer 3 for IS-coded stereo audio spectral values, intensity direction information is only transmitted in the right channel, with which the stereo audio spectral values are then decoded again, as explained further below.

The scaling information 16 and 18 is transmitted as side information in addition to the coded spectral values of channel L and channel R. A decoder supplies decoded audio signal values to a decoded channel L' 20 and to a decoded channel R' 22, the scaling information 16 for channel L and the scaling information 18 for channel R being multiplied with the decoded stereo audio spectral values of the respective channels at an L multiplier 24 and an R multiplier 26 in order to decode the originally coded stereo audio spectral values again.

Before applying IS coding above a certain IS cutoff frequency or MS coding below this cutoff frequency, the stereo audio spectral values for each channel are grouped into so-called scale factor bands. These bands are adapted to the perceptual properties of the hearing. Each of these bands can be amplified with an additional factor, the so-called scale factor, which is transmitted as side information for the respective channel and which represents part of the scaling information 16 and the scaling information 18 from FIG. 1. These factors shape an interference noise introduced by quantization in such a way that it is "masked" taking psychoacoustic considerations into account and thus becomes inaudible.

FIG. 2a shows a format of the encoded right channel R as used, for example, in the MPEG Layer 3 audio coding method. All further explanations regarding intensity stereo coding in the prior art likewise relate to the method according to the MPEG Layer 3 standard. The individual scale factor bands 28, into which the stereo audio spectral values are grouped, are shown schematically in the first line of FIG. 2a. The equal bandwidth of the scale factor bands as drawn in FIG. 2a serves only for clarity of presentation and does not occur in practice, owing to the psychoacoustic properties of the auditory system.

The second line of FIG. 2a contains coded stereo audio spectral values sp which are not equal to zero below an IS cutoff frequency 32, whereas the stereo audio spectral values in the right channel above the IS cutoff frequency are set to zero (zero_part), denoted nsp (nsp = null spectrum). The third line of FIG. 2a contains part of the side information 34 for the right channel. The part of the side information 34 shown consists of the scale factors skf for the range below the IS cutoff frequency and of direction information rinfo 36 for the range above the IS cutoff frequency 32. This direction information is used in the intensity stereo method to ensure a coarse spatial resolution of the IS-coded frequency range. The direction information rinfo 36, also called intensity positions (is_pos), is thus transmitted in the right channel instead of the scale factors. It should be noted once again that below the IS cutoff frequency, the scale factors 34 corresponding to the scale factor bands 28 are still present in the right channel. The intensity positions 36 indicate the perceived stereo imaging position (the ratio of left to right) of the signal source within the respective scale factor bands 28. In each scale factor band 28 above the IS cutoff frequency, the decoded values of the transmitted stereo audio spectral values are scaled according to the MPEG Layer 3 method by the following scaling factors $k_L$ for the left channel and $k_R$ for the right channel:

$k_L = \frac{\mathrm{is\_ratio}}{1 + \mathrm{is\_ratio}}$ (1)

and

$k_R = \frac{1}{1 + \mathrm{is\_ratio}}$ (2)

The equation for is_ratio is as follows:

$\mathrm{is\_ratio} = \tan\left(\mathrm{is\_pos} \cdot \frac{\pi}{12}\right)$ (3)

The value is_pos is quantized with 3 bits, with only the values from 0 to 6 representing valid position values. The left and right channels can be calculated back from the I signal ($I_i = L_i + R_i$) using the following two equations:

$R_i = I_i \cdot \frac{\mathrm{is\_ratio}}{1 + \mathrm{is\_ratio}} = I_i \cdot k_L$ (4)

$L_i = I_i \cdot \frac{1}{1 + \mathrm{is\_ratio}} = I_i \cdot k_R$ (5)

$R_i$ and $L_i$ represent the intensity stereo decoded stereo audio spectral values. It should be noted that the format of the left channel is analogous to the format of the right channel shown in FIG. 2a, except that above the IS cutoff frequency 32 the left channel contains the combined spectrum $I_i = L_i + R_i$ instead of the zero spectrum, and that the left channel carries normal scale factors rather than direction information is_pos. In the MPEG Layer 3 standard, the transition from quantized spectral values not equal to zero to the zero values in the right channel can implicitly indicate the IS cutoff frequency to the decoder.
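The scaling factors of equations (1) to (3) can be evaluated with a minimal Python sketch. The function name is hypothetical, and rejecting values outside 0 to 6 is an assumption based on the statement that only these values are valid positions.

```python
import math


def layer3_is_scale_factors(is_pos):
    """Return (k_L, k_R) for a quantized intensity position, per equations (1)-(3)."""
    if not 0 <= is_pos <= 6:
        raise ValueError("only is_pos values 0..6 are valid positions")
    is_ratio = math.tan(is_pos * math.pi / 12)  # equation (3)
    k_l = is_ratio / (1 + is_ratio)             # equation (1)
    k_r = 1 / (1 + is_ratio)                    # equation (2)
    return k_l, k_r


scale_factors = {is_pos: layer3_is_scale_factors(is_pos) for is_pos in range(7)}
```

By construction the two factors always add up to one, so the two scaled signals together reproduce the transmitted sum signal; the middle position is_pos = 3 gives approximately equal factors.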

In the encoder, the transmitted channel L is thus calculated as the sum of the left and the right channel. The transmitted direction information can be determined using the following equation:

$\mathrm{is\_pos} = \mathrm{nint}\left[\arctan\left(\frac{\sqrt{E_L}}{\sqrt{E_R}}\right) \cdot \frac{12}{\pi}\right]$ (6)

The function nint[x] represents the "nearest integer" function, and $E_L$ and $E_R$ are the energies in the respective scale factor bands of the left and right channels. This formulation of the encoder and decoder leads to an approximate reconstruction of the signals in the left and right channels.
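Equation (6) can be sketched in Python as follows. The function name is hypothetical. Two implementation choices are assumptions of this sketch: `math.atan2` replaces the quotient inside the arctangent so that a right-channel energy of zero does not cause a division by zero, and nint is realized as floor(x + 0.5).

```python
import math


def estimate_is_pos(energy_left, energy_right):
    """Quantize the direction of a scale factor band from its channel energies."""
    # For non-negative arguments atan2(y, x) equals arctan(y / x).
    angle = math.atan2(math.sqrt(energy_left), math.sqrt(energy_right))
    # nint[x] realized as floor(x + 0.5) (assumption of this sketch).
    return math.floor(angle * 12 / math.pi + 0.5)


centered = estimate_is_pos(1.0, 1.0)
only_right = estimate_is_pos(0.0, 1.0)
only_left = estimate_is_pos(1.0, 0.0)
```

Because the angle lies between 0 and π/2 for non-negative energies, the result always falls in the valid range 0 to 6.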

As already mentioned, in known audio coding methods the stereo audio spectral values are grouped into scale factor bands that are adapted to the perceptual properties of hearing. In the audio coding method according to the MPEG Layer 3 standard, these scale factor bands are additionally divided into exactly three regions, so that areas with the same signal statistics are grouped together. This is advantageous for the subsequent redundancy reduction by means of the known Huffman coding. For each of these regions of scale factor bands 28, the Huffman table that yields the greatest gain from redundancy reduction is selected from a plurality of Huffman tables. This table is signaled in the bit stream of encoded data by a 5-bit value for each region. There are 30 different tables, but tables 4 and 14 are not used.
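The table selection can be sketched in Python as picking the table that needs the fewest bits for a region. The function name and the two toy tables are hypothetical and serve only as an illustration; they are not the Huffman tables of the MPEG Layer 3 standard.

```python
def select_table(values, code_lengths_by_table):
    """Return (table number, bit count) of the table that codes `values` most compactly."""
    best_table, best_bits = None, None
    for table_number, code_lengths in sorted(code_lengths_by_table.items()):
        if any(value not in code_lengths for value in values):
            continue  # this table has no code for some value in the region
        bits = sum(code_lengths[value] for value in values)
        if best_bits is None or bits < best_bits:
            best_table, best_bits = table_number, bits
    return best_table, best_bits


# Hypothetical toy tables: quantized value -> code length in bits.
toy_tables = {
    1: {0: 1, 1: 2, 2: 3, 3: 3},  # favors regions dominated by zeros
    2: {0: 2, 1: 2, 2: 2, 3: 2},  # fixed-length code
}

mostly_zero_region = [0, 0, 1, 0, 0, 2, 0, 0]
busy_region = [3, 2, 3, 1, 2, 3]
choice_a = select_table(mostly_zero_region, toy_tables)
choice_b = select_table(busy_region, toy_tables)
```

Regions with different statistics end up with different tables, which is why grouping bands with similar statistics increases the coding gain.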

The non-backward-compatible (NBC) coding method, which is currently undergoing standardization, differs from the standard MPEG Layer 3 audio coding method in, among other things, that its bit stream syntax does not allow only exactly three regions of scale factor bands; instead, so-called sections can be present in any number and can each comprise any number of scale factor bands. Analogously to the method described above for MPEG Layer 3, each section is assigned, from a plurality of Huffman tables, the table that achieves maximum redundancy reduction; this table is then to be used for decoding. In the extreme case, a section consists of only a single scale factor band. In practice, however, this is unlikely to occur, since the side information required would then be much too large. The NBC method has a total of 16 Huffman coding table numbers, which are transmitted as 4-bit values. This means that one of the twelve existing coding table numbers can be selected.

The object of the present invention is to provide methods for coding or decoding stereo audio spectral values, in which information relevant to the coding or decoding is signaled with a minimal amount of side information. This object is achieved by a method for encoding stereo audio spectral values according to claim 1 and by a method for decoding stereo audio spectral values partially encoded in the intensity stereo method according to claim 2.

The present invention is based on the recognition that additional coding table numbers which do not refer to coding tables can be used to indicate other information relevant to a section. The "additional" coding table numbers are those that do not refer to coding tables. Since twelve different coding table numbers are coded with 4 bits, the numbers 13, 14 and 15 are, as it were, freely available for carrying other information. In a preferred embodiment of the present invention, two (No. 14 and No. 15) of the three additional coding table numbers (No. 13, No. 14 and No. 15) are used to indicate, on the one hand, that intensity stereo coding is present in a section and, on the other hand, the mutual phase relationship of the IS-coded stereo audio spectral values in the two stereo channels.

The additional unused coding table number 13 can be used to indicate adaptive Huffman coding.

Preferred embodiments of the present invention are explained below with reference to the accompanying drawings, in which:

FIG. 1 shows the signal flow in a coding/decoding scheme according to the intensity stereo method;

FIG. 2a shows a format of the data in the presence of intensity stereo coding for the right channel for the MPEG Layer 3 standard;

FIG. 2b shows a format of the data in the presence of intensity stereo coding for the right channel for the MPEG-NBC method; and

FIG. 3 is a schematic block diagram of a decoder that implements the present invention.

A method for encoding stereo audio spectral values and the method for decoding stereo audio spectral values partially encoded in the intensity stereo method according to a first exemplary embodiment of the present invention use novel signaling of the presence of the intensity stereo encoding within a section. According to the present invention, there are also 16 coding table numbers. In contrast to the prior art, however, only the first 12 coding table numbers (No. 1 to No. 12) correspond to actual coding tables. With the help of the last and the penultimate coding table number, it is now signaled that the stereo intensity method is used within the section to which this coding table number is assigned.
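How a decoder might interpret the 4-bit coding table number of a section can be sketched in Python. The enum, its member names and the function name are hypothetical. The meaning of number 13 follows the optional use mentioned in this description, and number 0 is reported as not described because the text does not assign it a meaning.

```python
from enum import Enum


class SectionKind(Enum):
    HUFFMAN_TABLE = "fixed Huffman table"
    ADAPTIVE_HUFFMAN = "adaptive Huffman coding"
    INTENSITY_STEREO = "intensity stereo coding"
    NOT_DESCRIBED = "not described in this text"


def classify_table_number(table_number):
    """Map a 4-bit coding table number to the kind of information it signals."""
    if not 0 <= table_number <= 15:
        raise ValueError("a 4-bit coding table number must lie in 0..15")
    if 1 <= table_number <= 12:
        return SectionKind.HUFFMAN_TABLE      # refers to an actual coding table
    if table_number == 13:
        return SectionKind.ADAPTIVE_HUFFMAN   # optional use of the third additional number
    if table_number in (14, 15):
        return SectionKind.INTENSITY_STEREO   # penultimate and last number
    return SectionKind.NOT_DESCRIBED          # number 0


kinds = [classify_table_number(n) for n in range(16)]
```

Since the number occupies the same four bits in every case, signaling intensity stereo this way needs no extra field in the bit stream.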

FIG. 2b shows a format of the data for the right channel R in the presence of intensity stereo coding when the MPEG2-NBC method is used. The difference from FIG. 2a, i.e. from the MPEG Layer 3 method, is that a user now has the flexibility to selectively switch intensity stereo coding of the stereo audio spectral values on or off for each section, even above the IS cutoff frequency 32. In comparison to MPEG Layer 3, the IS cutoff frequency is therefore no longer a cutoff frequency in the strict sense, since with the NBC method the IS coding can also be switched off and on again above it. This was not possible with Layer 3: if IS coding was present, the stereo audio spectral values above the IS cutoff frequency had to be IS-coded all the way up to the upper end of the spectral range. The new NBC method does not have to activate IS coding for the entire spectral range above the IS cutoff frequency, but also allows IS coding to be switched off there, provided this is signaled. Since a coding table number must be transmitted for each section anyway according to the bit stream syntax, the side information ("overhead") does not increase with the signaling described here.

The scale factors transmitted for the right channel in a section with IS coding now represent the direction information 36, analogously to the prior art, these values themselves also being subjected to differential and Huffman coding. In the right channel, as already mentioned, the IS-coded scale factor bands contain no stereo audio spectral values but a zero spectrum. In IS-coded sections, the left channel contains the sum signal of the left and right channels. However, the sum signal is normalized in such a way that its energy within the respective scale factor bands after IS decoding corresponds to the energy of the left channel. The left channel can therefore be adopted unchanged in the decoding device when IS coding is used and does not have to be determined by means of a separate back-calculation rule. The stereo audio spectral values of the right channel can then be calculated back from the stereo audio spectral values of the left channel using the direction information is_pos 36, which is present in the side information of the right channel.
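One way to realize such a normalization is sketched below in Python. The text only states the goal, namely that the band energy of the transmitted signal matches that of the left channel; the gain formula, the function name and the handling of a vanishing sum are assumptions of this sketch.

```python
import math


def normalized_sum_band(left_band, right_band):
    """Return the sum signal of one scale factor band, scaled to the left-channel energy."""
    total = [l + r for l, r in zip(left_band, right_band)]
    energy_left = sum(value * value for value in left_band)
    energy_sum = sum(value * value for value in total)
    if energy_sum == 0.0:
        # The channels cancel exactly; no gain can restore the energy (assumed fallback).
        return [0.0] * len(total)
    gain = math.sqrt(energy_left / energy_sum)
    return [gain * value for value in total]


left_band = [3.0, 4.0]
right_band = [3.0, 4.0]
transmitted = normalized_sum_band(left_band, right_band)
transmitted_energy = sum(value * value for value in transmitted)
```

With this scaling the decoder can use the transmitted band directly as the left channel.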

As described at the beginning, the intensity stereo method according to the prior art produces two coherent signals for the left and right channels, which differ only in their amplitude, i.e. intensity, depending on the direction information is_pos 36 (equations (4) and (5)). In the present invention, since the presence of intensity stereo coding is signaled by means of two "unreal" coding table numbers, the phase relationship of the two channels to one another can additionally be signaled. If the channels have the same phase position, the back-calculation rule according to the invention to be carried out in the decoder is as follows:

$R_i = 0.5^{(0.25 \cdot \mathrm{is\_pos}(\mathrm{sfb}))} \cdot L_i$, (7)

while in the case of opposite phase the spectrum is multiplied by -1, which results in the following equation for the calculation of the right channel:

$R_i = (-1) \cdot 0.5^{(0.25 \cdot \mathrm{is\_pos}(\mathrm{sfb}))} \cdot L_i$. (8)

$R_i$ in the two preceding equations denotes the back-calculated, i.e. decoded, stereo audio spectral values of the right channel; sfb denotes the scale factor band 28 to which the direction information is_pos 36 is assigned. $L_i$ denotes the stereo audio spectral values of the left channel, which are adopted unchanged in the decoder.

Coding table number 15 indicates that the first back-calculation rule, equation (7), is to be used, while coding table number 14 indicates that the second back-calculation rule, equation (8), is to be used, i.e. that the two channels are out of phase. It is obvious to those skilled in the art that the terms in-phase and out-of-phase are used broadly in the sense of this application. For example, a phase discriminator can be provided which determines that the signals are out of phase from a certain output value upward, for example 90°, and considers them to be in phase for a phase difference of less than 90°.
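The two back-calculation rules can be combined in a short Python sketch. The function and constant names are hypothetical; the mapping of coding table numbers 15 and 14 to equations (7) and (8) is taken from the description.

```python
IN_PHASE_TABLE_NUMBER = 15      # selects equation (7)
OUT_OF_PHASE_TABLE_NUMBER = 14  # selects equation (8)


def decode_right_band(left_band, is_pos, table_number):
    """Back-calculate the right-channel values of one IS-coded scale factor band."""
    if table_number == IN_PHASE_TABLE_NUMBER:
        sign = 1.0
    elif table_number == OUT_OF_PHASE_TABLE_NUMBER:
        sign = -1.0
    else:
        raise ValueError("section is not intensity stereo coded")
    scale = sign * 0.5 ** (0.25 * is_pos)
    return [scale * value for value in left_band]


left_band = [2.0, -4.0]
right_in_phase = decode_right_band(left_band, 4, IN_PHASE_TABLE_NUMBER)
right_out_of_phase = decode_right_band(left_band, 4, OUT_OF_PHASE_TABLE_NUMBER)
```

Each step of four in is_pos halves the amplitude of the right channel relative to the left, so is_pos = 4 yields half the left-channel amplitude.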

In the first exemplary embodiment described, the phase relationship of the two channels to one another is determined for a whole section, which consists of at least one scale factor band, by means of coding table number 14 or 15. The side information caused by the IS and phase signaling amounts to 8 bits per section, composed of four bits for the section length and four bits for the coding table number 14 or 15. If an audio signal is to be encoded whose stereo audio spectral values change phase position frequently from scale factor band to scale factor band, then according to the first exemplary embodiment a new section must be started each time the phase position is reversed. A signal with a frequently changing phase position therefore generates a large number of sections, since each section can indicate only either the in-phase or the out-of-phase condition of its stereo audio spectral values in the two channels by means of the coding table number assigned to it. An unfavorable signal will therefore lead to a large number of sections and thus to a large amount of side information.
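The growth of side information in the first exemplary embodiment can be estimated with a small Python sketch. The function name is hypothetical, and the sketch assumes that all listed scale factor bands are IS-coded and that a section may be arbitrarily long, i.e. it ignores any limit imposed by the 4-bit section length field.

```python
from itertools import groupby

BITS_PER_SECTION = 8  # 4 bits section length + 4 bits coding table number


def first_embodiment_overhead(band_out_of_phase):
    """Count sections and signaling bits when every phase change starts a new section."""
    sections = sum(1 for _ in groupby(band_out_of_phase))
    return sections, sections * BITS_PER_SECTION


steady = first_embodiment_overhead([False, False, False, False])
alternating = first_embodiment_overhead([False, True, False, True])
```

A band sequence with constant phase needs a single section, whereas a phase that alternates from band to band needs one section per band.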

A second exemplary embodiment of the present invention allows phase coding on a per-scale-factor-band basis within a section in which intensity coding is active. With the method according to this second exemplary embodiment, which uses an MS mask described below, the phase can be coded scale factor band by scale factor band without increasing the number of sections and without any additional overhead.

It will be apparent to those skilled in the art that the middle-side method and the intensity stereo method are mutually exclusive within a scale factor band. In this sense the two methods are orthogonal.

If MS coding of stereo audio spectral values is used in a bit stream, a signaling bit in the side information is set in order to switch on MS coding globally. Setting this bit means that an MS bit mask is transmitted, with which MS coding can be selectively switched on or off for each scale factor band (scfbd). One bit is reserved in the MS bit mask for each scale factor band, which is why the length of the bit mask corresponds to the number of scale factor bands.

In the scale factor bands in which IS is active, the MS information for the scale factor band is not needed, since MS coding must not be activated there. The MS bit mask can therefore be used for other signaling in this range, making it possible to indicate details of the IS coding by means of the MS bit mask. In accordance with the first exemplary embodiment, the information relating to the phase position of the channels in a section with IS coding is specified by means of the coding table numbers 14 and 15. The coding table numbers also indicate that IS coding is active at all in a section.

In deviation from the first exemplary embodiment, the MS bit mask is used in the second exemplary embodiment of the present invention to allow scale factor bands with different phase positions in one section. The MS bit mask indicates the phase relationship of the individual scale factor bands in the section relative to the coding table number, which signals that IS coding is active in the section. If the bit in the MS bit mask for a scale factor band is not set (i.e. zero), the phase information indicated by the coding table number of the section in which the scale factor band is located is retained. If the bit is set (i.e. one), the phase position of the two channels indicated by the coding table number of that section is inverted for this scale factor band. In principle, this is an EXCLUSIVE-OR combination of the phase position indicated by the coding table number and the MS bit mask.

Specifically, the phase relationships of the two stereo channels L and R calculated from the coding table number and MS bit mask in a scale factor band located in a section in which the IS coding is used are as follows:

Coding table number (for one section)       15        15        14        14
MS bit mask (for one scale factor band)     0         1         0         1
Phase position of L and R                   0°        180°      180°      0°
Back-calculation rule                       Eq. (7)   Eq. (8)   Eq. (8)   Eq. (7)

Table 1
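The combination listed in Table 1 can be reproduced with a short Python sketch. The function name is hypothetical; the EXCLUSIVE-OR of the section's phase, given by the coding table number, with the MS mask bit of the band follows the description.

```python
def band_is_out_of_phase(table_number, ms_mask_bit):
    """Combine the section's coding table number with the band's MS mask bit."""
    if table_number not in (14, 15):
        raise ValueError("section is not intensity stereo coded")
    section_out_of_phase = table_number == 14
    # EXCLUSIVE-OR: a set mask bit inverts the phase signaled for the section.
    return section_out_of_phase != bool(ms_mask_bit)


table_1 = {
    (table_number, bit): band_is_out_of_phase(table_number, bit)
    for table_number in (15, 14)
    for bit in (0, 1)
}
```

A result of False corresponds to a phase position of 0° and equation (7), a result of True to 180° and equation (8).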

The described second exemplary embodiment of the present invention thus allows scale factor bands whose stereo audio spectral values have different phase positions to occur in one section, so that fewer sections have to be formed for coding than in the first exemplary embodiment. This means that less side information has to be transmitted.

In deviation from the exemplary embodiments described above, the additional coding table numbers can also be used to indicate other information relevant to a section.

Such further information relevant to a section can, for example, indicate the use of adaptive Huffman coding in the section. With adaptive Huffman coding, an adapted Huffman table can be generated depending on the signal statistics. The coding table number 13 instructs the coding device not to use any of the twelve fixed Huffman tables, but to use an adapted Huffman table which is not known a priori to the decoder. This is advantageous if the signal statistics in a section cannot be optimally coded, i.e. compressed, with one of the twelve fixed coding tables. The coding is then no longer restricted to the twelve fixed Huffman tables, but can generate and use a table that is optimally adapted to the signal statistics. The information about the adaptive coding table is transmitted as additional side information.

A decoding device requires this additional side information in order to calculate back from it the adapted Huffman table used in the coding, in order to be able to correctly decode the Huffman-coded stereo audio spectral values again.
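How a table adapted to the signal statistics might be derived can be sketched with the classic Huffman construction in Python. This is a generic textbook procedure, not necessarily the one intended by the invention; the function name is hypothetical, and the format in which the adapted table would be transmitted as side information is not specified here.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code_lengths(values):
    """Return a dict mapping each distinct value to its Huffman code length in bits."""
    frequencies = Counter(values)
    if not frequencies:
        return {}
    if len(frequencies) == 1:
        return {next(iter(frequencies)): 1}  # a lone symbol still needs one bit
    tie_breaker = count()  # keeps heap entries comparable when weights are equal
    heap = [(weight, next(tie_breaker), {symbol: 0})
            for symbol, weight in frequencies.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        weight_a, _, lengths_a = heapq.heappop(heap)
        weight_b, _, lengths_b = heapq.heappop(heap)
        # Merging two subtrees pushes every contained symbol one level deeper.
        merged = {symbol: depth + 1
                  for symbol, depth in {**lengths_a, **lengths_b}.items()}
        heapq.heappush(heap, (weight_a + weight_b, next(tie_breaker), merged))
    return heap[0][2]


section_values = [0, 0, 0, 0, 1, 1, 2, 3]
adapted_lengths = huffman_code_lengths(section_values)
total_bits = sum(adapted_lengths[value] for value in section_values)
```

Frequent values receive short codes, so the section is coded compactly, but the table itself, or the statistics it was built from, must reach the decoder as side information.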

FIG. 3 shows a simplified block diagram of a decoder that can carry out the decoding method according to the present invention. Stereo audio spectral values partially coded using the intensity stereo method are supplied to respective inverse quantizers 38 and 40, which reverse the quantization introduced during coding. The dequantized stereo audio spectral values then pass to an MS decoder 42, which reverses the middle-side coding introduced in the encoder. An IS decoder 44 then applies the back-calculation rules (7) and (8) described above in order to obtain the original stereo audio spectral values again for the IS-coded scale factor bands. Respective inverse transformers 46 and 48 for the left and right channels then convert the stereo audio spectral values into stereo audio time-domain values L(t), R(t). It will be apparent to those skilled in the art that the inverse transformers 46 and 48 can be implemented by an inverse MDCT, for example.

Claims

1. Method for coding stereo audio spectral values, with the following steps:

Grouping the stereo audio spectral values into scale factor bands (28) to which scale factors are assigned;

Forming sections each consisting of at least one scale factor band (28);

Coding the stereo audio spectral values within at least one section with a coding table assigned to the at least one section from a plurality of coding tables, each of which is assigned a coding table number, the coding table number of the coding table used being transmitted as side information to the coded stereo audio spectral values,

wherein at least one additional coding table number is provided, which does not refer to a coding table, but rather indicates information relevant to the section to which it is assigned.

2. A method for decoding coded stereo audio spectral values which have side information, with the following steps:

Acquiring a coding table number from the side information for each section of the coded stereo audio spectral values;

Decoding the stereo audio spectral values of a section whose coding table number does not refer to a coding table but indicates information relevant to the section to which it is assigned, in accordance with the indicated information; and

Decoding the stereo audio spectral values of another section, whose coding table number refers to a corresponding coding table, using this coding table.

3. The method according to any one of claims 1 or 2,

in which at least one additional coding table number indicates coding according to the intensity stereo method of the stereo audio spectral values of the assigned section.

4. The method according to any one of the preceding claims,

in which at least one additional coding table number indicates adaptive Huffman coding of the stereo audio spectral values of the assigned section.

5. The method according to any one of the preceding claims,

in which the at least one additional coding table number for a section which is coded according to the stereo intensity method also indicates a phase relationship between two stereo channels.

6. The method according to claim 5,

in which one of two additional coding table numbers indicates the same phase position of the two stereo channels, wherein the following back-calculation rule for intensity decoding applies:

$R_i = 0.5^{(0.25 \cdot \mathrm{is\_pos}(\mathrm{sfb}))} \cdot L_i$,

where is_pos represents intensity direction information for the scale factor band in question, and $L_i$ denotes the normalized sum signal of the stereo audio spectral values of the left (L) and right (R) channels.

7. The method according to claim 5 or 6,

in which one of two additional coding table numbers indicates the opposite phase position of the two stereo channels, wherein the following back-calculation rule for intensity decoding applies:

$R_i = (-1) \cdot 0.5^{(0.25 \cdot \mathrm{is\_pos}(\mathrm{sfb}))} \cdot L_i$,

where is_pos represents intensity direction information for the scale factor band in question, and $L_i$ denotes the normalized sum signal of the stereo audio spectral values of the left (L) and right (R) channels.

8. The method according to any one of the preceding claims,

in which the intensity stereo method in a left channel forms a normalized sum signal of the stereo audio spectral values of the left and right channels and scale factors as side information, while in the right channel the spectrum is zero and intensity direction information is encoded as side information.

9. The method according to any one of the preceding claims,

in which a bit mask having one bit for each scale factor band is used, a bit of the bit mask for a scale factor band in a section to which one of the additional coding table numbers is assigned being combined with the additional coding table number in order to determine a phase relationship of the two stereo channels.

10. The method according to claim 9,

in which the bit mask is an MS bit mask and the additional coding table numbers are linked with the MS bit mask on a scale factor band basis by means of an EXCLUSIVE-OR operation.