WO2000075920A1

WO2000075920A1 - A method of improving the intelligibility of a sound signal, and a device for reproducing a sound signal

Info

Publication number: WO2000075920A1
Application number: PCT/SE2000/001100
Authority: WO
Inventors: Alberto JIMENEZ-FELTSTRÖM
Original assignee: Telefonaktiebolaget Lm Ericsson (Publ)
Priority date: 1999-06-03
Filing date: 2000-05-29
Publication date: 2000-12-14
Also published as: AU5264100A; SE9902057D0

Abstract

In a method of improving the intelligibility of a first sound signal (S₀) having a certain frequency spectrum, at least one additional sound signal (S₁; S₁, S₂, ..., Sₙ) is generated comprising a transposition of the spectrum of the first signal in frequency. A combined signal (S₊) is generated comprising the first signal (S₀) as well as the at least one additional signal (S₁; S₁, S₂, ..., Sₙ), and the combined signal is reproduced to a user. In this way the intelligibility of a sound signal is improved, while it is still possible to recognize the original sound of a speaker. The natural sound of the speaker is maintained because the original sound signal is still contained in the resulting signal. Owing to the additional signals, the resulting signal will probably also contain speech information in a frequency band in which the listener has a better perception.

Description

A method of improving the intelligibility of a sound signal, and a device for reproducing a sound signal

The invention relates to a method of improving the intelligibility of a sound signal having a certain frequency spectrum, and wherein at least one additional sound signal is generated comprising a transposition of the spectrum of the first sound signal in the frequency range. Further, the invention relates to a corresponding device.

In many situations people have difficulty in perceiving sounds to which they want to listen. This could be due to hearing deficiency or noisy environments, and in both situations it creates difficulties in e.g. telephoning or using radio or telephone headphones.

Hearing deficiency generally manifests itself as a poor perception of sounds located within a certain frequency band. The position and width of this frequency band vary between individuals. Generally, as people grow older, their sensitivity at higher frequencies decreases, so they find it easier to hear speech with a lower pitch.

In many noisy environments, the noise energy is concentrated in a narrow frequency band. This band might coincide with the pitch of the speech to be heard, in which case it will decrease the intelligibility of the speech considerably. If, on the other hand, the band and the pitch do not coincide, the hearing mechanism acts as an adaptive band-pass filter that filters out the undesired noisy frequency bands. Normally, therefore, a user will only have problems understanding speech whose pitch more or less coincides with the noise frequency band. The information in speech is carried in different sound frequencies; if some of that information is never heard, the speech becomes less intelligible.

In the noise situation, a solution could be the use of earphones. However, this necessitates the use of extra equipment, i.e. the earphones, and especially in the case of telephoning this will be too complicated for many users. Further, it does not provide any help in the case of hearing deficiency.

US 4 764 957 discloses a device intended to correct individual hearing deficiencies. The device comprises a band-pass filter, the central frequency and/or width of which can be adjusted to coincide with the frequency range in which the person has a poor perception of sounds. The frequencies in this range are given an additional amplification in order to compensate for the decreased sensitivity of the person. It provides some help, but it only works when the person has at least a certain remaining sensitivity in the relevant frequency range and this is not always the case. Although this device might also be used in the noise situation, the frequency of the noise might change during the use of the device, and in this situation the user would have to readjust the device during its use, which would be rather complicated.

Another solution is disclosed in EP 54 450. This is a hearing aid for acutely or profoundly deaf persons. Samples of acoustic signals in frequency bands such as 1500 to 2400 Hz and 4800 to 6000 Hz, which are inaudible to the deaf person, are transposed into samples in a region of lower frequency (500 to 1000 Hz) which is audible to the deaf person. A corresponding, but less drastic, solution is described in an article by Ian B. Thomas and Francis E. Flavin, "The Intelligibility of Speech Transposed Downward in Frequency by One Octave," Journal of the Audio Engineering Society, February 1970, Volume 18, Number 1, Pages 56 to 62. Here, speech is transposed down one octave in real time in order to improve the intelligibility of speech for partially deaf persons. The idea is that the pitch of the speech is transposed from a frequency band within which the person has a poor perception of sounds to another frequency band in which the person has a better perception. However, these solutions incorporating a pitch transposition have the very important drawback that the natural sound of the speaker is changed, i.e. the voice of a woman might sound like the voice of a man, or vice versa. Such a change of the natural sound is normally unwanted, mainly because it makes it difficult to recognize specific and well known voices, but also because people do not, generally, like to know that their own voices are distorted in this way, violating the natural sound characteristics of their speech.

In the field of processing of music it is known to generate a "synthetic chorus" to be used as a harmonizing background to the original sound of a singer in a so-called karaoke system. US 5 719 346 discloses such a system in which the apparatus collects an original of a vocal sound and adds a chorus sound derived from the original sound. The chorus sound is generated by shifting a pitch of the collected vocal sound by a calculated pitch difference. The sole object of this apparatus is to provide the harmonizing effect.

Thus, it is an object of the invention to provide a method of improving the intelligibility of a sound signal in which it is still possible to recognize the original sound of a speaker. According to the invention, this is achieved in that a combined signal is generated comprising the first sound signal as well as the at least one additional sound signal, said combined signal being reproduced to a user.

By combining the original sound signal with one or more signals comprising the same speech information, but in which the pitch of speech is transposed to one or more other frequency bands, it is ensured that the resulting sound signal will probably contain speech information in a frequency band in which the person has a better perception, while at the same time the natural sound of the speaker is maintained due to the fact that the original sound signal is also contained in the resulting signal. The effect will be that the speech is pronounced by one or more fictive voices simultaneously with the original voice.

When, as stated in claim 2, several additional sound signals are generated and comprised in the combined signal, the probability that at least one or some of the additional signals will be located within an area in which the person has a better perception is increased. The effect will be a chorus in the background to the original voice.

In an expedient embodiment of the invention, which is stated in claim 3, at least one of the additional signals is generated at frequencies below those of the first signal, and at least one other of the additional signals at frequencies above those of the first signal. This embodiment is advantageous in the noise situation or in cases where a hearing deficiency manifests itself as a poor perception of sounds located within a narrow frequency band. In cases where a person's sensitivity at higher frequencies is decreased due to old age, an alternative embodiment is preferable, in which all the additional signals are generated at lower frequencies than the original sound signal.

When, as stated in claim 4, the combined signal is generated such that the amplitude of the first sound signal is higher than the amplitude of the at least one additional sound signal, the natural sound of the speaker is recognized more easily, while the intelligibility is still improved by the other signals.

As mentioned, the invention further relates to a device comprising means for reproducing a sound signal to a user, said means comprising means for receiving a first sound signal having a certain frequency spectrum, and means for generating at least one additional sound signal comprising a transposition of the spectrum of the first sound signal in the frequency range.

When the device further comprises means for generating a combined signal comprising the first sound signal as well as the at least one additional sound signal, and means for reproducing the combined signal to the user, the intelligibility of a sound signal is improved, while it is still possible to recognize the original sound of a speaker.

When, as stated in claim 6, the device is adapted to generate several additional sound signals and to generate the combined signal such that it comprises each of these additional signals, the probability that at least one or some of the additional signals will be located within an area in which the person has a better perception is increased. The effect will be a chorus in the background to the original voice.

In an expedient embodiment of the invention, which is stated in claim 7, the device is adapted to generate at least one of the additional signals at frequencies below those of the first signal, and at least one other of the additional signals at frequencies above those of the first signal. This embodiment is advantageous in the noise situation or in cases where a hearing deficiency manifests itself as a poor perception of sounds located within a narrow frequency band. In cases where a person's sensitivity at higher frequencies is decreased due to old age, an alternative embodiment is preferable, in which all the additional signals are generated at lower frequencies than the original sound signal.

When, as stated in claim 8, the device is adapted to generate the combined signal such that the amplitude of the first sound signal is higher than the amplitudes of the additional sound signals, the natural sound of the speaker is recognized more easily, while the intelligibility is still improved by the other signals.

When, as stated in claim 9, the device comprises means for regulating the amplitude of each of the additional sound signals individually, it will be easier to adapt the device to the hearing deficiency of a specific person or to the noise in a specific environment.

When, as stated in claim 10, the device comprises means for disabling the means for generating the combined signal, the effect can be switched on and off according to actual demand. This will be useful e.g. when a user enters or leaves a noisy environment.

Further, as stated in claim 11, the device may expediently be a cellular telephone, or, as stated in claim 12, a hearing aid.

The invention will now be described more fully below with reference to the drawing, in which

figure 1 shows a device according to a first embodiment of the invention,

figure 2 shows a sketch of a frequency spectrum for the device of figure 1,

figure 3 shows a device according to a second embodiment of the invention,

figure 4 shows a sketch of a frequency spectrum for the device of figure 3, and

figure 5 shows a sketch of an alternative frequency spectrum for the device of figure 3.

Figure 1 shows a device according to a first embodiment of the invention. An electronic signal Sᵢₙ representing an original sound signal is taken to the input of an input circuit 1. The signal Sᵢₙ may be in analog or digital form. By way of example, a signal in analog form may be the output from a microphone, while a signal in digital form may represent received speech information in a cellular telephone. The circuit of figure 1 comprises a digital signal processor, and in the case of an analog input signal, the input circuit 1 thus includes an A/D converter.

When the switch 3 is in the open position, the digital signal S₀ from the circuit 1 is connected through the summation point 2 to a D/A converter 6 in which an analog signal corresponding to the original sound signal is regenerated. The output of the D/A converter 6 is connected to a loudspeaker or another sound reproducing unit 7, either directly or through an amplifier (not shown).

The lower part of figure 1 shows the circuit elements used to improve the intelligibility of the sound signal in case this is needed, either because of noisy environments or because of the hearing deficiency of the person listening to the loudspeaker 7. Through the switch 3 the digital signal from the circuit 1 is connected to a pitch transposing circuit 4, the implementation of which will be described below. The pitch transposing circuit 4 transposes the frequency spectrum of the sound signal to a frequency range different from that of the original sound signal. The result is a new sound signal S₁ containing the same speech information as the original one, but sounding as if it was pronounced by a different person. It should be noted that "sound signal" is here taken to mean also the digital signal representing the sound signal.

In situations where important parts of the speech information of the original sound signal S₀ are located in a frequency band that the listener does not perceive well, it is known in the art to replace the original signal with the transposed signal S₁. The frequency spectrum of the signal S₁ is then selected in a frequency range which is perceived more easily. Although this may well improve the intelligibility of the speech, it is a drawback that the natural sound of the speaker is also changed, so that it will be difficult to recognize the voice of a specific person. It might, so to say, sound like a different person.

As will be seen in figure 1, the solution according to the invention is different. Here, the new signal S₁ is combined with the original signal S₀ at the summation point 2, such that a combined signal S₊ is generated. Thus, the user listening to the loudspeaker 7 will hear the two voices simultaneously. The original signal component ensures that the speaker can be recognized, while the transposed signal component improves the intelligibility. In order to further improve the recognizability, an amplifier 5 can be inserted between the pitch transposing circuit 4 and the summation point 2. The gain g₁ of the amplifier 5 can be adjusted by means of a control signal a₁, and it will normally be set to a value less than one. In this way, the original voice will have the highest level, while the transposed voice will be heard in the background.
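Written sample by sample, the output of summation point 2 in figure 1 could be expressed as below. The symbols are introduced here for illustration only: n is the sample index, g₁ stands for the gain of amplifier 5, and c₁ is 1 when switch 3 is closed and 0 when it is open.

```latex
S_{+}[n] = S_{0}[n] + c_{1}\, g_{1}\, S_{1}[n],
\qquad c_{1} \in \{0, 1\},
\qquad 0 < g_{1} < 1 \ \text{(normal setting)}
```

With c₁ = 0 the expression reduces to S₊[n] = S₀[n], i.e. the original signal is reproduced unchanged, as in the open-switch case.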

Figure 2 shows a sketch of how the frequency spectrum of the combined signal S₊ could look. The part 14 of the spectrum corresponds to the spectrum of the original signal S₀, while the part 15 corresponds to the transposed signal S₁. In this case the part 15 corresponding to the transposed signal S₁ is shown at frequencies lower than those of the original signal. This situation is helpful to e.g. elderly people whose sensitivity at higher frequencies is decreased. In cases where the hearing problem is due to noise in a specific frequency band or to poor perception of sounds located within a relatively narrow frequency band, the part corresponding to the transposed signal S₁ could alternatively be located at frequencies higher than those of the original signal.

Preferably, the frequency transposition should be logarithmic (pitch scaling) instead of linear (true pitch shift). Small logarithmic frequency transpositions (a few percent) in speech are hardly noticed, i.e. both the intelligibility and the quality of the speech remain very high. When the transposition is increased, the natural sound of the speech changes, as described above, while the intelligibility remains high. Greater variations (about 50%) cause deterioration of speech quality as well as intelligibility and should thus be avoided. Human perception, on the other hand, is quite intolerant of linear frequency shifts. A linear transposition of the frequency spectrum by even a few hertz (e.g. by heterodyning) is fatal to music, and linear shifts of the order of 100 Hz will cause marked deterioration also in speech intelligibility.
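The difference between the two kinds of transposition can be illustrated with a harmonic series. The sketch below assumes, for illustration only, that a voiced sound is an ideal series of harmonics f0, 2·f0, 3·f0, ...; the function names are hypothetical. Pitch scaling multiplies every component by the same factor and keeps the integer ratios between the harmonics, whereas a linear shift adds the same number of hertz to every component and destroys those ratios, which is one way to see why linear shifts sound unnatural.

```python
"""Contrast logarithmic (multiplicative) transposition with a linear shift.

Assumption for this sketch: a voiced sound is modelled as the ideal harmonic
series f0, 2*f0, 3*f0, ...
"""
from __future__ import annotations


def harmonics(f0: float, count: int) -> list[float]:
    """Return the first `count` harmonic frequencies (Hz) of fundamental f0."""
    return [f0 * k for k in range(1, count + 1)]


def pitch_scale(freqs: list[float], factor: float) -> list[float]:
    """Logarithmic transposition: every frequency is multiplied by `factor`."""
    return [f * factor for f in freqs]


def linear_shift(freqs: list[float], offset_hz: float) -> list[float]:
    """Linear transposition: every frequency is moved by `offset_hz`."""
    return [f + offset_hz for f in freqs]


def ratios_to_fundamental(freqs: list[float]) -> list[float]:
    """Express each component as a multiple of the lowest component."""
    return [f / freqs[0] for f in freqs]


if __name__ == "__main__":
    voice = harmonics(200.0, 4)            # 200, 400, 600, 800 Hz
    scaled = pitch_scale(voice, 0.9)       # pitch lowered by 10 %
    shifted = linear_shift(voice, -100.0)  # every component 100 Hz lower
    print(ratios_to_fundamental(scaled))   # ratios 1, 2, 3, 4: still harmonic
    print(ratios_to_fundamental(shifted))  # ratios 1, 3, 5, 7: no longer harmonic
```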

Several methods of implementing the pitch transposing circuit 4 are known in the art. In pitch transposition or pitch scaling, the pitch of a signal is changed without changing its length. In practical applications, however, this is often achieved by changing the length of a sound and then performing a sample rate conversion to change the pitch. The different methods do not all perform equally well on all types of signals. The good methods provide up to a one-octave pitch scaling, corresponding to a 200% time scaling, with no audible loss in quality. One applicable method is the technique of Time Domain Harmonic Scaling (TDHS). Since this method is well documented in the literature, it will only be described briefly here. In one implementation of this technique, the short-time autocorrelation of the signal is computed and the fundamental period is found by picking its maximum. The timebase is changed by copying the input to the output in an overlap-and-add manner, while simultaneously incrementing the input pointer by the overlap size minus a multiple of the fundamental period. As a result, the input is traversed at a different speed from the one at which the original data was recorded, while staying aligned to the estimated fundamental period.
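The steps described above can be sketched as follows. This is a simplified period-aligned overlap-add in the spirit of TDHS, not a reference TDHS implementation: the frame length, the Hann window, the 60 to 400 Hz pitch search range and the use of linear interpolation as the sample-rate converter are all assumptions, and the function names are hypothetical. The input pointer advances by one hop plus a whole number of estimated periods (the hop minus a multiple of the period when the signal is lengthened), so consecutive segments overlap in phase; the pitch change is then obtained by a sample-rate conversion of the time-scaled signal.

```python
"""Simplified period-aligned overlap-add pitch shifter (TDHS-like sketch).

Assumptions: mono float signal, 512-sample Hann window with 50 % overlap,
pitch search range 60-400 Hz, and np.interp as a crude sample-rate converter
(no anti-aliasing filter).
"""
import numpy as np


def estimate_period(frame: np.ndarray, fs: int,
                    fmin: float = 60.0, fmax: float = 400.0) -> int:
    """Return the lag (in samples) of the short-time autocorrelation maximum."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]  # lags 0, 1, ...
    lo = int(fs / fmax)
    hi = min(int(fs / fmin), len(frame) - 1)
    return lo + int(np.argmax(ac[lo:hi + 1]))


def time_stretch(x: np.ndarray, fs: int, alpha: float,
                 frame_len: int = 512) -> np.ndarray:
    """Change the duration by `alpha` while keeping the pitch."""
    if len(x) < 2 * frame_len:
        raise ValueError("signal too short for the chosen frame length")
    hop = frame_len // 2
    window = np.hanning(frame_len)
    n_out = int(len(x) * alpha)
    out = np.zeros(n_out + frame_len)
    weight = np.zeros(n_out + frame_len)
    last_start = len(x) - frame_len
    a = 0  # input pointer
    for b in range(0, n_out, hop):  # b: output position of this frame
        if b > 0:
            period = estimate_period(x[a:a + frame_len], fs)
            target = b / alpha  # where the input would be read at the new speed
            # Advance by one hop, corrected by a whole number of periods so the
            # new segment continues the previous one in phase.
            a += hop + round((target - (a + hop)) / period) * period
            while a > last_start:  # stay inside the input, keeping alignment
                a -= period
            while a < 0:
                a += period
        out[b:b + frame_len] += window * x[a:a + frame_len]
        weight[b:b + frame_len] += window
    nonzero = weight > 1e-8
    out[nonzero] /= weight[nonzero]  # undo the window overlap gain
    return out[:n_out]


def pitch_shift(x: np.ndarray, fs: int, factor: float) -> np.ndarray:
    """Scale all frequencies by `factor` and keep the original duration."""
    stretched = time_stretch(x, fs, factor)
    # Reading the stretched signal `factor` times faster multiplies every
    # frequency by `factor` and brings the length back to len(x).
    positions = np.arange(len(x)) * factor
    return np.interp(positions, np.arange(len(stretched)), stretched)
```

For example, pitch_shift(x, 8000, 0.8) applied to one second of a 200 Hz tone sampled at 8 kHz should give a tone near 160 Hz of the same length. For real speech, and especially for factor > 1 where reading faster can alias, a proper band-limited resampler would replace np.interp.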

Figure 3 shows an improved embodiment of the invention in which several transposed signals S₁, S₂, ..., Sₙ are generated, differing only in the amount of transposition. Each of these signals is generated in a pitch transposing circuit 4, 9, 12 of the same type as the one described in figure 1. Similarly, each signal can be amplified individually in the amplifiers 5, 10, 13 and switched individually in or out by the switches 3, 8, 11. The combined signal S₊ is now the sum of S₀ and all the transposed signals S₁, S₂, ..., Sₙ, and the sound reproduced in the loudspeaker 7 will be the original sound in combination with a synthetic chorus of a number of different voices, each corresponding to one of the transposed signals S₁, S₂, ..., Sₙ. Again, the gain of the amplifiers 5, 10, 13 will normally be adjusted to a value less than one, such that the original voice will have the highest level while the synthetic chorus will be heard in the background. In this way it is still possible to recognize the original speaker. All the amplifiers 5, 10, 13 may be adjusted to the same gain, or they may be adjusted individually according to e.g. the hearing characteristics of a specific person.
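The signal flow of figure 3 amounts to a weighted sum at the summation point. The minimal sketch below assumes the transposed signals S₁, ..., Sₙ have already been produced (by any pitch transposing method) as arrays of the same length as S₀; the `Voice` class and the `combine` function are hypothetical names. Each branch carries the gain of its amplifier (5, 10, 13, ...) and the state of its switch (3, 8, 11, ...).

```python
from __future__ import annotations

from dataclasses import dataclass

import numpy as np


@dataclass
class Voice:
    """One transposed branch of figure 3 (hypothetical structure)."""
    signal: np.ndarray    # transposed signal S_k, same length as S_0
    gain: float           # gain of the branch amplifier, normally below 1
    enabled: bool = True  # state of the branch switch


def combine(original: np.ndarray, voices: list[Voice]) -> np.ndarray:
    """Summation point: S_+ = S_0 + sum of gain_k * S_k over enabled branches."""
    combined = np.asarray(original, dtype=float).copy()
    for voice in voices:
        if not voice.enabled:
            continue  # switch open: this branch does not contribute
        if voice.signal.shape != combined.shape:
            raise ValueError("each transposed signal must match the original length")
        combined += voice.gain * voice.signal
    return combined


if __name__ == "__main__":
    s0 = np.array([1.0, 0.0, -1.0])
    chorus = [
        Voice(np.array([1.0, 1.0, 1.0]), gain=0.5),
        Voice(np.array([2.0, 2.0, 2.0]), gain=0.25, enabled=False),
    ]
    print(combine(s0, chorus))  # element-wise: 1.5, 0.5, -0.5
```

With every switch open, `combine` returns S₀ unchanged, which corresponds to disabling the means for generating the combined signal. Keeping every gain below one (with transposed signals of comparable level) leaves the original voice at the highest level, as the text recommends.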

Figure 4 shows a sketch of how the frequency spectrum of the combined signal S₊ could look in this situation. Again, the part 14 of the spectrum corresponds to the spectrum of the original signal S₀, while the parts 16 and 17 correspond to the transposed signals S₁, S₂, ..., Sₙ. Some of the transposed signals 16 are transposed to frequencies lower than the frequency of the original signal, while other signals 17 are transposed to higher frequencies. This arrangement will be advantageous in cases where the hearing problem is due to noise in a specific frequency band or to poor perception of sounds located within a relatively narrow frequency band.

Figure 5 shows how the spectrum could look when all the transposed signals 18 are located at frequencies lower than the frequency of the original signal. This situation is helpful to e.g. elderly people whose sensitivity at higher frequencies is decreased.
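The two arrangements of figures 4 and 5 can be expressed as a choice of one pitch-scaling factor per transposed voice. In the sketch below, the function name, the equal-tempered semitone spacing and the 0.5 deviation limit (read loosely from the remark that variations of about 50% degrade speech) are assumptions for illustration, not values given by the text.

```python
"""Hypothetical helper: pitch-scaling factors for n transposed voices."""
from __future__ import annotations


def transposition_factors(n: int, step_semitones: float, layout: str,
                          max_deviation: float = 0.5) -> list[float]:
    """Return one pitch-scaling factor per transposed voice.

    layout="both":  voices alternate below/above the original (figure 4)
    layout="below": every voice lies below the original (figure 5)
    """
    if layout == "both":
        # -1, +1, -2, +2, ... steps away from the original pitch
        steps = [(i // 2 + 1) * (-1 if i % 2 == 0 else 1) for i in range(n)]
    elif layout == "below":
        steps = [-(i + 1) for i in range(n)]  # -1, -2, -3, ...
    else:
        raise ValueError(f"unknown layout: {layout!r}")
    factors = [2.0 ** (s * step_semitones / 12.0) for s in steps]
    # Reject transpositions large enough to degrade intelligibility (assumed limit).
    if any(abs(f - 1.0) >= max_deviation for f in factors):
        raise ValueError("transposition too large for intelligible speech")
    return factors


if __name__ == "__main__":
    print(transposition_factors(4, 2.0, "both"))   # two factors below 1, two above
    print(transposition_factors(4, 2.0, "below"))  # all factors below 1
```

Each factor would then drive one pitch transposing circuit, so the "both" layout suits narrow-band noise or narrow-band hearing loss, and the "below" layout suits reduced high-frequency sensitivity.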

Although a preferred embodiment of the present invention has been described and shown, the invention is not restricted to it, but may also be embodied in other ways within the scope of the subject-matter defined in the following claims.

Claims

Patent Claims

1. A method of improving the intelligibility of a first sound signal (S₀) having a certain frequency spectrum, and wherein at least one additional sound signal (S₁; S₁, S₂, ..., Sₙ) is generated comprising a transposition of the spectrum of the first sound signal in the frequency range, characterized in that a combined signal (S₊) is generated comprising the first sound signal (S₀) as well as the at least one additional sound signal (S₁; S₁, S₂, ..., Sₙ), said combined signal being reproduced to a user.

2. A method according to claim 1, characterized in that several additional sound signals (S₁, S₂, ..., Sₙ) are generated and comprised in the combined signal (S₊).

3. A method according to claim 2, characterized in that at least one of the additional signals (15; 16; 18) is generated at frequencies below those of the first signal, and at least one other of the additional signals (17) at frequencies above those of the first signal.

4. A method according to claims 1-3, characterized in that the combined signal (S₊) is generated such that the amplitude of the first sound signal (S₀; 14) is higher than the amplitude of the at least one additional sound signal (S₁, S₂, ..., Sₙ; 15; 16, 17; 18).

5. A device comprising means for reproducing a sound signal to a user, said means comprising

• means (1) for receiving a first sound signal (S₀) having a certain frequency spectrum, and

• means (4; 4, 9, 12) for generating at least one additional sound signal (S₁; S₁, S₂, ..., Sₙ) comprising a transposition of the spectrum of the first sound signal in the frequency range, characterized in that it further comprises means (2) for generating a combined signal (S₊) comprising the first sound signal (S₀) as well as the at least one additional sound signal (S₁; S₁, S₂, ..., Sₙ), and means (6, 7) for reproducing the combined signal to the user.

6. A device according to claim 5, characterized in that it is adapted to generate several additional sound signals (S₁, S₂, ..., Sₙ) and to generate the combined signal (S₊) such that it comprises each of these additional signals.

7. A device according to claim 6, characterized in that it is adapted to generate at least one of the additional signals (15; 16; 18) at frequencies below those of the first signal, and at least one other of the additional signals (17) at frequencies above those of the first signal.

8. A device according to claim 6 or 7, characterized in that it is adapted to generate the combined signal (S₊) such that the amplitude of the first sound signal (S₀; 14) is higher than the amplitudes of the additional sound signals (S₁, S₂, ..., Sₙ; 15; 16, 17; 18).

9. A device according to claim 8, characterized in that it comprises means (5; 5, 10, 13) for regulating the amplitude of each of the additional sound signals individually.

10. A device according to claims 5-9, characterized in that it comprises means (3; 3, 8, 11) for disabling the means for generating the combined signal.

11. A device according to claims 5-10, characterized in that the device is a cellular telephone.

12. A device according to claims 5-10, characterized in that the device is a hearing aid.