US9812144B2 - Speech transcoding in packet networks - Google Patents


Info

Publication number
US9812144B2
Authority
US
United States
Prior art keywords
packet
received
encoder
decoder
packets
Legal status
Expired - Fee Related, expires
Application number
US14/786,779
Other versions
US20160078876A1 (en)
Inventor
Olli Sakari Kirla
Antti Pekka Einari KURITTU
Current Assignee
Nokia Solutions and Networks Oy
Original Assignee
Nokia Solutions and Networks Oy
Application filed by Nokia Solutions and Networks Oy
Assigned to NOKIA SOLUTIONS AND NETWORKS OY: ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: KIRLA, OLLI SAKARI; KURITTU, Antti Pekka Einari
Publication of US20160078876A1
Application granted
Publication of US9812144B2
Expired - Fee Related (current legal status)
Adjusted expiration

Classifications

    • G PHYSICS
        • G10 MUSICAL INSTRUMENTS; ACOUSTICS
            • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
                • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
                    • G10L19/005 Correction of errors induced by the transmission channel, if related to the coding algorithm
                    • G10L19/012 Comfort noise or silence coding
                    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
                        • G10L19/16 Vocoder architecture
                            • G10L19/167 Audio streaming, i.e. formatting and decoding of an encoded audio signal representation into a data stream for transmission or storage purposes
                            • G10L19/173 Transcoding, i.e. converting between two coded representations avoiding cascaded coding-decoding
    • H ELECTRICITY
        • H04 ELECTRIC COMMUNICATION TECHNIQUE
            • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
                • H04L47/00 Traffic control in data switching networks
                    • H04L47/10 Flow control; Congestion control
                        • H04L47/26 Flow control; Congestion control using explicit feedback to the source, e.g. choke packets
                        • H04L47/28 Flow control; Congestion control in relation to timing considerations
                            • H04L47/283 Flow control; Congestion control in relation to timing considerations in response to processing delays, e.g. caused by jitter or round trip time [RTT]
            • H04W WIRELESS COMMUNICATION NETWORKS
                • H04W88/00 Devices specially adapted for wireless communication networks, e.g. terminals, base stations or access point devices
                    • H04W88/18 Service support devices; Network management devices
                        • H04W88/181 Transcoding devices; Rate adaptation devices

Definitions

  • Speech transcoding in packet networks may be useful when both the incoming and the outgoing speech streams of the transcoding entity are packet-based.
  • This can be any transcoding entity having packet interfaces, such as, but not limited to, the A-interface over IP (AoIP) in the global system for mobile communication (GSM), Iu in third generation (3G) networks, Mb in voice over internet protocol (VoIP) or long term evolution (LTE), a multimedia resource function (MRF) entity in the internet protocol (IP) cloud, or the like.
  • AoIP A-interface over IP
  • GSM global system for mobile communication
  • 3G third generation
  • VoIP voice over internet protocol
  • LTE long term evolution
  • MRF multimedia resource function
  • FIG. 1 illustrates transcoding with jitter buffering. This approach may be used, for example, in a Media Gateway (MGW). However, this approach may result in increased latency of the connection.
  • MGW Media Gateway
  • First encoder 105 may send packets 1, 2, and 3, but packet 2 may be lost.
  • A transcoder 110 may use a jitter buffer 120 and a first decoder/bad frame handler (BFH) 130.
  • The jitter buffer 120 may forward the packets and a packet loss indication to the first decoder/BFH 130.
  • The first decoder 130 can use bad frame handling to conceal the lost packet. Packet loss concealment is a synonym for bad frame handling.
  • The transcoder 110 may thus interpolate packet 2 via bad frame handling in the first decoder 130 and encode the packets using a second encoder 140. The packets can then be sent to, and received by, a jitter buffer/second decoder 150.
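The following Python sketch roughly illustrates the FIG. 1 arrangement with a timer-driven jitter buffer. The class `TimerJitterBuffer` and its methods are hypothetical and do not come from the patent. The buffer reorders packets by sequence number. Once per packet interval, it hands the decoder either the expected packet or `None`, which acts as the packet loss indication that triggers bad frame handling. The sketch assumes that whoever starts calling `pop()` applies the initial buffering delay, which is the source of the added latency.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Packet:
    seq: int             # RTP sequence number (16-bit wraparound ignored for brevity)
    payload: bytes = b""


class TimerJitterBuffer:
    """Hypothetical timer-driven play-out buffer in the spirit of FIG. 1."""

    def __init__(self, first_seq: int) -> None:
        self.next_seq = first_seq            # next slot to be played out
        self.store: Dict[int, Packet] = {}   # packets waiting for their slot

    def push(self, pkt: Packet) -> None:
        # Packets may arrive in any order; each is stored until its slot comes up.
        # A packet whose slot has already been played out is too late and is dropped.
        if pkt.seq >= self.next_seq:
            self.store[pkt.seq] = pkt

    def pop(self) -> Optional[Packet]:
        # Called by a timer once per packet interval. Returning None is the packet
        # loss indication that makes the first decoder run its bad frame handler.
        pkt = self.store.pop(self.next_seq, None)
        self.next_seq += 1
        return pkt
```

Suppose packet 2 is lost and packet 3 overtakes packet 1. Pushing packets 3 and 1 and then calling `pop()` three times yields packet 1, `None`, and packet 3, in that order.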
  • Another possible solution is to schedule transcoding immediately when a packet has been received. This is shown in FIG. 2.
  • In this approach, the transcoding stage is run at the instant a packet is received, rather than at timer-based instants. Handling lost and out-of-order packets may not be easy in this case. If one or more packets are lost, the transcoding stage must be run multiple times at once when the next valid speech packet is received.
  • The bad frame handler, or packet loss concealment, of the decoder 130 interpolates the missing packets before the second encoder 140, which is the encoder of the transcoding stage. This generates a large peak in the processing load of the transcoding stage. Increased jitter may also appear in the outgoing packet stream, because the encoded interpolated packets and the next valid received packet are clustered together.
  • An out-of-order packet will be discarded, because the later packet has already been processed by the transcoding stage.
  • With a jitter buffer, by contrast, the order of packets can be rearranged before the transcoding, and no out-of-order packets will be lost.
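The hypothetical Python helper below makes the processing peak of FIG. 2 concrete. It counts how many decode-and-encode runs an arrival-driven transcoder performs at each packet arrival. Each packet costs one run for itself plus one concealment run per missing predecessor. A packet that arrives after a later one has already been transcoded costs zero runs. To keep the example short, the helper ignores timestamps and 16-bit sequence wraparound.

```python
from typing import List, Optional


def runs_per_arrival(arrivals: List[int]) -> List[int]:
    """Transcoding runs triggered by each arrival in an arrival-driven (FIG. 2 style) scheme."""
    runs: List[int] = []
    last: Optional[int] = None  # highest sequence number already transcoded
    for seq in arrivals:
        if last is None:
            runs.append(1)              # first packet: one run
            last = seq
        elif seq <= last:
            runs.append(0)              # older than what was processed: discarded
        else:
            runs.append(seq - last)     # (seq - last - 1) concealed frames + 1 valid frame
            last = seq
    return runs
```

With arrivals `[1, 2, 5, 4, 6]`, the helper returns `[1, 1, 3, 0, 1]`. Packets 3 and 4 are missing when packet 5 arrives, so three runs are bunched into one instant. The late packet 4 is then discarded.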
  • According to certain embodiments, a method includes receiving a packet at a packet loss detector of a transcoder, omitting jitter buffering before decoding in the transcoder, and omitting bad frame handling in a decoding stage of the transcoder.
  • A decoder of the transcoder decodes the packet into a decoded packet, and an encoder of the transcoder encodes the decoded packet into a re-encoded packet.
  • The method also includes transmitting the re-encoded packet from the transcoder. Further, the method includes monitoring for a received packet. When a packet is not received, the method additionally includes freezing the decoder and the encoder.
  • The method also includes sending packet loss information from the decoder to the encoder as side information when the packet is not received.
  • The method includes setting an outgoing packet stream to permit detection of missing packets by a downstream decoder upon receiving a valid packet after the packet is not received.
  • In certain embodiments, an apparatus includes a packet loss detector configured to receive a packet.
  • The apparatus is configured to omit jitter buffering before decoding and to omit bad frame handling in a decoding stage.
  • The apparatus also includes a decoder configured to decode the packet into a decoded packet, and an encoder configured to encode the decoded packet into a re-encoded packet.
  • The apparatus additionally includes a transmitter configured to transmit the re-encoded packet from the transcoder.
  • The packet loss detector is further configured to monitor for a received packet and to freeze the decoder and the encoder when a packet is not received.
  • The decoder is configured to send packet loss information to the encoder as side information when the packet is not received.
  • The encoder is configured to set an outgoing packet stream to permit detection of missing packets by a downstream decoder upon receiving a valid packet after the packet is not received.
  • In certain embodiments, an apparatus includes receiving means for receiving a packet.
  • The apparatus is configured to omit jitter buffering before decoding and to omit bad frame handling in a decoding stage.
  • The apparatus further includes decoding means for decoding the packet into a decoded packet and encoding means for encoding the decoded packet into a re-encoded packet.
  • The apparatus also includes transmitting means for transmitting the re-encoded packet.
  • The apparatus includes monitoring means for monitoring for a received packet.
  • The apparatus additionally includes freezing means for freezing the decoder and the encoder when a packet is not received, and sending means for sending packet loss information from the decoder to the encoder as side information when the packet is not received.
  • The apparatus further includes setting means for setting an outgoing packet stream to permit detection of missing packets by a downstream decoder upon receiving a valid packet after the packet is not received.
  • In certain embodiments, a non-transitory computer readable medium is encoded with instructions that, when executed in hardware, perform a process.
  • The process includes receiving a packet at a packet loss detector of a transcoder.
  • The process includes omitting jitter buffering before decoding in the transcoder and omitting bad frame handling in a decoding stage of the transcoder.
  • The process includes decoding the packet into a decoded packet by a decoder of the transcoder and encoding the decoded packet into a re-encoded packet by an encoder of the transcoder.
  • The process also includes transmitting the re-encoded packet from the transcoder.
  • The process includes monitoring for a received packet and freezing the decoder and the encoder when a packet is not received.
  • The process also includes sending packet loss information from the decoder to the encoder as side information when the packet is not received.
  • The process includes setting an outgoing packet stream to permit detection of missing packets by a downstream decoder upon receiving a valid packet after the packet is not received.
  • FIG. 1 illustrates conventional transcoding with jitter buffering.
  • FIG. 2 illustrates conventional transcoding without jitter buffering.
  • FIG. 3 illustrates a first embodiment providing transcoding without jitter buffering and bad frame handling.
  • FIG. 4 illustrates a second embodiment providing an enhancement for lookahead in a second encoder.
  • FIG. 5a illustrates a signal waveform without lookahead alignment.
  • FIG. 5b illustrates a signal waveform for the second embodiment.
  • FIG. 6 illustrates MOS (P.862.1) for jitter buffer/bad frame handler, with and without lookahead alignment, according to certain embodiments.
  • FIG. 7 illustrates MOS (P.862.1) for jitter buffer/bad frame handler, the second embodiment, and a third embodiment.
  • FIG. 8 illustrates a flow chart for the third embodiment.
  • FIG. 9 illustrates a DTX problem of hangover period frames lost before SID frames.
  • FIG. 10 illustrates a DTX problem of comfort noise with wrong hangover period frames.
  • FIG. 11 illustrates a DTX problem of an audible click heard by the end user.
  • Certain embodiments may avoid the latency issues found in conventional approaches by, for example, omitting the jitter buffering before the transcoding stage. Furthermore, certain embodiments can avoid a peak processing load and packet clustering in the outgoing stream caused by lost packets, by freezing the decoder and encoder for the time period when packets are not received. As soon as the next valid packet is received and the receiver entity notices one or more lost packets, the number of lost packets can be indicated to the encoder entity together with the valid decoded packet. The encoder can then be run again using the valid decoded packet. For this encoded outgoing packet, the gap due to packet loss in the incoming stream can be indicated to the peer decoder by incrementing the RTP timestamp according to the gap. In this way, the bad frame handler of the peer decoder can interpolate the missing packets, and reasonable voice quality can be maintained.
  • The above-mentioned approach can be enhanced when a lookahead is used in the encoder stage.
  • The voice quality can be improved when the decoded received speech signal is aligned according to the lookahead before the encoder stage.
  • Voice quality can also be enhanced by running the decoder/bad frame handler and the encoder once after the first lost packet.
  • This enhancement may diminish quality effects due to loss of encoder-decoder synchronization in the first codec pair.
  • The result may be close to the quality level of transcoding with conventional bad frame handling. This limits the peak processing load to twice the nominal load, which should be acceptable for most applications.
  • Certain embodiments also provide a way to handle possible clicks with the discontinuous transmission (DTX) function of adaptive multi-rate (AMR) and adaptive multi-rate wideband (AMR-WB) codecs, by substituting speech data that includes clicks with appropriate data.
  • DTX discontinuous transmission
  • Conventionally, a jitter buffer is applied to an incoming speech stream before the transcoding stage.
  • As a result, the jitter buffer increases the latency of the connection.
  • The jitter buffer may be mandatory for packet-to-circuit switched (CS) network interworking, as it removes jitter in the packet stream before the voice signal is sent to the CS network.
  • CS circuit switched
  • For IP-to-IP interworking, it may not be necessary to remove jitter, as the receiving CS gateway or IP terminal can equalize the packet stream before speech decoding and voice sample play-out.
  • Certain embodiments in the present disclosure avoid latency by omitting the jitter buffering before the transcoding stage. Certain embodiments also prevent peak processing load and packet clustering in the outgoing stream caused by lost packets, by freezing the decoder and encoder for the time period when packets are not received.
  • A first embodiment is shown in FIG. 3.
  • The bad frame handling stage within the first decoder 130 can be omitted (contrary to the related art shown in FIGS. 1 and 2).
  • A packet loss indication block 125 informs the second encoder 140 about missing packets.
  • While packets are missing, the first decoder 130 can be frozen such that no decoded voice packets are sent internally between the first decoder 130 and the second encoder 140.
  • The second encoder 140 can likewise be frozen during this period, such that no packets are sent towards the peer/second decoder 150.
  • The number of lost packets can be sent as side information to the second encoder 140.
  • When the next valid packet is received, the second encoder 140 can be run again. The number of missing packets can be signaled in this packet by incrementing the RTP timestamp by the number of sampling clock ticks corresponding to the duration of the lost packets plus one packet.
  • The RTP sequence number may be incremented by the number of missing packets plus one.
  • Alternatively, the RTP sequence number can be incremented by one.
  • The jump in the timestamp can indicate to the peer jitter buffer/decoder/bad frame handler 150 that a certain number of packets are missing, and the decoder can run the bad frame handler multiple times in order to interpolate the missing voice packets. This can result in a continuous decoded voice stream that can be played out to the CS network in the case of an IP-to-CS gateway, or to a D/A converter in the case of an IP terminal.
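The header arithmetic of the first embodiment can be sketched as below. This is a minimal Python illustration, not the patent's implementation. The names `RtpHeader`, `resume_after_loss`, and `frames_to_conceal` are hypothetical. The default of 160 ticks per packet assumes an 8000 Hz RTP clock with 20 ms packets.

On the sending side, the timestamp advances by (lost + 1) packet intervals. The sequence number advances by either lost + 1 or 1, the two options mentioned above. On the peer side, the number of frames to interpolate is recovered from the timestamp jump alone, so the loss information survives under either sequence-number option.

```python
from dataclasses import dataclass

TS_MOD = 2 ** 32   # RTP timestamps are 32-bit and wrap around
SEQ_MOD = 2 ** 16  # RTP sequence numbers are 16-bit and wrap around


@dataclass(frozen=True)
class RtpHeader:
    seq: int
    ts: int


def resume_after_loss(prev: RtpHeader, lost: int, ticks_per_packet: int = 160,
                      seq_counts_gap: bool = False) -> RtpHeader:
    """Header of the first outgoing packet after `lost` incoming packets were missing."""
    ts = (prev.ts + (lost + 1) * ticks_per_packet) % TS_MOD
    seq_step = lost + 1 if seq_counts_gap else 1   # the two sequence-number options
    return RtpHeader((prev.seq + seq_step) % SEQ_MOD, ts)


def frames_to_conceal(prev: RtpHeader, cur: RtpHeader, ticks_per_packet: int = 160) -> int:
    """Peer side: number of frames its bad frame handler should interpolate."""
    jump = (cur.ts - prev.ts) % TS_MOD
    return jump // ticks_per_packet - 1
```

For instance, after two lost packets, `resume_after_loss(RtpHeader(100, 16000), lost=2)` gives `RtpHeader(seq=101, ts=16480)`. Calling the peer's `frames_to_conceal` on that pair returns 2.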
  • Benefits of the above-mentioned embodiment are that the latency due to conventional jitter buffering and a peak in the processing load can be avoided. Received packets are instantly forwarded to the loss indication and decoder blocks and then to the encoder block. Missing packets do not cause the generation of any packets between the first decoder 130 and the second encoder 140 within the transcoder. This prevents a peak in the processing load when the next valid packet is received after the lost packets, because the first decoder 130 and the second encoder 140 are not run for the missing packets. Furthermore, this prevents the packet clustering of the conventional approach, in which missing packets are interpolated within the transcoder before encoding, as illustrated in FIG. 2.
  • In a second embodiment, a loss of quality due to misalignment in signal phase between the received and sent packet streams can be compensated when the second encoder 140 uses a lookahead functionality.
  • Lookahead can be used in the windowing of the linear prediction block of an encoder, and is used in many low-bitrate codecs, such as AMR, AMR-WB, and G.729.
  • A quality loss can arise when one or more successive packets are lost in the received packet stream. These lost packets are reflected as missing packets in the outgoing packet stream, with a slightly different signal phase compared to the incoming packets. Specifically, the signal phase can be delayed by the amount of lookahead, which is typically 5 ms. This phase difference can cause additional disturbances at the second decoder 150 output when a packet loss occurs.
  • FIG. 5a illustrates this phenomenon for a 50 Hz triangle waveform. In this example, the most common packet size of 20 ms has been used, which is also the typical codec frame size for low-bitrate speech codecs.
  • The encoder adds 5 ms of zero signal before encoding the actual signal. This effectively increases the delay by 5 ms at the second encoder 140 output.
  • A typical sub-frame size of 5 ms is used here for the internal decoded packet size/interval.
  • This can be used by an MGW.
  • G.711-encoded 5 ms packets can be used by the MGW, but these could equally be 5 ms packets of linear pulse code modulation (PCM) samples.
  • PCM linear pulse code modulation
  • An additional delay of 15 ms can arise because the second encoder 140 cannot be run until the fourth sub-frame has been received. This can be due to the timer-based scheduling currently used by the MGW. This drawback could be mitigated, however, by sending internal packets as a cluster of four sub-frames.
  • A solution for the lookahead alignment problem is shown in FIGS. 4 and 5b.
  • When the second encoder 140 uses a lookahead, the first sub-frame from the first decoder 130 can be dropped at the initialization phase of the transcoder.
  • The first outgoing encoded packet is then generated from sub-frames 2 to 5. This delays the sending of the first and following encoded packets by 5 ms compared to the non-alignment case.
  • However, the actual signal may not be delayed compared to the non-alignment case, and the delay may not increase from the end user's point of view.
  • If the second encoder 140 is not using the lookahead functionality, the first sub-frame need not be dropped.
  • In the case of 20 ms packets used internally by the transcoder, the second packet must be awaited from the first decoder 130 before sub-frames 2 to 5 can be given to the second encoder 140. This can generate a delay of 20 ms, which equals the 15 ms plus 5 ms of the previous case, in which 5 ms internal packets are used. Thus, certain embodiments can be used for both 5 ms and 20 ms internal packet sizes with the same delay. When the second encoder 140 is not using the lookahead, the 20 ms additional delay can be avoided.
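The lookahead alignment of FIGS. 4 and 5b amounts to discarding one 5 ms sub-frame once, at initialization, before grouping the decoded sub-frames into 20 ms encoder frames. The Python sketch below shows only that grouping step. The function name `encoder_frames` and the list-of-sub-frames representation are assumptions made for illustration.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def encoder_frames(subframes: Sequence[T], lookahead_aligned: bool,
                   per_frame: int = 4) -> List[List[T]]:
    """Group 5 ms decoded sub-frames into 20 ms encoder input frames.

    With alignment, the first sub-frame is dropped once at initialization, so the
    5 ms of signal removed offsets the 5 ms lookahead delay added by the encoder.
    An incomplete trailing group is left out (it would wait for more sub-frames).
    """
    usable = list(subframes[1:] if lookahead_aligned else subframes)
    return [usable[i:i + per_frame]
            for i in range(0, len(usable) - per_frame + 1, per_frame)]
```

Label the sub-frames 1 to 9. The aligned call yields `[[2, 3, 4, 5], [6, 7, 8, 9]]`, matching the statement that the first outgoing packet is built from sub-frames 2 to 5. The unaligned call yields `[[1, 2, 3, 4], [5, 6, 7, 8]]`.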
  • A benefit of the lookahead alignment can be seen in FIG. 6.
  • Simulations with the ITU-T P.862.1 objective MOS tool, in an AMR 12.2 kbps to AMR 12.2 kbps transcoding scenario, show an improvement of about 0.1 mean opinion score (MOS) compared to the scenario without lookahead alignment.
  • MOS mean opinion score
  • However, this does not reach the quality level of the reference scenario, which is conventional jitter buffering with bad frame handling.
  • To close this gap, an enhancement to the first and second embodiments can be used, as described in a third embodiment.
  • In the third embodiment, the aim is to achieve the voice quality of the reference scenario (jitter buffering and bad frame handling).
  • Disturbances appear after the first decoder 130 if speech frames are dropped without any bad frame handling. This may be because synchronization is lost between the first encoder 105 and the first decoder 130.
  • The bad frame handler can therefore be run once a certain time has passed since the last received packet. Thus, for longer packet losses, the bad frame handler can also be run once after the last valid packet. For losses of more than one packet, the bad frame handler may be run two or more times. This may further enhance voice quality.
  • Some limitation of the processing load may also be necessary when, due to excessive jitter, a cluster of packets is received from the first encoder 105.
  • A jitter buffer can handle such occasions by equalizing reasonably clustered packets while discarding excessively delayed packets.
  • Without a jitter buffer, excessively clustered packets may induce a high peak load in the system.
  • A peak load limitation can solve this problem.
  • Processing of clustered packets can be delayed, or some packets can be dropped, so that the desired load level is achieved.
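One simple way to realize such a peak load limitation is a per-poll budget combined with a bounded waiting queue, sketched below in Python. The class `PeakLoadLimiter` and both of its limits are hypothetical parameters, not values from the patent. At most `max_runs_per_poll` packets are transcoded per polling step. Up to `max_waiting` further packets are delayed, and any packet beyond that is dropped.

```python
from collections import deque
from typing import Deque, List


class PeakLoadLimiter:
    """Hypothetical limiter: delay clustered packets, drop them once the queue is full."""

    def __init__(self, max_runs_per_poll: int, max_waiting: int) -> None:
        self.max_runs = max_runs_per_poll
        self.max_waiting = max_waiting
        self.waiting: Deque[int] = deque()   # sequence numbers awaiting transcoding
        self.dropped: List[int] = []

    def offer(self, seq: int) -> None:
        if len(self.waiting) >= self.max_waiting:
            self.dropped.append(seq)   # limit exceeded: drop the current packet
        else:
            self.waiting.append(seq)   # transcoded now or on a later poll

    def poll(self) -> List[int]:
        """Packets to decode and re-encode in this polling step, oldest first."""
        n = min(self.max_runs, len(self.waiting))
        return [self.waiting.popleft() for _ in range(n)]
```

Suppose a cluster of five packets arrives at once, with a budget of two runs per poll and room for three waiting packets. Packets 4 and 5 are dropped, and packets 1 to 3 are transcoded over two polls (`[1, 2]`, then `[3]`).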
  • Out-of-order packets can also be handled sensibly by the present system.
  • A later packet, once received, can be decoded and sent towards the second encoder 140.
  • This can lead to a situation in which an earlier packet must be discarded if it is received after a later packet.
  • This can effectively increase the frame erasure ratio.
  • On the other hand, the delay is minimized, because the packet with the earlier sequence number is not awaited.
  • The earlier sequence number packet is treated as an excessively delayed packet and is discarded.
  • Out-of-order packets are very rare in real networks, so this kind of handling may have little real impact.
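RTP sequence numbers are 16 bits wide and wrap around. The comparison that decides whether a packet is in order, follows a gap, or arrived too late is therefore best done modulo 2^16. The Python sketch below uses a common serial-number-style comparison, in which a difference of half the range or more counts as negative. The function names and this half-range rule are assumptions of this illustration, not taken from the patent.

```python
from typing import Tuple

SEQ_MOD = 1 << 16  # RTP sequence numbers are 16-bit


def seq_delta(new: int, ref: int) -> int:
    """Signed distance from ref to new; differences of half the range or more count as negative."""
    d = (new - ref) % SEQ_MOD
    return d - SEQ_MOD if d >= SEQ_MOD // 2 else d


def classify(new_seq: int, last_processed: int) -> Tuple[str, int]:
    """Return (decision, number_of_missing_packets) for a newly received packet."""
    d = seq_delta(new_seq, last_processed)
    if d <= 0:
        return ("discard", 0)      # earlier than (or equal to) a packet already transcoded
    if d == 1:
        return ("in_order", 0)
    return ("after_gap", d - 1)    # d - 1 packets are missing before this one
```

For example, `classify(13, 10)` gives `("after_gap", 2)` and `classify(9, 10)` gives `("discard", 0)`. Across the wrap, `classify(0, 65535)` gives `("in_order", 0)`.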
  • A flow chart for the third embodiment is shown in FIG. 8.
  • The sequence shown in the flow chart can be run at an appropriate polling rate, whose period can be shorter than a packet interval.
  • The first decision block 810 detects whether a packet has been received. If not, the first decoder 130 and the second encoder 140 are kept in a frozen state at 820.
  • If a packet has been received, it is analyzed at 830 whether a threshold of excessive delay has been exceeded. If so, the packet can be discarded at 840.
  • This threshold may be useful to avoid generating excessive jitter towards the peer decoder. It may be analogous to the buffering level of a conventional jitter buffer, which discards late packets that exceed the buffering level.
  • A possible loss of one or more packets can be detected at 850 by, for example, analyzing the RTP sequence number and timestamp of the received packet. If there is a gap in both the sequence number and the timestamp compared to the previously received packet, a packet loss can be determined to be present. The number of missing packets can also be determined, at 855.
  • If only one packet is missing, the previously received valid packet can be decoded at 862 with the bad frame handler, and the decoded frame can be sent to the second encoder 140 over the internal interface. The interpolated packet can then be encoded, at 864, by the second encoder 140, and the timestamp (and optionally the sequence number) of the outgoing encoded packet can be incremented by one at 866.
  • One timestamp unit here refers to an increment of RTP sampling clock ticks corresponding to one packet interval, for example, 160 ticks for an 8000 Hz sample rate and a 20 ms packet interval.
  • Next, the processing load level of the transcoder can be analyzed with respect to packet clustering at 870 and buffering at 875. If neither the packet clustering limit nor the buffering limit is exceeded, the current packet can be decoded at 880 and forwarded to the second encoder 140.
  • The buffering limit can be a maximum number of buffered packets. The decoded packet is then encoded at 882, and the timestamp and sequence number of the outgoing packet are incremented by one at 884.
  • If the clustering limit is exceeded but the buffering limit is not, the running of the first decoder 130 and the second encoder 140 can be delayed at 877, such that the processing peak load is kept within an allowed limit. If the buffering limit is exceeded, the current packet is dropped. Alternatively, one of the previously received packets could be dropped from the buffer.
  • If more than one packet is missing, the current valid packet can be decoded at 861 and sent to the second encoder 140 together with an indication of the number of lost packets.
  • The outgoing packet can be encoded at 863, and the timestamp can be incremented by the number of lost packets plus one at 865. This case may be equivalent to the first embodiment, discussed above.
  • If no packet loss is detected, the correctness of the sequence number can be checked at 890 by verifying that the current sequence number is higher by one than the previously received one. If so, the clustering and buffering limits can be analyzed as discussed above, via 870 and 875, as appropriate. If the packet has been received out of order, it can be dropped at 840.
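The FIG. 8 decisions can be collected into one polling routine. The Python sketch below is an illustration under stated assumptions, not the patent's implementation:

- `InPacket`, `Fig8Transcoder`, the `delay_ms` lateness field, and all three limits are hypothetical.
- Decoding and encoding are represented only by log entries.
- Sequence numbers are plain integers without 16-bit wraparound.
- One timestamp unit is taken as 160 ticks (8000 Hz, 20 ms).

Clustering and buffering are simplified into a single bounded queue. At most `cluster_limit` runs happen per poll, further work waits (877), and work that does not fit within `buffer_limit` is dropped.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Optional, Tuple

TICKS_PER_PACKET = 8000 * 20 // 1000  # one timestamp unit = 160 ticks (assumed 8000 Hz, 20 ms)


@dataclass
class InPacket:
    seq: int
    ts: int
    delay_ms: float = 0.0  # hypothetical lateness measured by the receiver


@dataclass
class Fig8Transcoder:
    max_delay_ms: float = 60.0  # hypothetical excessive-delay threshold (830)
    cluster_limit: int = 2      # hypothetical decode+encode runs allowed per poll (870)
    buffer_limit: int = 4       # hypothetical maximum number of waiting work items (875)
    prev: Optional[InPacket] = None
    out_seq: int = 0
    out_ts: int = 0
    pending: Deque[Tuple[str, int, int]] = field(default_factory=deque)
    log: List[tuple] = field(default_factory=list)

    def poll(self, pkt: Optional[InPacket]) -> None:
        """One polling step; polls are assumed to run more often than packets arrive."""
        if pkt is not None:   # 810: with no packet, nothing new is decoded or encoded (820)
            self._classify(pkt)
        self._run()           # only already-waiting (877) or newly queued work runs

    def _classify(self, pkt: InPacket) -> None:
        if pkt.delay_ms > self.max_delay_ms:                  # 830 -> 840
            self.log.append(("discard_late", pkt.seq))
            return
        if self.prev is None:                                 # first packet of the stream
            self.prev = pkt
            self._enqueue("decode", pkt.seq, 1)
            return
        lost = pkt.seq - self.prev.seq - 1                    # 850/855
        ts_lost = (pkt.ts - self.prev.ts) // TICKS_PER_PACKET - 1
        if lost > 0 and ts_lost > 0:                          # gap in both seq and timestamp
            if lost == 1:
                # 862-866: conceal the single lost frame from the previous packet ...
                self._enqueue("bfh_interpolated", self.prev.seq + 1, 1)
                self._enqueue("decode", pkt.seq, 1)           # ... then 870-884 for this one
            else:
                self._enqueue("decode_after_gap", pkt.seq, lost + 1)  # 861-865
            self.prev = pkt
        elif pkt.seq == self.prev.seq + 1:                    # 890: in order
            self.prev = pkt
            self._enqueue("decode", pkt.seq, 1)
        else:                                                 # 890 -> 840: out of order
            self.log.append(("discard_out_of_order", pkt.seq))

    def _enqueue(self, label: str, seq: int, ts_units: int) -> None:
        if len(self.pending) >= self.buffer_limit:            # 875: buffering limit exceeded
            self.log.append(("drop_buffer_full", seq))
        else:
            self.pending.append((label, seq, ts_units))

    def _run(self) -> None:
        for _ in range(min(self.cluster_limit, len(self.pending))):  # 870: cap per poll
            label, seq, ts_units = self.pending.popleft()
            self.out_seq += 1                                 # sequence number: +1 per packet
            self.out_ts += ts_units * TICKS_PER_PACKET        # timestamp: +1 or +(lost + 1) units
            self.log.append((label, seq, self.out_seq, self.out_ts))
```

Each log entry records the action, the incoming sequence number, and the outgoing sequence number and timestamp. This makes it easy to check that a single loss costs two runs in one poll (twice the nominal load) while a longer loss costs only one run with a larger timestamp jump.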
  • The third embodiment can also include the quality enhancement provided by the second embodiment.
  • The architecture of the first, second, and third embodiments may cause an audible click to carry over into the encoded audio after the second encoder 140, when discontinuous transmission (DTX) is used and DTX hangover period frames are lost before the first decoder 130.
  • The audible click may arise as follows.
  • The decoder generates comfort noise when silence descriptor (SID) frames are received.
  • Comfort noise can be generated based on hangover period speech frames, for example, the 7 previous speech frames. If some of the hangover period frames are lost, the comfort noise parameters can be calculated from frames that may contain high-energy speech, as illustrated in FIG. 9. The result of such a calculation can be audible clicks in the decoded signal, as illustrated in FIG. 10.
  • The audible click from the first decoder 130 can then be encoded by the second encoder 140.
  • The missing packets are indicated to the peer jitter buffer/decoder/bad frame handler 150 by the jump in the timestamp. The audible click can then be heard by the end user after the second decoder 150, as illustrated in FIG. 11.
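A toy numeric model shows why losing hangover frames can produce the click. Purely for illustration, the Python sketch below assumes that the comfort noise level is the plain average of the levels of the last seven frames the decoder actually received, in arbitrary units. Real AMR and AMR-WB decoders compute comfort noise parameters differently, so only the direction of the effect should be read from this model. Because the frozen first decoder 130 never sees the lost hangover frames, its averaging window reaches back into the loud speech burst.

```python
from statistics import mean
from typing import Sequence

HANGOVER_FRAMES = 7  # example hangover length from the text


def comfort_noise_level(received_levels: Sequence[float],
                        hangover: int = HANGOVER_FRAMES) -> float:
    """Toy model: average level of the last `hangover` frames the decoder received."""
    return mean(received_levels[-hangover:])


speech = [60.0] * 10     # loud speech frames (arbitrary level units)
hangover = [30.0] * 7    # hangover frames carrying background noise

all_received = comfort_noise_level(speech + hangover)   # window covers hangover only
four_lost = comfort_noise_level(speech + hangover[:3])  # window reaches into speech
```

Here `all_received` is 30.0, whereas `four_lost` is about 47.1. The comfort noise therefore starts markedly louder than the background noise it should imitate, and the listener hears this as a click.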
  • In a first implementation example, the system can set PCM samples to zero for the duration of the SID_FIRST + first SID_UPDATE comfort noise period, and resume normal operation after the second SID_UPDATE has been received.
  • In a second implementation example, the first SID_FIRST and SID_UPDATE frames can be replaced with homing frames. Normal operation can be resumed after the second SID_UPDATE has been received.
  • In a third implementation example, the "frames elapsed since the last SID frame" counter can be set to zero in the first decoder 130, so that the previous SID update is used for the comfort noise calculation.
  • 3GPP technical specification (TS) 26.092, which is hereby incorporated herein by reference in its entirety, explains: "The decoder counts the number of frames elapsed since the last SID frame was updated and passed to the RSS by the encoder. Based on this count, the decoder determines whether or not there is a hangover period at the end of the speech burst. The interpolation factor is also adapted to the SID update rate. As soon as a SID frame is received comfort noise is generated at the decoder end. The first SID frame parameters are not received but computed from the parameters stored during the hangover period. If no hangover period is detected, the parameters from the previous SID update are used."
  • In a fourth implementation example, frames from previous hangover periods can be buffered, and the system can use those frames to generate comfort noise, thus avoiding the use of non-hangover period speech frames.
  • In a fifth implementation example, the system can substitute no_data frames for the frames containing the audible click after the first decoder 130, on the second encoder 140 side.
  • In a sixth implementation example, the system can model the background noise level and spectrum from previous speech pauses and replace the frames containing the audible click after the first decoder 130 with synthesized comfort noise.
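The first implementation example, muting the first comfort noise period, can be sketched as a filter over decoded 20 ms frames. In the Python below, the `FrameType` enum is a simplified stand-in for the AMR frame classification, not the 3GPP definition. The caller is assumed to apply the filter only when hangover frames were lost. Samples are zeroed from SID_FIRST until the second SID_UPDATE arrives, and a return to speech also ends the muting.

```python
from enum import Enum
from typing import List, Sequence


class FrameType(Enum):  # simplified stand-in, not the 3GPP frame type table
    SPEECH = 0
    SID_FIRST = 1
    SID_UPDATE = 2
    NO_DATA = 3


def mute_first_cn_period(types: Sequence[FrameType],
                         pcm_frames: Sequence[Sequence[int]]) -> List[List[int]]:
    """Zero PCM from SID_FIRST up to (not including) the second SID_UPDATE."""
    out: List[List[int]] = []
    muting = False
    updates = 0
    for ftype, pcm in zip(types, pcm_frames):
        if ftype is FrameType.SID_FIRST:
            muting, updates = True, 0
        elif ftype is FrameType.SID_UPDATE and muting:
            updates += 1
            if updates == 2:
                muting = False           # second SID_UPDATE: resume normal operation
        elif ftype is FrameType.SPEECH:
            muting = False               # speech resumed before the period ended
        out.append([0] * len(pcm) if muting else list(pcm))
    return out
```

The other five examples would replace the muting step with homing frames, a counter reset, buffered hangover frames, no_data frames, or synthesized comfort noise. Each of them acts at the same point in the stream.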
  • Certain embodiments may have various advantages. For example, certain embodiments may avoid the latency issues found in conventional jitter buffering, as received packets may be instantly forwarded to the loss indication and decoder blocks and then to the encoder block. These embodiments may also prevent packet clustering, because the missing packets are not interpolated within the transcoder before encoding.
  • In certain embodiments, an apparatus, which may be a transcoder, can include a packet loss detector configured to receive a packet.
  • The apparatus can be configured to omit jitter buffering before decoding and to omit bad frame handling in a decoding stage.
  • The packet loss detector can be any suitable device, and may include one or more controllers, processors, memories, or a combination thereof.
  • The packet loss detector can, for example, correspond to the packet loss indication block 125.
  • The apparatus can also include a decoder configured to decode the packet into a decoded packet.
  • The decoder can correspond to the first decoder 130.
  • The apparatus can include an encoder configured to encode the decoded packet into a re-encoded packet.
  • The encoder can correspond to the second encoder 140.
  • The apparatus can additionally include a transmitter configured to transmit the re-encoded packet from the transcoder.
  • The transmitter can be variously embodied.
  • For example, the transmitter can be a network interface card, a port, a wireless modem, or any other suitable communication hardware.
  • The packet loss detector can further be configured to monitor for a received packet and to freeze the decoder and the encoder when a packet is not received, as illustrated at 810 and 820 in FIG. 8.
  • The packet loss detector can be configured to determine whether a received packet is excessively delayed and to drop the received packet when it is excessively delayed, as illustrated at 830 and 840 in FIG. 8.
  • The packet loss detector can further be configured to determine whether a received packet is out of order and to drop the received packet when it is out of order, as illustrated at 890 and 840 in FIG. 8.
  • The packet loss detector can be configured to determine whether a clustering limit is exceeded, to determine whether a buffering limit is exceeded when the clustering limit is exceeded, and to delay processing of a received packet when the buffering limit is not exceeded, as illustrated at 870, 875, and 877 in FIG. 8.
  • The packet loss detector can further be configured to determine whether a buffering limit is exceeded and to drop a received packet when the buffering limit is exceeded, as illustrated at 875 and 840 in FIG. 8.
  • The packet loss detector can be configured to determine whether only a single packet has been lost and to interpolate a packet for the single lost packet when only that packet has been lost, as illustrated at 855 and 864 in FIG. 8.
  • The packet loss detector can further be configured to signal the number of lost packets to the encoder when a valid packet is received after at least one packet is lost or dropped, as illustrated in FIGS. 3 and 4.
  • The packet loss detector can be configured to provide a lookahead at the encoder and to align a received speech signal according to the lookahead before the received speech signal is provided to the encoder.
  • The packet loss detector can further be configured to determine that comfort noise is to be applied during a period of a lost packet and to control the comfort noise to avoid an audible click. This can be accomplished using any of the six implementation examples described above, or in any other way.

Abstract

Speech transcoding in packet networks may be useful when both the incoming and outgoing speech streams of the transcoding entity are packet based. This can be any transcoding entity having packet interfaces. A method can include omitting jitter buffering before decoding in a transcoder and omitting bad frame handling in a decoding stage of the transcoder. The method can also include freezing a decoder and an encoder when a packet is not received. The method can also include sending packet loss information from the decoder to the encoder as side information when the packet is not received. The method can further include setting an outgoing packet stream to permit detection of missing packets by a downstream decoder upon receiving a valid packet after the packet is not received.

Description

BACKGROUND
Field
Speech transcoding in packet networks may be useful when both the incoming and outgoing speech streams of the transcoding entity are packet based. This can be any transcoding entity having packet interfaces, such as, but not limited to, A-interface over IP (AoIP) in the global system for mobile communications (GSM), Iu in third generation (3G) networks, Mb in voice over internet protocol (VoIP) or long term evolution (LTE), a multimedia resource function (MRF) entity in the internet protocol (IP) cloud, or the like.
Description of the Related Art
Conventionally, the jitter buffering applied before transcoding in IP-to-circuit switched (CS) connections is also used for IP-to-IP connections. The term de-jitter buffering refers to the same thing as jitter buffering. FIG. 1 illustrates transcoding with jitter buffering. This approach may be used, for example, in a Media Gateway (MGW). However, it may increase the latency of the connection.
As shown in FIG. 1, Encoder1 105 may send packets 1, 2, and 3, but packet 2 may be lost. A transcoder 110 may use a jitter buffer 120 and a first decoder/bad frame handler (BFH) 130. The jitter buffer 120 may forward the packets and a packet loss indication to the first decoder/BFH 130, which can use bad frame handling (a synonym for packet loss concealment) to conceal the lost packet. The transcoder 110 may thus interpolate packet 2 via bad frame handling in the first decoder 130 and encode the packets using the second encoder 140. The packets can then be sent to and received by the jitter buffer/second decoder 150.
Another possible solution is to schedule transcoding immediately when a packet has been received, as shown in FIG. 2. In this solution, the transcoding stage is run at the time instant a packet is received, rather than at a timer-based instant. Handling lost and out-of-order packets may not be easy in this case. If one or more packets are lost, the transcoding stage must be run multiple times in succession when the next valid speech packet is received, because the bad frame handler (packet loss concealment) of the decoder 130 interpolates the missing packets before the second encoder 140, which is the encoder of the transcoding stage. This generates a large peak in the processing load of the transcoding stage. Increased jitter may also appear in the outgoing packet stream, because the encoded interpolated packets and the next valid received packet are sent as a cluster.
An out-of-order packet will be discarded, because the later packet has already been processed by the transcoding stage. If a jitter buffer is applied, on the other hand, the packets can be reordered before transcoding, and no packets are lost.
SUMMARY
According to certain embodiments, a method includes receiving a packet at a packet loss detector of a transcoder and omitting jitter buffering before decoding in the transcoder and omitting bad frame handling in a decoding stage of the transcoder. A decoder of the transcoder decodes the packet into a decoded packet, and the decoded packet is encoded into a re-encoded packet by an encoder of the transcoder. The method also includes transmitting the re-encoded packet from the transcoder. Further, the method includes monitoring for a received packet. When a packet is not received, the method additionally includes freezing the decoder and the encoder. The method also includes sending packet loss information from the decoder to the encoder as side information when the packet is not received. In addition, the method includes setting an outgoing packet stream to permit detection of missing packets by a downstream decoder upon receiving a valid packet after the packet is not received.
In certain embodiments, an apparatus includes a packet loss detector configured to receive a packet. The apparatus is configured to omit jitter buffering before decoding and to omit bad frame handling in a decoding stage. The apparatus also includes a decoder configured to decode the packet into a decoded packet, and an encoder configured to encode the decoded packet into a re-encoded packet. The apparatus additionally includes a transmitter configured to transmit the re-encoded packet from the transcoder. The packet loss detector is further configured to monitor for a received packet and to freeze the decoder and the encoder when a packet is not received. The decoder is configured to send packet loss information to the encoder as side information when the packet is not received. The encoder is configured to set an outgoing packet stream to permit detection of missing packets by a downstream decoder upon receiving a valid packet after the packet is not received.
An apparatus, according to certain embodiments, includes receiving means for receiving a packet. The apparatus is configured to omit jitter buffering before decoding and to omit bad frame handling in a decoding stage. The apparatus further includes decoding means for decoding the packet into a decoded packet and encoding means for encoding the decoded packet into a re-encoded packet. The apparatus also includes transmitting means for transmitting the re-encoded packet. Further, the apparatus includes monitoring means for monitoring for a received packet. The apparatus additionally includes freezing means for freezing the decoder and the encoder when a packet is not received, and sending means for sending packet loss information from the decoder to the encoder as side information when the packet is not received. The apparatus further includes setting means for setting an outgoing packet stream to permit detection of missing packets by a downstream decoder upon receiving a valid packet after the packet is not received.
A non-transitory computer readable medium, in certain embodiments, is encoded with instructions that, when executed in hardware, perform a process. The process includes receiving a packet at a packet loss detector of a transcoder. In addition, the process includes omitting jitter buffering before decoding in the transcoder and omitting bad frame handling in a decoding stage of the transcoder. Further, the process includes decoding the packet into a decoded packet by a decoder of the transcoder and encoding the decoded packet into a re-encoded packet by an encoder of the transcoder. The process also includes transmitting the re-encoded packet from the transcoder. Further, the process includes monitoring for a received packet and freezing the decoder and the encoder when a packet is not received. The process also includes sending packet loss information from the decoder to the encoder as side information when the packet is not received. In addition, the process includes setting an outgoing packet stream to permit detection of missing packets by a downstream decoder upon receiving a valid packet after the packet is not received.
BRIEF DESCRIPTION OF THE DRAWINGS
For proper understanding of the invention, reference should be made to the accompanying drawings, wherein:
FIG. 1 illustrates a conventional transcoding with jitter buffering.
FIG. 2 illustrates a conventional transcoding without jitter buffering.
FIG. 3 illustrates a first embodiment providing transcoding without jitter buffering and bad frame handling.
FIG. 4 illustrates a second embodiment providing enhancement for lookahead in a second encoder.
FIG. 5a illustrates a signal waveform without lookahead alignment.
FIG. 5b illustrates a signal waveform for the second embodiment.
FIG. 6 illustrates MOS (P.862.1) for jitter buffer/bad frame handler, with and without lookahead alignment, according to certain embodiments.
FIG. 7 illustrates MOS (P.862.1) for jitter buffer/bad frame handler, the second embodiment and a third embodiment.
FIG. 8 illustrates a flow-chart for the third embodiment.
FIG. 9 illustrates a DTX problem of hangover period frames lost before SID frames.
FIG. 10 illustrates a DTX problem of comfort noise with wrong hangover period frames.
FIG. 11 illustrates a DTX problem of audible click heard by end user.
DETAILED DESCRIPTION
Certain embodiments may avoid the latency issues found in conventional approaches by, for example, omitting the jitter buffering before the transcoding stage. Furthermore, certain embodiments can avoid both a peak in processing load and packet clustering in the outgoing stream after packet losses by freezing the decoder and encoder for the time period during which packets are not received. As soon as the next valid packet is received and the receiver entity notices that one or more packets have been lost, the number of lost packets can be indicated to the encoder entity together with the valid decoded packet. The encoder can then be run again using the valid decoded packet. For this encoded outgoing packet, the gap caused by packet loss in the incoming stream can be indicated to the peer decoder by incrementing the RTP timestamp according to the gap. In this way, the bad frame handler of the peer decoder can interpolate the missing packets, and reasonable voice quality can be maintained.
The above-mentioned approach can be enhanced when a lookahead is used in the encoder stage: voice quality can be improved by aligning the decoded received speech signal according to the lookahead before the encoder stage.
Furthermore, voice quality can be enhanced by running the decoder/bad frame handler and the encoder once after the first lost packet. This enhancement may diminish the quality effects caused by the loss of encoder-decoder synchronization in the first codec pair, and the result may be close to the quality level of transcoding with conventional bad frame handling. It limits the peak processing load to twice the nominal load, which should be acceptable for most applications.
Finally, certain embodiments provide a way to handle possible clicks with the discontinuous transmission (DTX) function of the adaptive multi-rate (AMR) and adaptive multi-rate wideband (AMR-WB) codecs by substituting speech data that contains clicks with appropriate data.
As mentioned above, a jitter buffer is currently applied to the incoming speech stream before the transcoding stage, which increases the latency of the connection.
The jitter buffer may be mandatory for packet-to-circuit switched (CS) network interworking, as it removes jitter in the packet stream before the voice signal is sent to the CS network. For IP-to-IP interworking, however, it may not be necessary to remove jitter, as the receiving CS gateway or IP terminal can equalize the packet stream before speech decoding and voice sample play-out.
Certain embodiments in the present disclosure avoid this latency by omitting the jitter buffering before the transcoding stage. Certain embodiments also prevent a peak processing load and packet clustering in the outgoing stream after packet losses by freezing the decoder and encoder for the time period during which packets are not received.
A first embodiment is shown in FIG. 3. In addition to instant decoding without any jitter buffering, the bad frame handling stage within the first decoder 130 can be omitted (contrary to the related art shown in FIGS. 1 and 2). Furthermore, a packet loss indication block 125 informs the second encoder 140 about missing packets. When one or more packets are missing, the first decoder 130 can be frozen so that no decoded voice packets are sent internally from the first decoder 130 to the second encoder 140. The second encoder 140 can also be frozen during this period, so that no packets are sent towards the peer/second decoder 150.
As soon as the next valid packet has been received, it can be decoded, and the number of lost packets can be sent along with it as side information to the second encoder 140. The second encoder 140 can then be run again, and the number of missing packets can be signaled in the outgoing packet by incrementing the RTP timestamp by the number of sampling clock ticks corresponding to the duration of the lost packets plus one packet. Additionally, the RTP sequence number may be incremented by the number of missing packets plus one; alternatively, it can be incremented by one. The jump in the timestamp indicates to the peer jitter buffer/decoder/bad frame handler 150 that a certain number of packets are missing, and that decoder can run its bad frame handler multiple times to interpolate the missing voice packets. The result can be a continuous decoded voice stream that can be played out to the CS network in the case of an IP-to-CS gateway, or to a D/A converter in the case of an IP terminal.
Benefits of the above-mentioned embodiment are that both the latency of conventional jitter buffering and a peak in the processing load can be avoided. Received packets are instantly forwarded to the loss indication and decoder blocks and then to the encoder block. Missing packets do not cause any packets to be generated between the first decoder 130 and the second encoder 140 within the transcoder. Because the first decoder 130 and second encoder 140 are not run for the missing packets, no processing peak occurs when the next valid packet is received after the lost packets. Furthermore, this prevents the packet clustering of the conventional approach, in which missing packets are interpolated within the transcoder before encoding, as illustrated in FIG. 2.
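The header arithmetic described above can be sketched as follows. This is a minimal Python illustration, not the patented implementation: the class `OutgoingRtpState`, its method `next_header`, and the default of 160 ticks per packet (an 8000 Hz RTP clock times a 20 ms packet interval) are assumptions chosen for illustration. The 16-bit sequence number and 32-bit timestamp wrap-around follow RTP (RFC 3550).

```python
from dataclasses import dataclass

SEQ_MOD = 1 << 16  # RTP sequence numbers are 16-bit (RFC 3550)
TS_MOD = 1 << 32   # RTP timestamps are 32-bit (RFC 3550)


@dataclass
class OutgoingRtpState:
    """Hypothetical helper holding the last seq/timestamp sent by the second encoder."""
    seq: int
    timestamp: int
    ticks_per_packet: int = 160  # assumed: 8000 Hz clock x 20 ms packet interval

    def next_header(self, lost: int, count_lost_in_seq: bool = True) -> tuple:
        """Return (seq, timestamp) for the next re-encoded packet.

        `lost` is the number of missing incoming packets reported as side
        information together with the valid decoded packet (0 if none).
        """
        if lost < 0:
            raise ValueError("lost must be >= 0")
        # Timestamp jumps by the lost packets plus the current packet.
        self.timestamp = (self.timestamp + (lost + 1) * self.ticks_per_packet) % TS_MOD
        # Sequence number: lost + 1 (default) or, alternatively, just + 1.
        step = lost + 1 if count_lost_in_seq else 1
        self.seq = (self.seq + step) % SEQ_MOD
        return self.seq, self.timestamp
```

For example, starting from `OutgoingRtpState(seq=100, timestamp=16000)`, a packet sent after two reported losses gets `next_header(2) == (103, 16480)`: the peer sees a 480-tick (three-interval) jump and can conceal the two missing 20 ms frames itself.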
In a second embodiment, a loss of quality due to misalignment in signal phase between the received and sent packet streams can be compensated when the second encoder 140 uses a lookahead functionality. Lookahead is used in the windowing of the linear prediction block of an encoder, and is used in many low-bitrate codecs, such as AMR, AMR-WB, and G.729.
A quality loss can arise when one or more successive packets are lost in the received packet stream. These lost packets are reflected as missing packets in the outgoing packet stream, but with a slightly different signal phase compared to the incoming packets. Specifically, the signal phase can be delayed by the amount of lookahead, which is typically 5 ms. This phase difference can cause additional disturbances at the output of the second decoder 150 when a packet loss occurs. FIG. 5a illustrates this phenomenon for a 50 Hz triangle waveform. This example uses the most common packet size of 20 ms, which is also the typical codec frame size for low-bitrate speech codecs. At initialization, the encoder adds 5 ms of zero signal before encoding the actual signal, which effectively increases the delay by 5 ms at the output of the second encoder 140.
Moreover, a typical sub-frame size of 5 ms is used here for the internal decoded packet size/interval, as can be used by an MGW. Strictly speaking, the MGW can use G.711-encoded 5 ms packets, but these could equally be 5 ms packets carrying linear pulse code modulation (PCM) samples. When 5 ms internal packets are used, an additional delay of 15 ms can arise, because the second encoder 140 cannot be run until the fourth sub-frame has been received. This can be due to the timer-based scheduling currently used by the MGW. This drawback could be mitigated, however, by sending internal packets as a cluster of four sub-frames.
A solution to the lookahead alignment problem is shown in FIGS. 4 and 5b. If the second encoder 140 uses a lookahead, the first sub-frame from the first decoder 130 can be dropped at the initialization phase of the transcoder. Thus, the first outgoing encoded packet is generated from sub-frames 2 to 5. This delays the sending of the first and following encoded packets by 5 ms compared to the non-alignment case. However, as seen from FIG. 5b, the actual signal may not be delayed compared to the non-alignment case, and the delay may not be increased from the end user's point of view. If the second encoder 140 does not use the lookahead functionality, the first sub-frame need not be dropped.
When 20 ms packets are used internally by the transcoder, the second packet must be awaited from the first decoder 130 before sub-frames 2 to 5 can be given to the second encoder 140. This can generate a delay of 20 ms, which equals the delay of the previous case in which 5 ms internal packets are used (15 ms of sub-frame accumulation plus the 5 ms alignment). Thus, certain embodiments can be used with both 5 ms and 20 ms internal packet sizes with the same delay. When the second encoder 140 does not use the lookahead, this 20 ms additional delay can be avoided.
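The start-up alignment can be pictured as a small sub-frame queue. The Python sketch below is illustrative only: `LookaheadAligner` is a hypothetical name, and it assumes, as in the example above, a 5 ms lookahead equal to one sub-frame and a 20 ms encoder frame built from four 5 ms sub-frames. With 20 ms internal packets, each packet would first be split into its four sub-frames before being pushed.

```python
from collections import deque
from typing import Any, List, Optional

SUBFRAMES_PER_FRAME = 4  # 20 ms encoder frame = 4 x 5 ms sub-frames


class LookaheadAligner:
    """Groups decoded 5 ms sub-frames into 20 ms encoder input frames,
    dropping the first sub-frame when the encoder uses a 5 ms lookahead."""

    def __init__(self, encoder_uses_lookahead: bool) -> None:
        self._pending: deque = deque()
        # One sub-frame (5 ms) is discarded at initialization only.
        self._to_drop = 1 if encoder_uses_lookahead else 0

    def push(self, subframe: Any) -> Optional[List[Any]]:
        """Add one decoded sub-frame; return a full encoder frame when ready."""
        if self._to_drop:
            self._to_drop -= 1
            return None
        self._pending.append(subframe)
        if len(self._pending) < SUBFRAMES_PER_FRAME:
            return None
        frame = list(self._pending)
        self._pending.clear()
        return frame
```

Pushing sub-frames 1, 2, 3, ... into `LookaheadAligner(True)` yields the first encoder frame `[2, 3, 4, 5]` on the fifth push, matching the sub-frames 2 to 5 described above; with `LookaheadAligner(False)` the first frame is `[1, 2, 3, 4]`.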
The benefit of the lookahead alignment can be seen from FIG. 6. Simulations with the ITU-T P.862.1 objective MOS tool and an AMR 12.2 kbps to AMR 12.2 kbps transcoding scenario show an improvement of about 0.1 mean opinion score (MOS) compared to the scenario without lookahead alignment. However, the quality level of the reference scenario, which is conventional jitter buffering with bad frame handling, is not reached. Accordingly, an enhancement to the first and second embodiments can be used, as described in a third embodiment.
The third embodiment aims to achieve the voice quality of the reference scenario (jitter buffering and bad frame handling). In the first and second embodiments, disturbances appear after the first decoder 130 if speech frames are dropped without any bad frame handling, because synchronization may be lost between the first encoder 105 and the first decoder 130.
Successive encoded speech frames have some dependencies, which may generate disturbances when encoded speech frames are simply dropped. These disturbances can be diminished very effectively if the bad frame handler is run once for one lost packet. In practice, as FIG. 7 shows, it can be enough to run the bad frame handler just once when the next valid packet arrives. Running the bad frame handler just once in this way limits the peak processing load to twice the nominal load. As shown in FIG. 7, the performance of the third embodiment may be virtually the same as that of the reference scenario. Alternatively, the bad frame handler can be run once a certain time has passed since the last received packet, so that for longer packet losses the bad frame handler is also run once after the last valid packet. For losses of more than one packet, the bad frame handler may be run two or more times, which may further enhance voice quality.
Some limitation of the processing load may also be necessary when excessive jitter causes a cluster of packets to be received from the first encoder 105. Normally, the jitter buffer handles such cases by equalizing reasonably clustered packets and discarding excessively delayed ones. In a transcoder without jitter buffering, excessively clustered packets may induce a high peak load in the system. A peak load limitation can solve this problem: processing of clustered packets can be delayed, or some packets can be dropped, so that the desired load level is achieved.
Out-of-order packets can also be handled sensibly by the present system. To minimize delay, a later packet, once received, can be decoded and sent towards the second encoder 140. As a result, an earlier packet must be discarded if it is received after a later packet, which effectively increases the frame erasure ratio. For example, when two packets are received out of order and the packet with the later sequence number is transcoded as soon as it is received, delay is minimized because the packet with the earlier sequence number is not awaited. The packet with the earlier sequence number is then treated as an excessively delayed packet and discarded. Out-of-order packets are very rare in real networks, so this kind of handling should have little practical impact.
A flow chart for the third embodiment is shown in FIG. 8. The sequence shown in the flow chart can be run at an appropriate polling rate, with a polling interval that can be shorter than the packet interval. The first decision block 810 detects whether a packet has been received. If not, the first decoder 130 and second encoder 140 are kept in a frozen state at 820.
Once a packet has been received, it is checked at 830 whether the excessive-delay threshold has been exceeded. If so, the packet can be discarded at 840. This threshold may be useful to avoid generating excessive jitter towards the peer decoder, and it may be analogous to the buffering level of a conventional jitter buffer, which discards late packets exceeding the buffering level.
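One way to picture the peak load limitation is a per-poll processing budget plus a bounded backlog. In this hypothetical Python sketch, `clustering_limit` (maximum decode/encode passes per polling interval) and `buffering_limit` (maximum packets waiting) are assumed parameter names, and dropping the newest packet when the backlog is full is only one of the options the text allows; dropping an older buffered packet would be the alternative.

```python
from collections import deque
from typing import Any, List


class PeakLoadLimiter:
    """Hypothetical limiter: at most `clustering_limit` transcoding passes per
    polling interval; at most `buffering_limit` packets waiting."""

    def __init__(self, clustering_limit: int, buffering_limit: int) -> None:
        self.clustering_limit = clustering_limit
        self.buffering_limit = buffering_limit
        self.backlog: deque = deque()
        self.dropped: List[Any] = []

    def tick(self, arrivals: List[Any]) -> List[Any]:
        """Accept packets that arrived in this polling interval and return the
        ones to decode and re-encode now, oldest first."""
        for pkt in arrivals:
            if len(self.backlog) >= self.buffering_limit:
                self.dropped.append(pkt)  # buffer full: drop the current packet
            else:
                self.backlog.append(pkt)  # excess packets wait for later polls
        budget = min(self.clustering_limit, len(self.backlog))
        return [self.backlog.popleft() for _ in range(budget)]
```

With `PeakLoadLimiter(2, 4)`, a burst of five packets yields `[1, 2]` on the first poll, packet 5 is dropped, and packets 3 and 4 are transcoded on the next poll, so no poll ever runs more than two decode/encode passes.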
If the excessive delay has not been exceeded at 830, a possible loss of one or more packets can be detected at 850 by, for example, analyzing RTP sequence number and timestamp of the received packet. If there is a gap in both sequence number and timestamp compared to the previously received packet, a packet loss can be determined to be present. The number of packets that are missing can also be determined, at 855. In the case of loss of one packet, the previously received valid packet can be decoded at 862 with the bad frame handler and the decoded frame can be sent to the second encoder 140 over the internal interface. Then the interpolated packet can be encoded, at 864, by the second encoder 140, and the timestamp (and optionally the sequence number) of the outgoing encoded packet can be incremented by one at 866.
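The loss test at 850 and 855 can be sketched as below. This is an illustrative Python function, not code from the patent: `count_lost` is a hypothetical name, 160 ticks per packet is an assumed default (8000 Hz, 20 ms), and treating a forward sequence gap of half the 16-bit space or more as an older (out-of-order) packet is a common RTP convention adopted here as an assumption. A timestamp jump without a sequence gap is treated as a DTX pause rather than a loss, following the rule that both fields must show a gap.

```python
from typing import Optional

SEQ_MOD = 1 << 16  # 16-bit RTP sequence number
TS_MOD = 1 << 32   # 32-bit RTP timestamp


def count_lost(prev_seq: int, prev_ts: int, seq: int, ts: int,
               ticks_per_packet: int = 160) -> Optional[int]:
    """Number of packets missing between the previous and current packet.

    Returns None for a duplicate or older (out-of-order) packet.
    """
    seq_gap = (seq - prev_seq) % SEQ_MOD
    if seq_gap == 0 or seq_gap >= SEQ_MOD // 2:
        return None
    ts_gap = (ts - prev_ts) % TS_MOD
    # A loss is declared only when BOTH the sequence number and the
    # timestamp jump by more than one packet.
    if seq_gap > 1 and ts_gap > ticks_per_packet:
        return seq_gap - 1
    return 0
```

Modular differences keep the test correct across sequence-number and timestamp wrap-around, e.g. `count_lost(65535, TS_MOD - 160, 1, 160)` reports one lost packet.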
Note that one timestamp unit here refers to the number of RTP sampling clock ticks in one packet interval, for example 160 ticks for an 8000 Hz sampling rate and a 20 ms packet interval. After the interpolated encoded packet has been sent, the processing load level of the transcoder can be analyzed with respect to packet clustering (870) and buffering (875). If neither the packet clustering limit nor the buffering limit is exceeded, the current packet can be decoded at 880 and forwarded to the second encoder 140. The buffering limit can be a maximum number of buffered packets. The decoded packet is then encoded at 882, and the timestamp and sequence number of the outgoing packet are incremented by one at 884.
If the clustering limit is exceeded but the buffering limit is not, the running of the first decoder 130 and second encoder 140 can be delayed at 877 so that the processing peak load is kept within an allowed limit. If the buffering limit is exceeded, the current packet is dropped. Alternatively, one of the previously received packets could be dropped from the buffer.
If a loss of more than one packet has been detected, the current valid packet can be decoded at 861 and sent to the second encoder 140 together with an indication of the number of lost packets. The outgoing packet can be encoded at 863 and the timestamp can be incremented by the number of lost packets plus one at 865. This case may be equivalent to the first embodiment, discussed above.
When a packet is received and no packet loss is detected, the sequence number can be checked at 890 by verifying that the current sequence number is higher by one than the previously received one. If it is, the clustering and buffering limits can be analyzed as discussed above, via 870 and 875. If the packet has been received out of order, it can be dropped at 840.
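The decision logic of FIG. 8 can be summarized in code. The Python sketch below is a simplified, hypothetical model: it returns the actions of one polling pass as strings instead of driving a real decoder and encoder, it assumes a `late_ms` lateness measurement with a 60 ms excessive-delay threshold, and it takes the clustering and buffering conditions as flags from some load monitor. Timestamp increments are in packet-interval units, and comments give the flow-chart reference numerals. A packet dropped at 840 does not update the stored sequence number, so it is later counted as lost, in line with signaling packets that were "lost or dropped".

```python
from dataclasses import dataclass
from typing import List, Optional

SEQ_MOD = 1 << 16  # 16-bit RTP sequence number
TS_MOD = 1 << 32   # 32-bit RTP timestamp
DROP = "drop current packet"


@dataclass
class Packet:
    seq: int
    ts: int
    late_ms: float = 0.0  # hypothetical lateness measured by the receiver


@dataclass
class Fig8State:
    prev_seq: Optional[int] = None
    prev_ts: Optional[int] = None
    late_limit_ms: float = 60.0        # assumed excessive-delay threshold
    clustering_exceeded: bool = False  # assumed input from a load monitor
    buffering_exceeded: bool = False   # assumed input from a load monitor


def _limits(state: Fig8State) -> List[str]:
    if not state.clustering_exceeded:                      # 870
        return ["decode current", "encode",                # 880, 882
                "timestamp += 1, seq += 1"]                # 884
    if state.buffering_exceeded:                           # 875
        return [DROP]                                      # 840
    return ["delay decoding/encoding"]                     # 877


def poll(state: Fig8State, pkt: Optional[Packet], ticks: int = 160) -> List[str]:
    """One pass of the FIG. 8 flow; returns the actions taken, in order."""
    if pkt is None:                                        # 810
        return ["freeze decoder and encoder"]              # 820
    if pkt.late_ms > state.late_limit_ms:                  # 830
        return [DROP]                                      # 840
    if state.prev_seq is None:                             # first packet
        actions = _limits(state)
    else:
        seq_gap = (pkt.seq - state.prev_seq) % SEQ_MOD
        ts_gap = (pkt.ts - state.prev_ts) % TS_MOD
        if 1 < seq_gap < SEQ_MOD // 2 and ts_gap > ticks:  # 850: loss detected
            lost = seq_gap - 1                             # 855
            if lost == 1:
                actions = ["BFH-interpolate 1 packet",     # 862
                           "encode interpolated",          # 864
                           "timestamp += 1"]               # 866
                # The interpolated slot now counts as sent.
                state.prev_seq = (pkt.seq - 1) % SEQ_MOD
                state.prev_ts = (pkt.ts - ticks) % TS_MOD
                actions += _limits(state)
            else:
                actions = [f"decode current, signal {lost} lost",  # 861
                           "encode",                               # 863
                           f"timestamp += {lost + 1}"]             # 865
        elif seq_gap == 1:                                 # 890: in sequence
            actions = _limits(state)
        else:
            return [DROP]                                  # 890 -> 840
    if DROP not in actions:
        state.prev_seq, state.prev_ts = pkt.seq, pkt.ts
    return actions
```

Feeding packets 10, 11, 13, and 17 in turn produces the normal path, a single-loss interpolation followed by normal processing, and a three-packet loss signaled as `timestamp += 4`; a late-arriving packet 16 is then dropped as out of order.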
The third embodiment can also include the quality enhancement provided by the second embodiment.
The architecture of the first, second, and third embodiments may cause an audible click to be carried into the encoded audio after the second encoder 140 when discontinuous transmission (DTX) is used and DTX hangover period frames are lost before the first decoder 130. Specifically, the audible click may be caused as follows.
In the first decoder 130, comfort noise can be generated when SID frames are received. For the first SID_FIRST/SID_UPDATE after speech frames, comfort noise can be generated based on the hangover period speech frames, which can be, for example, the 7 previous speech frames. If some of the hangover period frames are lost, the comfort noise parameters can be calculated from frames that may contain high-energy speech, as illustrated in FIG. 9. Such a calculation can result in audible clicks in the decoded signal, as illustrated in FIG. 10.
The audible click from the first decoder 130 can be encoded by the second encoder 140, while the missing packets are indicated to the peer jitter buffer/decoder/bad frame handler 150 by the jump in the timestamp. The audible click can then be heard by the end user after the second decoder 150, as illustrated in FIG. 11.
There are multiple ways to avoid this audible click when missing frames from the hangover period have been detected.
According to a first example implementation, after the first decoder 130, the system can set PCM samples to zero for the duration of the SID_FIRST and first SID_UPDATE comfort noise period, and resume normal operation after the second SID_UPDATE has been received.
According to a second example implementation, the first SID_FIRST and SID_UPDATE frames can be replaced with homing frames. Normal operation can be resumed after the second SID_UPDATE has been received.
According to a third example implementation, the “frames elapsed since the last SID frame” counter can be set to zero in first decoder 130, resulting in the previous SID update being used for comfort noise calculation. 3GPP technical specification (TS) 26.092, which is hereby incorporated herein by reference in its entirety, explains: “The decoder counts the number of frames elapsed since the last SID frame was updated and passed to the RSS by the encoder. Based on this count, the decoder determines whether or not there is a hangover period at the end of the speech burst. The interpolation factor is also adapted to the SID update rate. As soon as a SID frame is received comfort noise is generated at the decoder end. The first SID frame parameters are not received but computed from the parameters stored during the hangover period. If no hangover period is detected, the parameters from the previous SID update are used.”
According to a fourth example implementation, frames can be buffered from previous hangover periods and the system can use those frames to generate comfort noise, thus avoiding the use of non-hangover period speech frames.
According to a fifth example implementation, the system can substitute the frames containing the audible click after first decoder 130 on second encoder 140 side with no_data frames.
According to a sixth example implementation, the system can model background noise level and spectrum from previous speech pauses and replace the frames containing the audible click after first decoder 130 with synthesized comfort noise.
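The first example implementation can be sketched as a small state machine placed after the first decoder 130. In this hypothetical Python sketch, `ClickGuard`, its methods, and the frame-type strings are illustrative names; it assumes a hangover of 7 frames (the example figure above) and that the packet loss indication block reports lost frames through `note_lost`. It mutes the PCM output from SID_FIRST until the second SID_UPDATE whenever a loss fell inside the hangover window; the other example implementations would replace the muting step.

```python
from collections import deque
from typing import List

HANGOVER_FRAMES = 7  # example hangover length from the text


class ClickGuard:
    """Zeroes decoded PCM during the first comfort-noise period when frames
    from the DTX hangover period were lost (first example implementation)."""

    def __init__(self) -> None:
        # True marks a lost frame; only the last HANGOVER_FRAMES are kept.
        self._recent_lost: deque = deque(maxlen=HANGOVER_FRAMES)
        self._muting = False
        self._sid_updates = 0

    def note_lost(self, count: int) -> None:
        """Called when the loss detector reports `count` missing frames."""
        self._recent_lost.extend([True] * count)

    def process(self, frame_type: str, pcm: List[int]) -> List[int]:
        """frame_type: 'SPEECH', 'SID_FIRST', 'SID_UPDATE' or 'NO_DATA'."""
        if frame_type == "SPEECH":
            self._recent_lost.append(False)
            self._muting = False
        elif frame_type == "SID_FIRST":
            # Comfort noise will be derived from the hangover frames.
            self._muting = any(self._recent_lost)
            self._sid_updates = 0
            self._recent_lost.clear()
        elif frame_type == "SID_UPDATE" and self._muting:
            self._sid_updates += 1
            if self._sid_updates == 2:
                self._muting = False  # second SID_UPDATE: resume normal output
        return [0] * len(pcm) if self._muting else pcm
```

Muting is triggered only when a loss falls within the last seven frames before SID_FIRST, so losses earlier in a talk spurt leave comfort noise untouched.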
Certain embodiments may have various advantages. For example, certain embodiments may avoid the latency issues found in conventional jitter buffering, as received packets may be instantly forwarded to the loss indication and decoder blocks and then to the encoder block. These embodiments may also prevent packet clustering, because the missing packets are not interpolated within the transcoder before encoding.
Embodiments of an apparatus can take various forms. For example, an apparatus, which may be a transcoder, can include a packet loss detector configured to receive a packet at the packet loss detector. The apparatus can be configured to omit jitter buffering before decoding and to omit bad frame handling in a decoding stage. The packet loss detector can be any suitable device, and may include one or more controllers, processors, memories, or a combination thereof. The packet loss detector can, for example, correspond to packet loss indication block 125.
The apparatus can also include a decoder configured to decode the packet into a decoded packet. The decoder can correspond to first decoder 130.
Further, the apparatus can include an encoder configured to encode the decoded packet into a re-encoded packet. The encoder can correspond to second encoder 140.
The apparatus can additionally include a transmitter configured to transmit the re-encoded packet from the transcoder. The transmitter can be variously embodied. For example, the transmitter can be a network interface card, a port, a wireless modem, or any other suitable communication hardware.
The packet loss detector can further be configured to monitor for a received packet and freeze the decoder and the encoder when a packet is not received, as illustrated at 810 and 820 in FIG. 8.
Further, the packet loss detector can be configured to determine whether a received packet is excessively delayed and drop the received packet when the received packet is excessively delayed, as illustrated at 830 and 840 in FIG. 8.
The packet loss detector can further be configured to determine whether a received packet is out of order and drop the received packet when the received packet is out of order, as illustrated at 890 and 840 in FIG. 8.
In addition, the packet loss detector can be configured to determine whether a clustering limit is exceeded, determine whether a buffering limit is exceeded when the clustering limit is exceeded, and delay processing a received packet when the buffering limit is not exceeded, as illustrated at 870, 875, and 877 in FIG. 8.
The packet loss detector can further be configured to determine whether a buffering limit is exceeded and drop a received packet when the buffering limit is exceeded, as illustrated at 875 and 840 in FIG. 8.
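The decision order at 870, 875, 877, and 840 can be written as a small classification function. This sketch rests on assumptions: "clustering" is taken to mean that more packets arrive within one frame period than `clustering_limit`, and the buffering limit is taken as the maximum number of packets held back at once. The description does not define either quantity, and all names are hypothetical.

```python
from enum import Enum


class Action(Enum):
    PROCESS = "process"  # normal path
    DELAY = "delay"      # step 877: hold the packet back for a later frame period
    DROP = "drop"        # step 840


def classify(arrivals_this_period: int, held_back: int,
             clustering_limit: int, buffering_limit: int) -> Action:
    # Step 870: is the clustering limit exceeded?
    if arrivals_this_period <= clustering_limit:
        return Action.PROCESS
    # Step 875: would holding one more packet exceed the buffering limit?
    if held_back + 1 > buffering_limit:
        return Action.DROP
    # Buffering limit not exceeded: delay processing (step 877).
    return Action.DELAY
```

Delaying a clustered packet by a frame period or two keeps it usable, while the buffering limit bounds how much extra latency the transcoder can accumulate.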
Further, the packet loss detector can be configured to determine whether only a single packet has been lost and, when only a single packet has been lost, interpolate a replacement packet for it, as illustrated at 855 and 864 in FIG. 8.
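One way to picture the single-packet interpolation at 855 and 864 is to interpolate codec parameters (for example, spectral parameters or gains) between the frames on either side of the gap. The sketch below simply averages two parameter vectors; the functions `interpolate_frame` and `fill_single_gap` are hypothetical, and a real AMR or AMR-WB transcoder would interpolate its own parameter set in a codec-specific way. Gaps of two or more frames are deliberately left untouched.

```python
from typing import List, Optional, Sequence

Frame = Optional[List[float]]  # codec parameter vector, or None if lost


def interpolate_frame(prev: Sequence[float], nxt: Sequence[float],
                      weight: float = 0.5) -> List[float]:
    """Linear interpolation between two parameter vectors."""
    if len(prev) != len(nxt):
        raise ValueError("parameter vectors must have the same length")
    return [(1.0 - weight) * p + weight * n for p, n in zip(prev, nxt)]


def fill_single_gap(frames: List[Frame]) -> List[Frame]:
    """Replace isolated single losses only; longer gaps are left as None."""
    out = list(frames)
    for i in range(1, len(frames) - 1):
        prev, cur, nxt = frames[i - 1], frames[i], frames[i + 1]
        if cur is None and prev is not None and nxt is not None:
            out[i] = interpolate_frame(prev, nxt)
    return out
```

Interpolation needs the following frame, so in this model it only applies once the next valid packet has arrived.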
The packet loss detector can further be configured to signal the number of lost packets to the encoder when a valid packet is received after at least one packet is lost or dropped, as illustrated in FIGS. 3 and 4.
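A possible way for the encoder side to make lost frames visible downstream is to advance the outgoing RTP sequence number and timestamp by the number of lost frames, so that the downstream decoder sees a sequence-number gap rather than a contiguous stream. The `RtpPacketizer` class below is hypothetical and assumes one speech frame per RTP packet and the 8000 Hz RTP clock used for AMR (160 timestamp units per 20 ms frame).

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class RtpPacketizer:
    seq: int = 0
    timestamp: int = 0
    samples_per_frame: int = 160  # 20 ms at the 8000 Hz RTP clock used for AMR

    def skip_lost(self, lost_frames: int) -> None:
        """Leave a gap for frames the transcoder never received."""
        self.seq = (self.seq + lost_frames) & 0xFFFF
        self.timestamp = (self.timestamp
                          + lost_frames * self.samples_per_frame) & 0xFFFFFFFF

    def next_header(self) -> Tuple[int, int]:
        """Return (sequence number, timestamp) for the next outgoing packet."""
        header = (self.seq, self.timestamp)
        self.seq = (self.seq + 1) & 0xFFFF
        self.timestamp = (self.timestamp + self.samples_per_frame) & 0xFFFFFFFF
        return header


packetizer = RtpPacketizer()
first = packetizer.next_header()   # (0, 0)
packetizer.skip_lost(2)            # two frames lost upstream
second = packetizer.next_header()  # (3, 480): sequence numbers 1 and 2 are missing
```

Advancing only the timestamp would instead look like a DTX pause to the receiver, because RTP receivers infer loss from gaps in the sequence number.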
Also, the packet loss detector can be configured to provide a lookahead at the encoder and align a received speech signal according to the lookahead before the received speech signal is provided to the encoder.
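The description does not say how the alignment is performed. One possible reading, shown here as a hypothetical sketch, is a delay line that holds back `lookahead` samples of decoded speech so that each frame handed to the encoder is shifted by the encoder's lookahead. The class name `LookaheadAligner` and the zero-initialized history are assumptions.

```python
from typing import List


class LookaheadAligner:
    """Hypothetical delay line that shifts decoded PCM by the encoder lookahead."""

    def __init__(self, lookahead: int) -> None:
        # Samples carried over to the next frame; starts as silence.
        self.pending: List[int] = [0] * lookahead

    def align(self, frame: List[int]) -> List[int]:
        """Return a frame of the same length, delayed by `lookahead` samples."""
        buf = self.pending + frame
        out, self.pending = buf[:len(frame)], buf[len(frame):]
        return out


aligner = LookaheadAligner(lookahead=2)
print(aligner.align([1, 2, 3, 4]))  # [0, 0, 1, 2]
print(aligner.align([5, 6, 7, 8]))  # [3, 4, 5, 6]
```

Because the pending samples persist between calls, a frozen frame period (no call at all) leaves the alignment unchanged.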
The packet loss detector can further be configured to determine that comfort noise is to be applied during a period of a lost packet and control the comfort noise to avoid an audible click. This can be accomplished using any of the six implementation examples described above, or in any other suitable way.
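The first option listed in claim 11 (setting the output to zero during the SID_FIRST and first SID_UPDATE comfort noise period and resuming normal operation after the second SID_UPDATE) can be sketched as a small state machine operating on decoded PCM frames. The frame-type strings and the class `ComfortNoiseMuter` are hypothetical labels, not identifiers taken from the AMR specifications.

```python
from typing import List


class ComfortNoiseMuter:
    """Mute decoded output from SID_FIRST until the second SID_UPDATE arrives."""

    def __init__(self) -> None:
        self.muting = False
        self.sid_updates = 0

    def process(self, frame_type: str, pcm: List[int]) -> List[int]:
        if frame_type == "SID_FIRST":
            self.muting = True
            self.sid_updates = 0
        elif frame_type == "SID_UPDATE" and self.muting:
            self.sid_updates += 1
            if self.sid_updates >= 2:
                self.muting = False  # second SID_UPDATE: resume normal operation
        elif frame_type == "SPEECH":
            self.muting = False
        # NO_DATA frames inside the muted period are zeroed as well.
        return [0] * len(pcm) if self.muting else pcm


muter = ComfortNoiseMuter()
types = ["SPEECH", "SID_FIRST", "NO_DATA", "SID_UPDATE",
         "NO_DATA", "SID_UPDATE", "NO_DATA"]
out = [muter.process(t, [5, 5]) for t in types]
```

Muting the first comfort noise period trades a short stretch of silence for avoiding a click produced by comfort noise parameters estimated from an incomplete hangover period.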
One having ordinary skill in the art will readily understand that the invention as discussed above may be practiced with steps in a different order, and/or with hardware elements in configurations different from those disclosed. Therefore, although the invention has been described based upon these preferred embodiments, certain modifications, variations, and alternative constructions would be apparent to those of skill in the art while remaining within the spirit and scope of the invention. In order to determine the metes and bounds of the invention, therefore, reference should be made to the appended claims.
GLOSSARY
  • 3G Third Generation
  • AMR Adaptive Multi-Rate
  • AMR-WB Adaptive Multi-Rate Wideband
  • AoIP A-interface over IP
  • CS Circuit Switched
  • D/A Digital to Analog
  • DTX Discontinuous Transmission
  • GSM Global System for Mobile Communications
  • IP Internet Protocol
  • LTE Long Term Evolution
  • MGW Media Gateway
  • MOS Mean Opinion Score
  • MRF Multimedia Resource Function
  • PCM Pulse Code Modulation
  • RTP Real-time Transport Protocol
  • RSS Radio Sub-System
  • SID Silence Descriptor
  • VoIP Voice over Internet Protocol

Claims (20)

We claim:
1. A method, comprising:
receiving a packet at a packet loss detector of a transcoder;
omitting jitter buffering before decoding in the transcoder and omitting bad frame handling in a decoding stage of the transcoder;
decoding the packet into a decoded packet;
encoding the decoded packet into a re-encoded packet;
transmitting the re-encoded packet from the transcoder;
monitoring for a received packet;
freezing a decoder and an encoder of the transcoder when a packet is not received;
sending packet loss information from the decoder to the encoder when a packet loss is detected; and
setting an outgoing packet stream to permit detection of missing packets by a downstream decoder upon receiving a valid packet after the packet is not received.
2. An apparatus, comprising:
a packet loss detector configured to receive a packet at the packet loss detector, wherein the apparatus is configured to omit jitter buffering before decoding and to omit bad frame handling in a decoding stage;
a decoder configured to decode the packet into a decoded packet;
an encoder configured to encode the decoded packet into a re-encoded packet; and
a transmitter configured to transmit the re-encoded packet from the transcoder,
wherein the packet loss detector is further configured to
monitor for a received packet; and
freeze the decoder and the encoder when a packet is not received,
wherein the decoder is configured to send packet loss information to the encoder when a packet loss is detected, and
wherein the encoder is configured to set an outgoing packet stream to permit detection of missing packets by a downstream decoder upon receiving a valid packet after the packet is not received.
3. The apparatus of claim 2, wherein the packet loss detector is further configured to signal an amount of lost packets to the encoder when a valid packet is received after at least one packet is lost or dropped, and
wherein the encoder is configured to set the outgoing packet stream to permit detection of missing packets by a downstream decoder upon receiving a valid packet after packets are not received.
4. The apparatus of claim 2, wherein the decoder is configured to
set a timer when the packet is not received; and
perform bad frame handling on at least one packet when the timer expires.
5. The apparatus of claim 2, wherein the packet loss detector is further configured to perform bad frame handling on a plurality of packets up to a predetermined maximum number of packets.
6. The apparatus of claim 2, wherein the packet loss detector is further configured to
determine whether a received packet is out of order; and
drop the received packet when the received packet is out of order.
7. The apparatus of claim 2, wherein the packet loss detector is further configured to
determine whether a clustering limit is exceeded;
determine whether a buffering limit is exceeded when the clustering limit is exceeded; and
delay processing a received packet when the buffering limit is not exceeded.
8. The apparatus of claim 2, wherein the packet loss detector is further configured to
determine whether a buffering limit is exceeded; and
drop a received packet when the buffering limit is exceeded.
9. The apparatus of claim 2, wherein the packet loss detector is further configured to
provide a lookahead at the encoder; and
align a received speech signal according to the lookahead before the received speech signal is provided to the encoder.
10. The apparatus of claim 2, wherein the packet loss detector is further configured to
determine that comfort noise is to be applied with comfort noise being generated from other frames than hangover period frames; and
control the comfort noise to avoid an audible click.
11. The apparatus of claim 10, wherein the packet loss detector is configured to control the comfort noise by at least one of:
setting PCM samples to zero for the duration of the SID_FIRST and first SID_UPDATE comfort noise period, and resuming normal operation after a second SID_UPDATE has been received;
replacing the first SID_FIRST and SID_UPDATE frames with homing frames, and resuming normal operation after a second SID_UPDATE has been received;
setting to zero, in the decoder, a counter of frames elapsed since the last SID frame;
buffering frames from previous hangover periods and using the buffered frames to generate comfort noise;
substituting frames containing an audible click after the decoder with no_data frames at the encoder; or
modeling background noise level and spectrum from previous speech pauses and replacing frames containing the audible click after the decoder with synthesized comfort noise.
12. An apparatus, comprising:
receiving means for receiving a packet at a packet loss detector of a transcoder, wherein the apparatus is configured to omit jitter buffering before decoding and to omit bad frame handling in a decoding stage;
decoding means for decoding the packet into a decoded packet;
encoding means for encoding the decoded packet into a re-encoded packet; and
transmitting means for transmitting the re-encoded packet from the transcoder;
monitoring means for monitoring for a received packet;
freezing means for freezing a decoder and an encoder of the transcoder when a packet is not received;
sending means for sending packet loss information from the decoder to the encoder when a packet loss is detected; and
setting means for setting an outgoing packet stream to permit detection of missing packets by a downstream decoder upon receiving a valid packet after the packet is not received.
13. The apparatus of claim 12, further comprising:
signaling means for signaling an amount of lost packets to the encoder when a valid packet is received after at least one packet is lost or dropped; and
setting means for setting the outgoing packet stream to permit detection of missing packets by a downstream decoder upon receiving a valid packet after packets are not received.
14. The apparatus of claim 12, further comprising:
setting means for setting a timer when the packet is not received; and
bad frame handling means for performing bad frame handling on at least one packet when the timer expires.
15. The apparatus of claim 14, wherein the bad frame handling is performed on a plurality of packets up to a predetermined maximum number of packets.
16. The apparatus of claim 12, further comprising:
determining means for determining whether a received packet is out of order; and
dropping means for dropping the received packet when the received packet is out of order.
17. The apparatus of claim 12, further comprising:
determining means for determining whether a clustering limit is exceeded and for determining whether a buffering limit is exceeded when the clustering limit is exceeded; and
delaying means for delaying processing a received packet when the buffering limit is not exceeded.
18. The apparatus of claim 12, further comprising:
determining means for determining whether a buffering limit is exceeded; and
dropping means for dropping a received packet when the buffering limit is exceeded.
19. The apparatus of claim 12, further comprising:
providing means for providing a lookahead at the encoder; and
aligning means for aligning a received speech signal according to the lookahead before the received speech signal is provided to the encoder.
20. The apparatus of claim 12, further comprising:
determining means for determining that comfort noise is to be applied with comfort noise being generated from other frames than hangover period frames; and
controlling means for controlling the comfort noise to avoid an audible click.

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/EP2013/058573 WO2014173446A1 2013-04-25 Speech transcoding in packet networks

Publications (2)

Publication Number Publication Date
US20160078876A1 2016-03-17
US9812144B2 2017-11-07

Family

ID=48577684

Family Applications (1)

Application Number Status Publication Priority Date Filing Date Title
US14/786,779 Expired - Fee Related US9812144B2 2013-04-25 2013-04-25 Speech transcoding in packet networks

Country Status (4)

Country Publication
US (1) US9812144B2
EP (1) EP2989632A1
CN (1) CN105324813A
WO (1) WO2014173446A1

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2521883B * 2014-05-02 2016-03-30 Imagination Tech Ltd Media controller

Citations (14)

Publication number Priority date Publication date Assignee Title
GB2306861A 1995-11-03 1997-05-07 Motorola Ltd Handling erroneous data frames on a multi-hop communication link
US6324503B1 * 1999-07-19 2001-11-27 Qualcomm Incorporated Method and apparatus for providing feedback from decoder to encoder to improve performance in a predictive speech coder under frame erasure conditions
US20020013696A1 * 2000-05-23 2002-01-31 Toyokazu Hama Voice processing method and voice processing device
US20030065508A1 * 2001-08-31 2003-04-03 Yoshiteru Tsuchinaga Speech transcoding method and apparatus
US6615169B1 * 2000-10-18 2003-09-02 Nokia Corporation High frequency enhancement layer coding in wideband speech codec
US6772112B1 * 1999-12-10 2004-08-03 Lucent Technologies Inc. System and method to reduce speech delay and improve voice quality using half speech blocks
US6782361B1 * 1999-06-18 2004-08-24 Mcgill University Method and apparatus for providing background acoustic noise during a discontinued/reduced rate transmission mode of a voice transmission system
US20050177364A1 * 2002-10-11 2005-08-11 Nokia Corporation Methods and devices for source controlled variable bit-rate wideband speech coding
US20060178872A1 * 2005-02-05 2006-08-10 Samsung Electronics Co., Ltd. Method and apparatus for recovering line spectrum pair parameter and speech decoding apparatus using same
US7434117B1 * 2005-10-28 2008-10-07 Mediatek Inc. Method and apparatus of determining bad frame indication for speech service in a wireless communication system
US20090180531A1 * 2008-01-07 2009-07-16 Radlive Ltd. codec with plc capabilities
US20090268755A1 * 2008-04-23 2009-10-29 Oki Electric Industry Co., Ltd. Codec converter, gateway device, and codec converting method
US20100284281A1 2007-03-20 2010-11-11 Ralph Sperschneider Apparatus and Method for Transmitting a Sequence of Data Packets and Decoder and Apparatus for Decoding a Sequence of Data Packets
US20130077632A1 2011-09-27 2013-03-28 Oki Electric Industry Co., Ltd. Buffer controller correcting packet order for codec conversion

Non-Patent Citations (7)

Title
"3rd Generation Partnership Project; Technical Specification Group Services and System Aspects; Mandatory speech codec speech processing functions; Adaptive Multi-Rate (AMR) speech codec; Comfort noise aspects (Release 11)", 3GPP Draft 26092-b00, 3rd Generation Partnership Project (3GPP), Mobile Competence Centre, 650 Route des Lucioles, F-06921 Sophia-Antipolis Cedex, France, Sep. 21, 2012, XP050686378.
"Universal Mobile Telecommunications System (UMTS); LTE; IP Multimedia Subsystem (IMS); Multimedia telephony; Media handling and interaction (3GPP TS 26.114 version 10.6.0 Release 10)", Technical Specification, European Telecommunications Standards Institute (ETSI), 650 Route des Lucioles, F-06921 Sophia-Antipolis, France, vol. 3GPP SA 4, no. V10.6.0, 126 114, Apr. 1, 2013, XP014156501.
3GPP TS 26.092 V11.0.0 (Sep. 2012), Technical Specification, 3rd Generation Partnership Project; Technical Specification Group Services and System Aspects; Mandatory speech codec speech processing functions; Adaptive Multi-Rate (AMR) speech codec; Comfort noise aspects (Release 11), Sep. 21, 2012, XP050686378, 12 pages.
Bo Wei et al., "Voice Transmission Over All-IP Tandem Links," Asilomar Conference on Signals, Systems and Computers; IEEE 2003, pp. 275-279; XP010701874.
Bo Wei and J. D. Gibson, "Voice transmission over all-IP tandem links," Conference Record of the 37th Asilomar Conference on Signals, Systems & Computers, Pacific Grove, CA, Nov. 9-12, 2003, New York, NY: IEEE, vol. 1, pp. 275-279, XP010701874, ISBN 978-0-7803-8104-9, DOI 10.1109/ACSSC.2003.1291914.
ETSI TS 126 114 V10.6.0 (Apr. 2013), Technical Specification, "Universal Mobile Telecommunications System (UMTS); LTE; IP Multimedia Subsystem (IMS); Multimedia telephony; Media handling and interaction (3GPP TS 26.114 version 10.6.0 Release 10)", Apr. 1, 2013, XP014156501, 272 pages.
International Search Report and Written Opinion dated Jan. 14, 2014 corresponding to International Patent Application No. PCT/EP2013/058573.

Also Published As

Publication number Publication date
CN105324813A 2016-02-10
WO2014173446A1 2014-10-30
US20160078876A1 2016-03-17
EP2989632A1 2016-03-02

Legal Events

Date Code Title Description
AS Assignment

Owner name: NOKIA SOLUTIONS AND NETWORKS OY, FINLAND

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:KIRLA, OLLI SAKARI;KURITTU, ANTTI PEKKA EINARI;REEL/FRAME:036871/0368

Effective date: 20151014

STCF Information on status: patent grant

Free format text: PATENTED CASE

FEPP Fee payment procedure

Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

LAPS Lapse for failure to pay maintenance fees

Free format text: PATENT EXPIRED FOR FAILURE TO PAY MAINTENANCE FEES (ORIGINAL EVENT CODE: EXP.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

STCH Information on status: patent discontinuation

Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362

FP Lapsed due to failure to pay maintenance fee

Effective date: 20211107