Echo Basics Tutorial

In a phone conversation, echo is the sound of your own voice being played back to you after a delay. Sometimes it is overlaid with the other party’s voice. Strong, delayed echo can be very annoying and, in extreme cases, can make conversation impossible. Public Switched Telephone Network (PSTN), mobile, and Voice over IP (VoIP) communications systems can pick up echo from a number of sources, so network-based echo cancellers are critical for good quality of service. In today’s competitive market, the absence of an efficient echo cancellation method can be detrimental to a carrier’s ability to retain subscribers.

This tutorial explores the sources of echo in telecommunications networks, the impact of echo on service quality, and the methods used to keep echo under control. Standards used to benchmark echo cancellers are introduced, as are performance requirements for echo cancellers in today’s PSTN/Mobile and VoIP-converged networks.

Types of Echo

Although it is difficult for a listener to differentiate them, there are two types of echo that occur in the typical communications network: hybrid echo and acoustic echo. Hybrid echo (also known as “electrical echo”) is caused by an impedance mismatch on the 4-wire to 2-wire conversion in wireline networks. It is the primary network-induced echo in today’s networks. Acoustic echo is created as a result of insufficient acoustic isolation between the earpiece and the microphone in small handsets, or when acoustic waves are reflected against a wall or enclosure, typically when using a hands-free unit.

Hybrid Echo

In a wireline PSTN network, the subscriber is linked to the local exchange (central office) by a 2-wire analog connection known as the “local loop.” From the local exchange, a 4-wire digital link carries the signal over longer distances. On this link, the send and receive paths use separate wire pairs. Between the two links is the hybrid, which converts the 4-wire interface to the 2-wire interface. The hybrid is a 4-port device where the fourth port is terminated with a balancing impedance.

Figure 1 – Hybrids in a PSTN

To avoid signal reflections in the hybrid, the balancing impedance of the hybrid must match the impedance of the 2-wire line terminated by the telephone. The impedance of the 2-wire line depends on many parameters, such as the length and type of cable, as well as the impedance of the telephone sets at the customer premises. In practice, the balance of the hybrid is only nominally achieved because the 2-wire loop’s impedance cannot be determined in advance. Therefore, a fraction of the signal is reflected back to the sender, which is heard as echo.

Figure 2 – Echo from the Hybrid

The degree of imbalance of the hybrid determines the strength of the echo reflection. This strength of the echo reflection is expressed in terms of Echo Return Loss (ERL). Echo Return Loss and additional echo metrics are explained in the section titled “Measuring Echo’s Effect on Quality of Service.”
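
As a rough illustration of how imbalance translates into reflection strength, the sketch below applies the standard return-loss expression from transmission-line theory. The tutorial does not give this formula, the function name is hypothetical, and the impedances are illustrative values only. The ERL seen in a real network also depends on other losses in the path.

```python
import math

def balance_return_loss_db(z_line: complex, z_balance: complex) -> float:
    """Return loss in dB for a line impedance against a balancing impedance."""
    mismatch = abs(z_line - z_balance)
    if mismatch == 0.0:
        return math.inf  # perfect balance: nothing is reflected
    return 20.0 * math.log10(abs(z_line + z_balance) / mismatch)

# Hypothetical impedances in ohms, chosen only for illustration.
print(round(balance_return_loss_db(900, 600), 1))  # 14.0
print(balance_return_loss_db(600, 600))            # inf
```

The closer the balancing impedance is to the line impedance, the larger the return loss and the weaker the reflected signal.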

Acoustic Echo

Acoustic echo occurs when some of the sound from the speaker part of the telephone gets picked up and transmitted back by the microphone.

There are two typical sources of acoustic echo. Acoustic isolation echo (or “acoustic coupling”) is generated when the earpiece and microphone are poorly isolated from one another. In today’s wireless networks, acoustic echo is common due to a proliferation of poorly designed handsets, headsets, and Bluetooth headsets. Acoustic isolation echo becomes especially problematic when, for example, a wireless user who has trouble hearing in a noisy environment turns the earpiece volume up to the maximum and then holds the phone in a way that gives poor isolation between the earpiece and microphone.

The second form of acoustic echo is called ambient acoustic echo. This type of acoustic echo is generated when a telephone conversation is held in an acoustically reflective environment. In this situation, the handset microphone first picks up the original audio stream, followed by the speech that is reflected from the walls. Ambient acoustic echo is most likely to occur with “hands free” kits and speakerphones.

Figure 3 – Acoustic Echo Sources

About 10% of calls in today’s wireless networks have acoustic echo (5% each for off-net and on-net callers).

Hybrid and Acoustic Echo Differences

Stationarity

The hybrid echo path is stationary, which means that it is invariant over time. Once the call path is established, the echo delay does not change during the course of the call. Acoustic echo, on the other hand, varies based on a multitude of external factors like the position of the talker in the room, or even head movements relative to the handset, which makes the acoustic echo a highly nonstationary signal.

Linearity

Linearity is how well the waveform of the echo signal matches the original signal. Hybrid echo is a linear signal, which means that a linear mathematical model constructed inside the echo canceller can accurately predict the hybrid echo signal. Acoustic echo is not a linear signal. First, nonlinearities might be created by the analog circuitry. In a handset or headset, this circuitry includes the microphone, the microphone amplifier, the loudspeaker amplifier, and the loudspeaker. More significantly, for wireless and many VoIP calls, the voice codec processing introduces additional nonlinearities.

Dispersion

An echo signal is not a single reflection of the original signal, but is a consecutive reflection over a period of time. Echoes have a certain duration, or dispersion time, which is the period of time during which the echo reflection occurs. A hybrid echo has a typical dispersion of less than 10 ms. However, since acoustic echo can be generated by reflections from the environment, acoustic echo is more dispersive, with dispersion times of up to 100 ms.


Figure 4 – Hybrid Echo Waveform


Figure 5 – Acoustic Echo Waveform

Measuring Echo’s Effect on Quality of Service

The degree to which echo becomes objectionable to the listener depends on the echo loudness and the delay between the original voice and the echo.

Loudness

The metric used to characterize echo loudness in the network is Echo Return Loss (ERL), or in some instances “Echo Path Loss.” ERL is expressed in decibels (dB) in terms of signal loss relative to the original signal’s strength.

ERL (dB) = Original Signal Level (dBm) – Echo Signal Level (dBm)

Note that a smaller ERL value equates to a louder echo, and a larger ERL value equates to a softer echo. For example, an ERL of 0 dB means that the echo signal is as strong as the original signal (not a practical case), whereas an ERL of 55 dB is typically a low-level echo.
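
The ERL relationship above can be sketched in a few lines of Python. The function names and the signal levels are hypothetical, chosen only to reproduce the kind of values mentioned in the text. The second function converts an ERL figure into a linear amplitude ratio using the usual 20 log10 convention.

```python
def echo_return_loss_db(original_dbm: float, echo_dbm: float) -> float:
    """ERL in dB: how far the echo sits below the original signal."""
    return original_dbm - echo_dbm

def echo_amplitude_ratio(erl_db: float) -> float:
    """Echo amplitude as a fraction of the original amplitude."""
    return 10.0 ** (-erl_db / 20.0)

# Hypothetical levels, for illustration only.
print(echo_return_loss_db(-10.0, -16.0))      # 6.0
print(echo_return_loss_db(-10.0, -65.0))      # 55.0
print(round(echo_amplitude_ratio(6.0), 3))    # 0.501
```

An ERL of 6 dB therefore corresponds to an echo at roughly half the amplitude of the original signal.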

As shown earlier, it is safe to assume that a hybrid’s impedance does not perfectly match the circuit’s impedance. An ERL as low as 6 dB can be expected, and in extreme cases, can be as low as 3 dB. Acoustic echo in the network is typically weaker than hybrid echo.

Weighted Acoustic Echo Path Loss (WAEPL) is sometimes used to define acoustic echo loudness. It is similar to ERL, but because of the nonlinearity of acoustic echo, it weights certain frequencies more than others. For practical purposes, WAEPL can be treated as equivalent to ERL.

Delay

In addition to signal strength, the perception of echo by the talker also depends on the delay between the voice and the echo. An echo reflection, even with low ERL (strong echo), can be negligible if the round trip delay is short (less than 20 ms), because it is masked by the sidetone of the talker’s own voice. Sidetone is the phone feature where some of the talker’s voice is played back to the earpiece so the talker easily knows if the line is on or off hook.

At about 30 ms, the talker perceives the delayed copy of the voice as a “hollow” or “tunnel-like” quality in the sidetone. As the delay increases, the delayed copy becomes recognizable as a distinct echo and increasingly bothersome to the talker.

There are many sources of delay in today’s communications networks. In the PSTN, delay is caused primarily by propagation, and echo generally becomes a concern only on long distance calls. In wireless networks, however, codec processing and air interface delays add roughly 160 ms of round trip delay, so any hybrid or acoustic echo in the network becomes noticeable. In VoIP networks, codec processing, packetization, routing, and buffering add delay that is comparable to or greater than wireless delay. Generally, the more complex the network, the greater the delay.

Hybrid echo is heard at twice the one-way transmission delay between the talker’s handset and the hybrid at the local exchange. For acoustic echo, the echo delay is twice the end-to-end delay between handsets. Because the PSTN’s local loop delay is negligible, both forms of echo are heard at approximately twice the end-to-end delay.
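
The delay arithmetic can be sketched as follows. The function names and the one-way delays are hypothetical; the 20 ms masking figure is the one quoted earlier in this section.

```python
SIDETONE_MASKING_MS = 20.0  # from the text: echo under about 20 ms round trip is masked

def round_trip_echo_delay_ms(one_way_delay_ms: float) -> float:
    """Echo is heard after the signal travels to the reflection point and back."""
    return 2.0 * one_way_delay_ms

def is_masked_by_sidetone(round_trip_ms: float) -> bool:
    """True when the echo arrives early enough to blend into the sidetone."""
    return round_trip_ms < SIDETONE_MASKING_MS

# Hypothetical one-way delays in milliseconds.
for one_way_ms in (5.0, 80.0):
    rtt = round_trip_echo_delay_ms(one_way_ms)
    print(one_way_ms, rtt, is_masked_by_sidetone(rtt))
# 5.0 10.0 True
# 80.0 160.0 False
```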

Because echo cancellers sit within the network, the delays they operate on are shorter than the delay that would be heard by the user. This delay is called the Echo Path Delay, or Tail Length. These terms are further explained in the section titled “What to Look for in a Hybrid Echo Canceller.”

Echo Objection Rate

Recommendation G.131 (Talker Echo and its Control) from the Telecommunication Standardization Sector of the International Telecommunication Union (ITU-T) presents results on the degree of annoyance of echo as a function of the amount of delay and Talker Echo Loudness Rating (TELR). TELR is the echo loss as perceived by the listener, which is the loss between the talker’s mouth and ear via the phone and echo path. If we account for 10 dB of loss introduced by the typical phone (per ITU-T G.121), the echo tolerance curve from G.131 can be shown as a function of ERL (Figure 6).


Figure 6 – Echo Objection Rate as a Function of ERL and Delay

The areas “Acceptable,” “Limiting Case,” and “Unacceptable” shown in Figure 6 correspond to the probability of encountering objectionable echo as perceived by listeners.

Canceling Echo

Since the invention of the telephone, various techniques and technologies have been employed to cancel echo. Today’s echo cancellation technology uses digital signal processing (DSP) and echo cancellation algorithms. Solving the echo problem in the “Unacceptable” and “Limiting Case” areas of the Echo Objection Rate graph (Figure 6 above) is the key to providing service quality.

Hybrid Echo Cancellation

Hybrid echo cancellers remove echo by creating a replica of the linear echo signal and subtracting it from the signal returning on the send path. This method makes use of the fact that the hybrid echo path is stationary. That is, once the call path is established, the round trip delay of the echo signal (tail length) does not vary. This method also relies on the fact that the hybrid echo leakage is linear: a given hybrid with a given 2-wire loop produces a predictable echo signal. Therefore, a mathematical model of the echo signal based on the far-end speaker’s voice can be accurately built inside the echo canceller.

Features of a Hybrid Echo Canceller

Figure 7 provides a high-level view of the building blocks for the hybrid echo canceller.


Figure 7 – Hybrid Echo Canceller Functional Blocks

Adaptive Linear Filter
The far-end signal (Receive In, or Rin) is passed through the Adaptive Linear Filter, which produces an echo estimate. This echo estimate is then subtracted from the echo signal (Send In, or Sin). The difference between the two signals (error signal) is used to adapt the filter’s coefficients. This process is repeated until the error signal is minimized and the echo estimate matches the echo as closely as possible.
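
The adaptation loop described above can be sketched with a normalized least-mean-squares (NLMS) update. NLMS is a common textbook choice and is an assumption here; the tutorial does not say which adaptation algorithm a given canceller uses. The function name, step size, tap count, and the toy echo path are all hypothetical.

```python
import random

def nlms_cancel(rin, sin, num_taps=8, mu=0.5, eps=1e-6):
    """Subtract an adaptive echo estimate from Sin; return (errors, weights)."""
    weights = [0.0] * num_taps   # adaptive model of the echo path
    history = [0.0] * num_taps   # most recent Rin samples, newest first
    errors = []
    for x, d in zip(rin, sin):
        history = [x] + history[:-1]
        estimate = sum(w * h for w, h in zip(weights, history))
        err = d - estimate       # error signal left after subtraction
        power = sum(h * h for h in history) + eps
        step = mu * err / power  # normalized step size
        weights = [w + step * h for w, h in zip(weights, history)]
        errors.append(err)
    return errors, weights

# Toy echo path (hypothetical): a short, delayed, attenuated reflection.
random.seed(0)
echo_path = [0.0, 0.0, 0.5, -0.2, 0.1]
rin = [random.uniform(-1.0, 1.0) for _ in range(2000)]
sin = [sum(echo_path[k] * rin[n - k] for k in range(len(echo_path)) if n - k >= 0)
       for n in range(len(rin))]

errors, weights = nlms_cancel(rin, sin)
print([round(w, 3) for w in weights[:5]])
```

In this noise-free toy case, with no near-end speech, the learned weights approach the echo path and the error signal shrinks toward zero.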

Double Talk Detector
The Double Talk Detector controls the Adaptive Linear Filter behavior during periods when both near-end and far-end signals are at significant levels (when both callers are talking at the same time). The objective is to freeze the filter adaptation during double talk periods because double talk may “confuse” the echo canceller, leading to divergence in the echo canceller algorithm.
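
One widely cited way to make this decision is a Geigel-style level comparison, sketched below. It is an assumption that a given canceller works this way; the tutorial does not name a detection method. The function name is hypothetical, and the 0.5 threshold corresponds to assuming an ERL of at least about 6 dB.

```python
def geigel_double_talk(rin_recent, sin_sample, threshold=0.5):
    """True when the Sin sample is too loud to be echo of recent Rin alone."""
    peak_far = max((abs(x) for x in rin_recent), default=0.0)
    return abs(sin_sample) > threshold * peak_far

# Hypothetical sample values.
recent_far_end = [0.8, -0.6, 0.3]
print(geigel_double_talk(recent_far_end, 0.2))  # False: consistent with echo only
print(geigel_double_talk(recent_far_end, 0.7))  # True: near-end speech likely present
```

When the detector returns True, an adaptive filter would typically skip its coefficient update for that sample.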

Nonlinear Processor
The Nonlinear Processor (NLP) is used to remove the residual echo signal, that is, the components that could not be removed by the linear filter alone. The NLP is not activated during periods of double talk.
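
A minimal sketch of one simple NLP technique, a center clipper, follows. This is an illustrative assumption, not a description of any particular product; the function name and threshold value are hypothetical.

```python
def center_clip(residual, threshold=0.01, double_talk=False):
    """Zero out low-level residual samples unless double talk is in progress."""
    if double_talk:
        return list(residual)  # leave near-end speech untouched
    return [0.0 if abs(s) < threshold else s for s in residual]

# Hypothetical residual samples after linear cancellation.
print(center_clip([0.005, -0.002, 0.3]))                    # [0.0, 0.0, 0.3]
print(center_clip([0.005, -0.002, 0.3], double_talk=True))  # [0.005, -0.002, 0.3]
```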

Comfort Noise Generator
Optionally, the echo canceller can use the Comfort Noise Generator (CNG) to inject spectrally matched comfort noise during nonlinear processing to avoid a “dead air” effect.

Hybrid Echo Canceller Standards

ITU-T G.168, Digital Network Echo Cancellers: This recommendation describes the characteristics of the echo canceller, as well as the laboratory tests that should be performed on an echo canceller to assess its performance under conditions likely to be experienced in the network.

What to Look for in a Hybrid Echo Canceller

A few key terms defined in the ITU-T G.168 standard are helpful in comparing echo canceller performance.

Echo Return Loss Enhancement
Echo Return Loss Enhancement (ERLE) is defined by G.168 as “The attenuation of the echo signal as it passes through the send path of an echo canceller. This definition specifically excludes any nonlinear processing on the output of the canceller to provide for further attenuation.”

For example, if the echo on a line has an ERL of 15 dB and the echo canceller is capable of an ERLE of 30 dB, then the residual echo is 45 dB (15 + 30) below the original signal, before nonlinear processing is applied.
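
The sketch below shows one way ERLE could be estimated from sample data, as the ratio of echo power entering the send path to residual power leaving it, and then combined with ERL as in the example above. The function names and sample values are hypothetical, and the measurement assumes an echo-only signal with no near-end speech.

```python
import math

def mean_power(samples):
    """Average squared amplitude of a block of samples."""
    return sum(s * s for s in samples) / len(samples)

def erle_db(sin_echo, sout_residual):
    """ERLE in dB: echo power at Sin relative to residual power at Sout."""
    return 10.0 * math.log10(mean_power(sin_echo) / mean_power(sout_residual))

def residual_below_original_db(erl_db, erle):
    """Total attenuation of the echo relative to the original signal."""
    return erl_db + erle

# Hypothetical echo-only samples before and after the linear filter.
sin_echo = [0.1, -0.1, 0.1, -0.1]
sout_residual = [0.001, -0.001, 0.001, -0.001]
print(round(erle_db(sin_echo, sout_residual), 1))  # 40.0
print(residual_below_original_db(15.0, 30.0))      # 45.0
```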

When comparing echo cancellers, look for an ERLE of greater than 35 dB to be sure that the echo can be reduced to within acceptable levels when the ERL is low. The NLP should be able to provide an additional 30 dB of echo reduction.

Convergence Time
Convergence Time is defined by G.168 as “For a defined echo path, the interval between the instant a defined test signal is applied to the receive-in (Rin) port of an echo canceller with the estimated echo path impulse response initially set to zero, and the instant the returned echo level at the send-out (Sout) port reaches a defined level.”

Once converged, the linear filter produces an optimal model of the echo path. If the echo path changes during the call (for example, because of a call transfer or conference call), the Adaptive Linear Filter re-converges to produce a new model of the echo.

The faster an echo canceller converges, the less time talkers hear their own echo at the start of a conversation. A good echo canceller can converge to 30 dB or better ERL+ERLE within 50 ms.
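
As a rough sketch, convergence time can be read off a series of attenuation measurements taken at regular intervals after the test signal starts. The function name, the 10 ms measurement interval, and the measured values below are hypothetical and are not taken from G.168.

```python
def convergence_time_ms(attenuation_db, target_db, frame_ms):
    """Start time of the first frame whose attenuation reaches target_db, or None."""
    for index, level in enumerate(attenuation_db):
        if level >= target_db:
            return index * frame_ms
    return None

# Hypothetical combined attenuation (ERL + ERLE) in dB, one value every 10 ms.
measured = [6.0, 14.0, 22.0, 27.0, 31.0, 34.0]
print(convergence_time_ms(measured, target_db=30.0, frame_ms=10.0))  # 40.0
```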

Echo Path Delay (Tail Length)
Echo Path Delay is often referred to as “Tail Length,” and sometimes “Tail Delay.” It is defined by G.168 as “The maximum echo path delay for which an echo canceller is designed to operate.” The echo path is “The transmission path between Rout and Sin of an echo canceller. This term is intended to describe the signal path of the echo.”

A key differentiator between today’s hybrid echo cancellers is the ability to handle long tail lengths. Longer delays make echo more annoying, and as network transport technologies evolve, network delay is increasing. Legacy echo cancellers capable of only 64-128 ms of tail length will not generally handle the echo in next-generation VoIP-converged networks.
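
To give a sense of what tail length means for the adaptive filter, the sketch below converts a tail length into the number of filter taps needed to span it. The 8 kHz sampling rate is an assumption (the usual narrowband telephony rate) and the function name is hypothetical.

```python
NARROWBAND_SAMPLE_RATE_HZ = 8000  # assumption: narrowband telephony sampling rate

def taps_for_tail(tail_ms: int, sample_rate_hz: int = NARROWBAND_SAMPLE_RATE_HZ) -> int:
    """Number of filter taps needed to span a tail of tail_ms milliseconds."""
    return tail_ms * sample_rate_hz // 1000

for tail_ms in (64, 128):
    print(tail_ms, taps_for_tail(tail_ms))
# 64 512
# 128 1024
```

Doubling the tail length doubles the number of coefficients the filter has to adapt, which is one reason long tails are harder to support.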

Acoustic Echo Control

The method used for acoustic echo control (AEC) is different from that used for hybrid echo cancellation. The hybrid echo signal is linear and stationary, meaning that once the echo model is established, it does not typically change in characteristics or delay, so an unchanging model of the echo can be constructed in the echo canceller.

The characteristics of acoustic echo, however, vary depending on a multitude of external factors. These changes make the acoustic echo nonstationary and nonlinear. For this reason, if the traditional hybrid echo cancellation method were used on acoustic echo, the echo canceller would be continuously diverging and re-converging, producing a very poor result. Also, unlike hybrid echo, acoustic echo is not a linear signal, because the echo fed back to the microphone is altered by analog circuitry and may also be compressed by the wireless/VoIP codec. An acoustic echo control method based on linear filtering would therefore not be able to remove the echo.

What to Look for in an Acoustic Echo Canceller

Unlike hybrid echo cancellers, there is no standard design for the acoustic echo control block. Acoustic echo control methods vary between vendors, with varied success.

The following describes the important features that an acoustic echo control design must include in order to be efficient in today’s wireless and VoIP-converged networks:

Acoustic Echo Canceller Standards

There are no standards dedicated to acoustic echo control in the network. A related recommendation is ITU-T G.160, Voice Enhancement Devices (in standardization process), which applies to the characteristics and testing of Voice Enhancement Devices (VEDs) intended for use in digital-network-based equipment for Mobile networks. Voice enhancement functions include acoustic echo control and background noise reduction.

Echo in Today’s Networks

New demands are being placed on today’s echo cancellers. Hybrid echo tail lengths are increasing due to increasingly complex transport systems. Acoustic echo, which on PSTN networks only appeared from speakerphones, is now prevalent in wireless calls, and the increased delay of those calls makes the echo more annoying.

Acoustic echo control is challenging due to the complex nature of the echo signal. The growing problem of acoustic echo from wireless calls is acknowledged, but many solution vendors have attempted to adapt hybrid echo cancellation methods to acoustic echo control, with poor results.

Echo control is essential to good voice quality in a network, and voice quality is becoming increasingly important as wireless competition increases and voice impairment issues remain a barrier to VoIP migration.