A variety of common audio effects are implemented using variable-length delay lines. In this section, we analyze several of these techniques. Note that a discussion of artificial reverberation will be ``delayed'' to a later section. We also briefly cover some issues of dynamic range compression/expansion.
Flanging sums a signal with a time-varying delayed version of itself.
The input-output relationship for a flanger is given by:
y[n] = x[n] + g x[n - M[n]],
where M[n] is the time-varying length of a delay line and g is the ``depth'' of the flanging effect. A flanger block diagram is shown in Fig. 1.
A digital flanger block diagram.
Because the delay-line length, M[n], must change continuously and smoothly through time, it is necessary to make use of an interpolating delay line.
At any instant in time, the flanger is equivalent to a feedforward comb filter, which has a frequency response as shown in Fig. 2.
Magnitude response of a feedforward comb filter with M = 5, b0 = 1, and g = bM = 0.1, 0.5, and 0.9.
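For g > 0, there are M peaks in the frequency response, centered about the frequencies
$$\omega_k = \frac{2\pi k}{M}, \qquad k = 0, 1, \ldots, M-1$$
(in radians per sample), that is, at multiples of fs/M Hz. Between these peaks, there are M notches, also spaced at intervals of fs/M Hz.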
As M changes over time, the peaks and notches of the comb response are compressed and expanded. The spectrum of a sound passing through the flanger is thus accentuated and deaccentuated by frequency region in a time-varying manner.
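The peak and notch gains can be checked numerically. The following is a minimal sketch, assuming the comb filter y[n] = x[n] + g x[n - M] with a fixed integer M and peak frequencies at multiples of 2 pi / M radians per sample; the function names are hypothetical and chosen only for illustration.

```python
import cmath
import math

def comb_magnitude(omega, M, g):
    """Magnitude response of y[n] = x[n] + g x[n - M] at omega (radians/sample)."""
    return abs(1.0 + g * cmath.exp(-1j * omega * M))

def peak_and_notch_gains(M, g):
    """Gains at omega_k = 2 pi k / M and at the midpoints between those frequencies."""
    peaks = [comb_magnitude(2.0 * math.pi * k / M, M, g) for k in range(M)]
    notches = [comb_magnitude(2.0 * math.pi * (k + 0.5) / M, M, g) for k in range(M)]
    return peaks, notches
```

For M = 5 and g = 0.5, every peak has gain 1 + g = 1.5 and every notch has gain 1 - g = 0.5. Re-evaluating with a different M shows the same gains at a different frequency spacing, which is the compression and expansion of the comb described above.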
The delay-line length of a flanger is typically modulated by a low-frequency oscillator (LFO). Oscillator waveforms are typically sinusoidal, triangular, or exponential.
For a sinusoidally varied delay,
$$M[n] = M_0 \left[ 1 + A \sin(2\pi f n T) \right],$$
where f is the flanger ``rate'' in Hz, A is the ``excursion'' (maximum delay swing), M0 is the average delay-line length that controls the average notch density, and T is the sampling period.
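A sinusoidally swept flanger can be sketched in a few lines. This is an illustrative offline version, not a definitive implementation: it assumes the delay follows M0 [1 + A sin(2 pi f n / fs)] with 0 <= A < 1, uses linear interpolation as one simple choice of interpolating read, and indexes the input list directly instead of maintaining a circular buffer. The function name and default parameter values are hypothetical.

```python
import math

def flanger(x, fs, g=0.7, M0=200.0, A=0.5, rate_hz=0.5):
    """Feedforward flanger y[n] = x[n] + g x[n - M[n]] with a linearly interpolated read."""
    y = []
    for n in range(len(x)):
        # Time-varying (generally fractional) delay in samples.
        M = M0 * (1.0 + A * math.sin(2.0 * math.pi * rate_hz * n / fs))
        pos = n - M                      # fractional read position
        i = math.floor(pos)
        frac = pos - i
        # Samples before the start of the signal are treated as zero.
        x0 = x[i] if 0 <= i < len(x) else 0.0
        x1 = x[i + 1] if 0 <= i + 1 < len(x) else 0.0
        delayed = (1.0 - frac) * x0 + frac * x1
        y.append(x[n] + g * delayed)
    return y
```

With A = 0 the delay is constant and the function reduces to a fixed feedforward comb filter, which is a convenient sanity check. A real-time version would replace the direct list indexing with a circular buffer at least as long as the maximum delay M0 (1 + A).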
For values of g < 0, the peaks and notches of the comb filter trade places. In practice, g is normally constrained to the interval [0,1] and the option of sign inversion is provided by a ``phase inversion'' switch.
In the inverted mode, a notch is located at zero frequency. As a result, bass response will likely be weakened.
Some flangers also implement feedback (in addition to the delayed feedforward path), which introduces spectral peaks and notches as previously described for feedback comb filters.
A phaser, or phase shifter, is similar to a flanger in that it sweeps notches through the spectrum of an input signal. But while a flanger provides only uniformly spaced notches, a phaser can modulate the frequencies of non-uniformly spaced notches.
A phaser is implemented with allpass filters instead of delay lines, as shown in the block diagram of Fig. 3.
A digital phaser block diagram.
Second-order allpass filters are particularly convenient to use because each can control a separate notch frequency and bandwidth. Second-order allpass filters have a difference equation given by
$$y[n] = a_2 x[n] + a_1 x[n-1] + x[n-2] - a_1 y[n-1] - a_2 y[n-2],$$
with $a_1 = -2R\cos\theta$ and $a_2 = R^2$. R is the radius of each pole relative to the z-plane unit circle (R=1), and the pole angles are $\pm\theta$. The pole angle can be interpreted as
$$\theta = 2\pi f_c T,$$
where $f_c$ is the pole frequency in Hz and T is the sampling period.
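The phaser will have a notch wherever the phase of the allpass chain is $\pi$ radians (180 degrees), because the allpass output is then out of phase with the direct signal and the two cancel when summed. This happens very close to the complex-conjugate pole pair angles.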
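The location of a notch can be checked by evaluating the phaser response on the unit circle. The sketch below assumes the common second-order allpass form with coefficients a1 = -2 R cos(theta) and a2 = R^2, and a phaser output formed as the input plus g times the allpass chain output; the function names and the 1 Hz search grid are hypothetical choices made for illustration.

```python
import cmath
import math

def allpass2_response(f, fc, R, fs):
    """Response at f Hz of a second-order allpass with pole radius R and pole angle 2 pi fc / fs."""
    theta = 2.0 * math.pi * fc / fs
    a1, a2 = -2.0 * R * math.cos(theta), R * R
    z1 = cmath.exp(-2j * math.pi * f / fs)        # z^-1 evaluated on the unit circle
    return (a2 + a1 * z1 + z1 * z1) / (1.0 + a1 * z1 + a2 * z1 * z1)

def phaser_magnitude(f, fcs, Rs, fs, g=1.0):
    """Gain at f Hz of the phaser: direct path plus g times the allpass chain."""
    chain = 1.0 + 0.0j
    for fc, R in zip(fcs, Rs):
        chain *= allpass2_response(f, fc, R, fs)
    return abs(1.0 + g * chain)

def deepest_notch(f_lo, f_hi, fcs, Rs, fs, g=1.0):
    """Frequency on a 1 Hz grid in [f_lo, f_hi] where the phaser gain is smallest."""
    return min(range(f_lo, f_hi + 1), key=lambda f: phaser_magnitude(f, fcs, Rs, fs, g))
```

For example, `deepest_notch(4000, 6000, [5000.0], [0.98], 44100.0)` searches around a single section tuned to 5000 Hz and returns a frequency close to the pole frequency.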
The instantaneous frequency response of a phaser created using 4 second-order allpass filters with notch frequencies set at 300, 800, 1000, and 4000 Hz and R = 0.9, 0.98, 0.8, and 0.9 is shown in Fig. 4.
Instantaneous frequency response of a phaser created with 4 second-order allpass filters and notch frequencies set at 300, 800, 1000, and 4000 Hz.
The depth of the notches can be varied together by changing the feedforward gain parameter g.
To achieve the time-varying ``phasing'' effect, the notch frequencies are modulated with a periodic signal. Note that only a single filter coefficient need be changed in each allpass section to accomplish this.
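A time-domain phaser with modulated notch frequencies might be sketched as follows. It assumes the second-order allpass form y[n] = a2 x[n] + a1 x[n-1] + x[n-2] - a1 y[n-1] - a2 y[n-2] with a1 = -2 R cos(theta) and a2 = R^2, so that only a1 is recomputed as the notch moves. The class name, the sinusoidal LFO, and the `depth` parameter (relative frequency swing) are hypothetical choices, not something specified above.

```python
import math

class Allpass2:
    """Second-order allpass section with fixed pole radius R (direct form I)."""
    def __init__(self, R):
        self.R = R
        self.a2 = R * R          # depends only on R, never updated
        self.a1 = 0.0
        self.x1 = self.x2 = self.y1 = self.y2 = 0.0

    def set_notch(self, fc, fs):
        # Only a1 depends on the pole angle theta = 2 pi fc / fs.
        self.a1 = -2.0 * self.R * math.cos(2.0 * math.pi * fc / fs)

    def tick(self, x):
        y = (self.a2 * x + self.a1 * self.x1 + self.x2
             - self.a1 * self.y1 - self.a2 * self.y2)
        self.x2, self.x1 = self.x1, x
        self.y2, self.y1 = self.y1, y
        return y

def phaser(x, fs, centers, Rs, g=1.0, rate_hz=0.5, depth=0.3):
    """Sum the input with g times an allpass chain whose pole frequencies follow an LFO."""
    sections = [Allpass2(R) for R in Rs]
    out = []
    for n, xn in enumerate(x):
        lfo = 1.0 + depth * math.sin(2.0 * math.pi * rate_hz * n / fs)
        v = xn
        for sec, fc in zip(sections, centers):
            sec.set_notch(fc * lfo, fs)   # one coefficient update per section
            v = sec.tick(v)
        out.append(xn + g * v)
    return out
```

Scaling all center frequencies by the same LFO value is only one possible modulation scheme; each section could just as well be driven by its own modulation signal.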
The Doppler effect occurs when a sound source and listener are moving relative to one another. When the source and listener move toward each other, the sound is perceived to increase in frequency. When the source and listener move away from each other, the sound is heard to decrease in frequency.
The Doppler effect can be used to enhance the realism of simulated moving sound sources.
The amount of frequency shift is given by:
$$f_l = f_s \, \frac{1 + v_{ls}/c}{1 - v_{sl}/c},$$
where fs is the frequency of the source at rest, fl is the frequency perceived by the listener, vls is the speed of the listener toward the source (zero if not moving), vsl is the speed of the source toward the listener (zero if not moving), and c is the speed of sound.
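As a quick worked example, the Doppler relation can be evaluated directly. The sketch below assumes the standard form f_l = f_s (1 + v_ls/c) / (1 - v_sl/c) and a speed of sound of 343 m/s (an assumed value for air at room temperature); the function name is hypothetical.

```python
def doppler_frequency(f_source, v_ls=0.0, v_sl=0.0, c=343.0):
    """Frequency heard by the listener.

    v_ls: speed of the listener toward the source (m/s, negative if receding).
    v_sl: speed of the source toward the listener (m/s, negative if receding).
    c:    speed of sound in m/s (assumed default).
    """
    return f_source * (1.0 + v_ls / c) / (1.0 - v_sl / c)
```

A 440 Hz source approaching a stationary listener at one tenth of the speed of sound is heard at 440 / 0.9 Hz, or roughly 488.9 Hz. A listener receding from a stationary source at the same speed hears 440 x 0.9 = 396 Hz.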
We can simulate time-varying source and listener velocities using a time-varying digital delay line with separate read and write pointers. The write pointer corresponds to the source signal and the read pointer corresponds to the listener. If the source position is moving toward the listener, the write pointer increment should be changed from 1 to 1 + vsl/c. Likewise, if the listener is moving toward the source, the read pointer increment should be changed from 1 to 1 + vls/c.
Interpolated reads from a delay line (fractional delay lengths) were previously discussed. Interpolated writes are referred to as de-interpolation.
Because Doppler shift is dependent only on the relative motion of a source and listener and because the de-interpolation process is generally more complicated to implement than interpolation, it is best to change only the read pointer increment when possible.
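A continuously varying delay can be implemented with a ``growth parameter'' g = -vls/c, the change in delay (in samples) per time step, such that the read pointer is incremented by 1 - g at each time step. This agrees with the read pointer increment of 1 + vls/c given above for a listener moving toward the source.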
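The read-pointer approach can be sketched as an offline routine. Here the read pointer increment is passed in directly (for a listener approaching at speed vls it would be 1 + vls/c, as stated earlier); linear interpolation is used for the fractional reads, and the input list stands in for the delay-line memory. The function name and the error handling are hypothetical illustration choices.

```python
import math

def doppler_read(x, increment, initial_delay):
    """Read x through a delay line whose read pointer advances by `increment` per sample.

    The write pointer advances by 1 per sample, so the delay changes by
    (1 - increment) samples at each time step.
    """
    y = []
    pos = -float(initial_delay)          # read position in input-sample units
    for n in range(len(x)):
        if pos > n:
            raise ValueError("read pointer passed the write pointer")
        i = math.floor(pos)
        frac = pos - i
        # Positions before the start of the signal read as zero.
        x0 = x[i] if i >= 0 else 0.0
        x1 = x[i + 1] if 0 <= i + 1 <= n else 0.0
        y.append((1.0 - frac) * x0 + frac * x1)
        pos += increment
    return y
```

Feeding a sinusoid through this routine with an increment greater than 1 raises its frequency by that factor, until the shrinking delay reaches zero and the read pointer catches the write pointer.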
The Doppler effect causes a source signal to appear as though it has been pitch shifted. Pitch shifting of an input signal can thus be implemented with time-varying delay lines as described above.
Because the read pointer of a pitch shifter is incremented at a constant non-integer rate, the read pointer will eventually ``catch up to'' or ``fall back into'' the write pointer location. To avoid discontinuity issues caused when the read and write pointers cross, a multiple read pointer cross-fade system can be used.
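One common arrangement of the cross-fade idea uses two read pointers half a window apart, each faded to zero gain at the moment its delay wraps around. The following sketch assumes that arrangement with sin^2 cross-fade gains (which sum to one) and linear interpolation; it is an illustration under those assumptions, not necessarily the scheme intended above, and the function names and default window length are hypothetical.

```python
import math

def read_interp(x, pos, n):
    """Linearly interpolated read of x at fractional index pos (zero outside 0..n)."""
    i = math.floor(pos)
    frac = pos - i
    x0 = x[i] if 0 <= i <= n else 0.0
    x1 = x[i + 1] if 0 <= i + 1 <= n else 0.0
    return (1.0 - frac) * x0 + frac * x1

def pitch_shift(x, ratio, window=1024):
    """Delay-line pitch shifter: each tap reads at `ratio` samples per output sample."""
    rate = abs(1.0 - ratio) / window     # sawtooth phase increment per sample
    phase = 0.0
    y = []
    for n in range(len(x)):
        out = 0.0
        for offset in (0.0, 0.5):        # two taps, half a window apart
            p = (phase + offset) % 1.0
            # Delay ramps up (pitch down) or down (pitch up) across the window.
            delay = window * (p if ratio < 1.0 else 1.0 - p)
            gain = math.sin(math.pi * p) ** 2   # zero where the delay wraps
            out += gain * read_interp(x, n - delay, n)
        y.append(out)
        phase = (phase + rate) % 1.0
    return y
```

With ratio = 1 the phase never advances, one tap has zero gain, and the output is simply the input delayed by half a window. The window length trades off latency against the audibility of the cross-fades.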
Dynamic range compression or expansion involves the modification, typically via a time-varying gain control, of a signal's dynamic range.
The reduction of a signal's dynamic range is referred to as compression, while a range increase is referred to as expansion.
Applications of dynamic range reduction/compression include:
increase perceptual loudness;
transmission to a system with lower dynamic range;
reproduction of material with wide dynamic range in noisy environments;
to achieve timbre variations.
Applications of dynamic range expansion include:
noise suppression (noise gate or downward expansion);
restore dynamic level of a previously compressed signal;
add dynamic range for perceptual effect.
If a signal x(t) has a short-term signal level defined over some period of interest, the dynamic range of the signal is given by the ratio of the maximum to the minimum short-term level over that period, and is usually expressed in decibels.
Various metrics exist for estimating or evaluating the short-term level of a signal, including peak, average, and average root-power levels.
The short-term peak value of a signal (i.e., the greatest absolute value within the last $\tau$ seconds) is given by:
$$x_{\mathrm{peak}}(t) = \max_{t - \tau \le s \le t} |x(s)|.$$
A signal and its short-term peak and average root-power level estimates.
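The short-term average root-power level can be defined as:
$$x_{\mathrm{rms}}(t) = \sqrt{ x^2(t) * \frac{1}{\tau} e^{-t/\tau} u(t) },$$
where * denotes convolution, u(t) is the unit step function, and $\tau$ is the integration time constant. The value of $x_{\mathrm{rms}}(t)$ is thus formed by squaring the signal x(t), performing a ``leaky'' integration, and taking the square root of the result.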
A ``leaky'' integrator can be implemented in discrete time as:
$$y[n] = b_0 x[n] + a_1 y[n-1],$$
where b0 = 1 - a1 for unity gain at dc and |a1| < 1 for filter stability.
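Both level estimates are straightforward to compute in discrete time. The sketch below assumes a one-pole leaky integrator of the form y[n] = b0 x[n] + a1 y[n-1] with b0 = 1 - a1, and maps a time constant tau (in seconds) to the pole with a1 = exp(-1 / (tau fs)), which is a common but assumed choice. The function names are hypothetical.

```python
import math

def peak_level(x, window):
    """Greatest absolute value over the most recent `window` samples."""
    return [max(abs(v) for v in x[max(0, n - window + 1): n + 1])
            for n in range(len(x))]

def rms_level(x, fs, tau):
    """Root-power level: square, leaky-integrate, then take the square root."""
    a1 = math.exp(-1.0 / (tau * fs))   # assumed mapping from time constant to pole
    b0 = 1.0 - a1                      # unity gain at dc
    state = 0.0
    out = []
    for v in x:
        state = b0 * v * v + a1 * state
        out.append(math.sqrt(state))
    return out
```

For a constant input of 0.5, the root-power estimate converges to 0.5, which confirms the unity-gain normalization. The peak estimate holds the largest recent magnitude and drops only once that sample leaves the window.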
Automatic gain control involves the application of a time-varying gain g(t) to a signal, based on an estimate of the signal level.
The compression/expansion gain control is typically specified as a memoryless function that takes a signal level estimate as input and produces a desired gain level as output (levels in dB). An example compression curve is shown in Fig. 6 (the dashed line indicates an input-to-output ratio of 1).
A static compression curve (top: in dB; bottom: on a linear scale).
The dB output level is typically equal to the dB input level up to a threshold level lT (near -50 dB in Fig. 6). Beyond lT, a constant compression ratio is defined by R such that there is an increase of 1/R dB in output for every dB increase of the input. For example, if the compression ratio is 3:1, an input signal that is 9 dB over the threshold will be attenuated to a level 3 dB over the threshold.
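The static curve translates directly into a small gain computer. This is a minimal sketch assuming a hard knee at the threshold and levels expressed in dB; the function names and the default threshold of -50 dB are illustrative assumptions.

```python
def compressor_output_db(level_db, threshold_db=-50.0, ratio=3.0):
    """Static compression curve: unity slope below threshold, slope 1/ratio above it."""
    if level_db <= threshold_db:
        return level_db
    return threshold_db + (level_db - threshold_db) / ratio

def compressor_gain_db(level_db, threshold_db=-50.0, ratio=3.0):
    """Gain in dB to apply to the signal for a given input level estimate."""
    return compressor_output_db(level_db, threshold_db, ratio) - level_db

def db_to_linear(gain_db):
    """Convert a gain in dB to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)
```

With a 3:1 ratio and a threshold of -50 dB, an input level of -41 dB (9 dB over the threshold) maps to -47 dB (3 dB over), so the applied gain is -6 dB, in agreement with the example above.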
The time duration over which the input signal level estimates are calculated has an important influence on the response of the compression/expansion system.
It is often desirable that a compressor limit peak amplitudes. This ability requires a relatively short integration time. However, short integration times result in more variance in the level estimate, which is generally undesirable. For this reason, different integration constants are often used for the attack and release portions of a signal.
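A level detector with separate attack and release constants can be sketched by switching the leaky-integrator coefficient according to whether the signal power is rising or falling. The switching rule, the mapping a = exp(-1 / (tau fs)) from a time constant to a pole, and the default time constants below are all assumptions made for illustration.

```python
import math

def level_detector(x, fs, attack=0.001, release=0.1):
    """Root-power level estimate with a fast attack and a slow release."""
    a_att = math.exp(-1.0 / (attack * fs))
    a_rel = math.exp(-1.0 / (release * fs))
    state = 0.0
    out = []
    for v in x:
        p = v * v
        # Rising power uses the short (attack) constant, falling power the long one.
        a = a_att if p > state else a_rel
        state = (1.0 - a) * p + a * state
        out.append(math.sqrt(state))
    return out
```

Applied to a burst followed by silence, the estimate rises to the burst level within a few attack time constants and then decays slowly, so peaks are caught quickly while the gain varies smoothly afterward.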