In a previous section, we discussed the use of the STFT to estimate a signal's time-varying frequency content.
With the STFT, a signal is divided into blocks and an FFT is computed for each block.
To improve time resolution, FFT blocks typically overlap one another.
To minimize spectral ``splatter'' associated with signal discontinuities at the block boundaries, window functions are first applied (multiplication in the time-domain).
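The block, overlap, window and transform steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `dft` and `stft`, the periodic Hann window, and the block and hop sizes are choices made here, and a direct O(N^2) DFT stands in for the FFT so that the example needs nothing beyond the standard library.

```python
import cmath
import math

def dft(x):
    """Direct DFT (stand-in for an FFT; same result, just slower)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def stft(signal, N, hop):
    """Return one spectrum (list of N complex bins) per windowed block.

    N   : block length in samples (assumed even)
    hop : number of samples between block starts (hop < N means overlap)
    """
    # Periodic Hann window, applied by multiplication in the time domain
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / N) for n in range(N)]
    frames = []
    for start in range(0, len(signal) - N + 1, hop):
        block = [signal[start + n] * window[n] for n in range(N)]
        frames.append(dft(block))
    return frames
```

A smaller hop gives more frames per second of signal, which is the sense in which overlapping improves time resolution; the frequency resolution of each frame is still set by the block length N.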
In Figure 2, a series of concatenated triangular windows is applied to a time-domain signal. Clearly, the parts of the signal near the window edges are strongly attenuated when this happens.
Figure 2: A time-domain signal ``windowed'' with back-to-back triangular windows.
In order to maintain a given signal's properties when performing the STFT, window functions must be applied in such a way that their overlapping copies sum to a constant.
For example, if we overlap triangular windows by 50%, they sum to a constant value of one. With an overlap of 75%, they sum to a constant of two.
The Matlab script olaw.m can be used to test various window overlap percentages.
Triangular, Hanning and Hamming windows sum to a constant for overlap percentages of 50% and 75%. Blackman windows sum to a constant for an overlap percentage of 75%. Other overlap percentages that sum to a constant are possible for all these window types.
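The constant-sum property is easy to check numerically. The following Python sketch is independent of olaw.m (whose contents are not reproduced here); the function names and the ``periodic'' window definitions, with the window period equal to the block length N, are assumptions of this example. Symmetric window definitions that include both endpoints generally leave a small ripple.

```python
import math

def periodic_window(name, N):
    """Length-N window in periodic form (assumed definitions, N even)."""
    if name == "triangular":
        return [1.0 - abs(n - N / 2) / (N / 2) for n in range(N)]
    if name == "hann":
        return [0.5 - 0.5 * math.cos(2 * math.pi * n / N) for n in range(N)]
    if name == "hamming":
        return [0.54 - 0.46 * math.cos(2 * math.pi * n / N) for n in range(N)]
    if name == "blackman":
        return [0.42 - 0.5 * math.cos(2 * math.pi * n / N)
                + 0.08 * math.cos(4 * math.pi * n / N) for n in range(N)]
    raise ValueError("unknown window: " + name)

def overlap_sum(window, hop, num_frames=16):
    """Sum num_frames copies of the window, each shifted by hop samples."""
    N = len(window)
    total = [0.0] * (hop * (num_frames - 1) + N)
    for m in range(num_frames):
        for n, w in enumerate(window):
            total[m * hop + n] += w
    # Drop the first and last N samples, where fewer windows overlap
    return total[N:-N]

def ripple(window, hop):
    """Return (max - min, mean) of the overlapped sum in the steady region."""
    s = overlap_sum(window, hop)
    return max(s) - min(s), sum(s) / len(s)
```

With N = 64, `ripple(periodic_window("triangular", 64), 32)` gives a mean of one with no ripple, and a hop of 16 (75% overlap) gives a mean of two. A Blackman window at 50% overlap shows a clearly non-zero ripple, which is the amplitude modulation described next.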
If the windows do not sum to a constant, an amplitude modulation is imposed on the signal with resulting side-bands spaced by fs / R in the frequency domain (where fs is the sample rate and R is the ``hop'' size in samples).
The Inverse Fast Fourier Transform computes a time-domain signal from its frequency-domain representation.
In general, the result of the IFFT will be a sequence of complex values.
If we want the IFFT result to be a real-valued signal, then our frequency-domain representation must be Hermitian symmetric.
A Hermitian symmetric spectrum is one whose real part is even symmetric about the frequency bin k=0 and whose imaginary part is odd symmetric about the frequency bin k=0. This can also be expressed as $X(-k) = X^{*}(k)$, where $X(k)$ is the spectrum value at bin k and $^{*}$ denotes complex conjugation.
A real time-domain signal always has a Hermitian frequency-domain transform.
Thus, any manipulation or processing that is performed in the frequency domain should maintain Hermitian symmetry. In practice, most processing is applied to only half of the transformed signal and symmetry is assumed before the inverse transform is calculated.
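As an illustration of processing only half of the spectrum, the Python sketch below rebuilds a full length-N spectrum from bins 0 through N/2 by conjugate mirroring and then inverts it. The names `complete_hermitian` and `idft` are hypothetical, N is assumed even, and a direct inverse DFT is used in place of the IFFT. For a length-N DFT, bin -k is the same as bin N-k.

```python
import cmath
import math

def idft(X):
    """Direct inverse DFT (stand-in for an IFFT)."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

def complete_hermitian(half):
    """Given bins 0..N/2 (N even), return all N bins with X[N-k] = conj(X[k])."""
    N = 2 * (len(half) - 1)
    full = [complex(v) for v in half]
    # The DC and Nyquist bins are their own mirror images, so they must be real
    full[0] = complex(full[0].real, 0.0)
    full[N // 2] = complex(full[N // 2].real, 0.0)
    for k in range(N // 2 + 1, N):
        full.append(full[N - k].conjugate())
    return full
```

The imaginary parts of `idft(complete_hermitian(half))` are zero to within rounding error, so the real parts can be taken as the output signal. If a single bin is modified without also modifying its mirror image, the inverse transform is no longer real.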
If time-domain windows were not overlapped and frequency-domain processing were performed, the resulting IFFT'ed blocks would most likely have discontinuities at their boundaries that would be heard as clicks or pops.
If spectral processing and subsequent resynthesis using the IFFT is planned, time-domain windows must be ``overlap-added'' during reconstruction to avoid clicks at IFFT boundaries.
The Matlab script stfttest.m demonstrates the STFT and subsequent overlap-add process applied to a test sound.
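The analysis, processing and overlap-add resynthesis chain can also be sketched in Python. This is not a translation of stfttest.m; the function names, the periodic Hann window at 50% overlap (chosen because its shifted copies sum to one), and the direct DFT used in place of the FFT are all assumptions of this example. The `process` argument is a hypothetical hook for whatever spectral modification is wanted, and must return a Hermitian symmetric spectrum.

```python
import cmath
import math

def dft(x, sign=-1):
    """Direct DFT for sign=-1; unnormalized inverse DFT for sign=+1."""
    N = len(x)
    return [sum(x[n] * cmath.exp(sign * 2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def stft_ola(signal, N=32, hop=16, process=lambda X: X):
    """Window, transform, process, inverse transform and overlap-add."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / N) for n in range(N)]
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - N + 1, hop):
        block = [signal[start + n] * window[n] for n in range(N)]
        X = process(dft(block))
        # Inverse transform; keep the real part and apply the 1/N scaling
        y = [v.real / N for v in dft(X, sign=+1)]
        # Overlap-add: accumulate the block at its original position
        for n in range(N):
            out[start + n] += y[n]
    return out
```

With the default identity `process`, the output matches the input wherever two windows overlap (everything except the first and last `hop` samples), which is a useful sanity check before adding real spectral processing. This sketch windows only at analysis; when the processing changes the spectrum substantially, a second window is commonly applied after the inverse transform as well, which changes the constant-sum condition.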
The ``Vocoder'' was originally developed in the 1930s as a hardware system for speech analysis and resynthesis. The name comes from ``voice encoder''.
From a source-filter model perspective, the Vocoder is composed of a bank of band-pass (or resonance) filters that model the formants of our vocal mechanism. The settings for these filters are derived from an input speech signal (the ``modulator'').
The input signal to be ``vocoded'' is referred to as the ``carrier''. The carrier signal can be recorded or synthesized. To produce ``speech-like'' results, the carrier is typically an impulse train for vowel sounds and noise for fricative sounds.
As diagrammed in Fig. 3, the vocoder uses two banks of bandpass filters. The first bank estimates frequency parameters from a given modulator input, while the second bank is used to process the carrier input. The derived modulator bandpass settings control the gains applied at the output of the carrier bandpass filters.
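A minimal channel-vocoder sketch along these lines is given below in Python. It is one possible reading of the two-bank structure, not the design in Fig. 3: the second-order band-pass filter (the widely used ``audio EQ cookbook'' form with unity peak gain), the Q value, the rectify-and-smooth envelope follower, and its 50 Hz cutoff are all assumptions made for illustration, as are the function names.

```python
import math

def bandpass(x, f0, fs, q=4.0):
    """Second-order band-pass filter centered at f0 Hz (assumed design)."""
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b0, b2 = alpha / a0, -alpha / a0          # b1 is zero for this filter
    a1, a2 = -2 * math.cos(w0) / a0, (1 - alpha) / a0
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b0 * xn + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y

def envelope(x, fs, cutoff=50.0):
    """Amplitude envelope: full-wave rectify, then one-pole low-pass."""
    g = 1 - math.exp(-2 * math.pi * cutoff / fs)
    e, out = 0.0, []
    for xn in x:
        e += g * (abs(xn) - e)
        out.append(e)
    return out

def vocode(modulator, carrier, fs, centers):
    """Scale each carrier band by the envelope of the same modulator band."""
    n = min(len(modulator), len(carrier))
    out = [0.0] * n
    for f0 in centers:
        gain = envelope(bandpass(modulator[:n], f0, fs), fs)   # first bank
        band = bandpass(carrier[:n], f0, fs)                   # second bank
        for i in range(n):
            out[i] += gain[i] * band[i]
    return out
```

For example, `vocode(speech, pulses, 8000, [300.0, 600.0, 1200.0, 2400.0])` would impose the band envelopes of a hypothetical `speech` signal on an impulse-train carrier `pulses`. A silent modulator produces silent output, since every band gain is then zero.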
Early and ``classic'' vocoders use a fixed number of bandpass filters (usually 8 to 16) with fixed center frequencies.
The Max/MSP ``Classic Vocoder'' shown below (found in the examples/effects/ directory of the Max/MSP distribution) is implemented in this way.
Classic vocoders have a ``mechanical'' quality to them that has been exploited for musical purposes by a broad range of musicians and composers.
Modern vocoders intended for use in ``high-quality'' communications (cell phones, ...) typically use linear prediction techniques to estimate the carrier (and residual) parameters. In this case, the center frequencies and number of band-pass filters used are not fixed.