A classic vocoder, as previously discussed, uses bandpass filters of fixed frequency for both its analysis and resynthesis phases. During the analysis phase of the system, only a channel gain is computed (and phase information is ignored).
The DFT can also be viewed as an $N$-channel filterbank with fixed center frequencies (the $\omega_k$ values).
However, for each frequency component, the DFT computes both a gain and a phase. The phase data can provide information about the deviation of a sinusoidal component from its channel center frequency.
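As a rough illustration of the DFT reporting both a gain and a phase for one channel, the sketch below evaluates a single DFT bin directly from its definition. The helper name `dft_bin`, the frame length, and the test signal are assumptions chosen for the example, not part of the original notes.

```python
import cmath
import math

def dft_bin(frame, k):
    """Return the complex DFT value of `frame` at bin k (direct summation)."""
    n_fft = len(frame)
    return sum(x * cmath.exp(-2j * math.pi * k * n / n_fft)
               for n, x in enumerate(frame))

# Hypothetical test signal: a cosine exactly at the center of bin 4,
# with amplitude 0.5 and initial phase 0.3 rad.
N = 64
frame = [0.5 * math.cos(2 * math.pi * 4 * n / N + 0.3) for n in range(N)]

X4 = dft_bin(frame, 4)
gain = 2 * abs(X4) / N     # scaled so a real sinusoid reports its amplitude
phase = cmath.phase(X4)    # radians, in [-pi, pi]
```

Here the sinusoid sits exactly on a bin center, so `gain` and `phase` come out very close to the amplitude (0.5) and initial phase (0.3) used to build the signal. A channel vocoder would keep only `gain`.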
For example, consider the sinusoidal signal
$$x(t) = A \cos(\omega t + \phi) \qquad (6)$$
The argument of the cosine function is a constantly changing value in the range 0 to $2\pi$ (or $-\pi$ to $\pi$). If the frequency is fixed, the rate of change of this argument (its time derivative) is constant. If the frequency is changing, this will be reflected in the time derivative of the phase.
In other words, the instantaneous frequency of the signal is directly proportional to the derivative of its phase argument.
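A small numerical check of this relationship, under assumed values: the sketch below builds the phase argument of a cosine whose frequency rises linearly (a hypothetical sweep from 100 Hz at 50 Hz per second) and recovers the frequency from a finite difference of that phase. The function names and parameters are illustrative only.

```python
import math

def phase_arg(t, f0=100.0, sweep=50.0):
    """Phase argument (radians) of a cosine whose frequency rises linearly
    from f0 Hz at a rate of `sweep` Hz per second."""
    return 2 * math.pi * (f0 * t + 0.5 * sweep * t * t)

def instantaneous_freq_hz(t, dt=1e-4):
    """Estimate the frequency in Hz from a central difference of the phase."""
    dtheta = phase_arg(t + dt) - phase_arg(t - dt)
    return dtheta / (2 * dt) / (2 * math.pi)

f_at_0 = instantaneous_freq_hz(0.0)   # close to 100 Hz
f_at_2 = instantaneous_freq_hz(2.0)   # close to 200 Hz
```

Dividing the phase derivative (in radians per second) by $2\pi$ gives the frequency in Hz, which is the proportionality referred to above.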
The “phase vocoder” takes advantage of this fact to perform natural-sounding frequency-domain manipulations such as time-stretching/compression and pitch shifting.
In order to compute the instantaneous frequency, the complex FFT bin data must be converted to a polar representation to get a phase value. This is accomplished in Max/MSP with the cartopol~ object.
The phase should then be “unwrapped”. The phase obtained from the polar conversion is constrained to the range $-\pi$ to $\pi$, but these calculations require the actual phase without that constraint.
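One possible way to unwrap a sequence of phase values is sketched below. It assumes the true phase changes by less than $\pi$ between consecutive values, and the function name `unwrap` and the test data are made up for the example.

```python
import math

def unwrap(phases):
    """Remove 2*pi jumps from a sequence of wrapped phase values (radians)."""
    if not phases:
        return []
    out = [phases[0]]
    for p in phases[1:]:
        delta = p - out[-1]
        # Bring the step into [-pi, pi), assuming the true phase moves by
        # less than pi between consecutive values.
        delta -= 2 * math.pi * math.floor((delta + math.pi) / (2 * math.pi))
        out.append(out[-1] + delta)
    return out

# Hypothetical data: a phase advancing by 2.0 rad per step, wrapped to [-pi, pi].
true_phase = [2.0 * i for i in range(8)]
wrapped = [math.atan2(math.sin(p), math.cos(p)) for p in true_phase]
recovered = unwrap(wrapped)
```

With this data, `recovered` matches `true_phase` to within rounding error, whereas `wrapped` jumps back by $2\pi$ every few steps.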
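The phase is then “differentiated” using a differencing operation:
$$\Delta\omega_k(m) = \frac{\phi_k(m) - \phi_k(m-1)}{R\,T} \qquad (7)$$
where $\phi_k(m)$ is the unwrapped phase of bin $k$ in frame $m$, $m$ is a frame index, $R$ is the frame overlap in samples (the number of samples between the starts of successive frames), and $T$ is the sample period.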
Finally, the center frequency for a given bin should be added to this result to get the actual channel frequency.
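The whole procedure can be sketched for a single bin as follows. This is a minimal illustration under several assumptions that are not in the notes: a Hann analysis window, a direct (slow) DFT of one bin, and, in place of explicit unwrapping, the common shortcut of subtracting the phase advance expected at the bin center and wrapping the remainder to its principal value. All function and variable names are hypothetical.

```python
import cmath
import math

def dft_bin(frame, k):
    """Complex DFT value of `frame` at bin k (direct summation)."""
    n_fft = len(frame)
    return sum(x * cmath.exp(-2j * math.pi * k * n / n_fft)
               for n, x in enumerate(frame))

def wrap(phase):
    """Wrap a phase value to the range [-pi, pi)."""
    return phase - 2 * math.pi * math.floor((phase + math.pi) / (2 * math.pi))

def bin_frequency_hz(signal, start, n_fft, hop, k, sample_rate):
    """Estimate the frequency (Hz) in bin k from two frames `hop` samples apart."""
    T = 1.0 / sample_rate                      # sample period in seconds
    # Hann window (an assumption; any smooth analysis window could be used).
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]
    prev = [w * x for w, x in zip(window, signal[start:start + n_fft])]
    curr = [w * x for w, x in zip(window, signal[start + hop:start + hop + n_fft])]

    # Polar conversion: only the phase of each frame is needed here.
    phase_prev = cmath.phase(dft_bin(prev, k))
    phase_curr = cmath.phase(dft_bin(curr, k))

    center = 2 * math.pi * k / (n_fft * T)     # bin center frequency, rad/s
    expected = center * hop * T                # phase advance of a bin-centered tone
    # Deviation from the bin center, from the wrapped phase difference.
    deviation = wrap(phase_curr - phase_prev - expected) / (hop * T)
    return (center + deviation) / (2 * math.pi)

# Hypothetical test: a 1010 Hz cosine, which falls in bin 32 (center 1000 Hz).
sample_rate = 8000.0
n_fft, hop = 256, 64
true_freq = 1010.0
signal = [math.cos(2 * math.pi * true_freq * n / sample_rate) for n in range(512)]
estimate = bin_frequency_hz(signal, 0, n_fft, hop, 32, sample_rate)
```

In this setup `estimate` lands close to 1010 Hz even though the bin spacing is 31.25 Hz, which is the extra resolution the phase information provides. The estimate is only unambiguous while the deviation from the bin center produces less than $\pi$ radians of extra phase per hop.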
A simple phase vocoder implementation is provided in MSP Tutorial #26.