Audio Signals

Some of the following figures are used with the permission of Fernando Lopez-Lezcano, CCRMA, Stanford University.

What is Sound?

Sound is a vibration that propagates as an acoustic wave through a transmission medium such as a gas, liquid or solid.
Humans have a limited range of reception for sound pressure variations of about 20 - 20,000 cycles/second or Hertz (Hz).
Sound travels at about 345 meters/second in air at room temperature. Using the relation $\lambda = c/f$ between frequency (f), wave speed (c) and wavelength ( $\lambda$ ) of waves, a sound at 20 Hz has a wavelength of about 17.25 meters and a sound at 20 kHz has a wavelength of about 17 millimeters.
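The wavelength figures above can be checked with a few lines of Python. This is a minimal sketch: the function name `wavelength` and the constant `SPEED_OF_SOUND` are illustrative choices, and the 345 m/s value is the room-temperature figure quoted in the text.

```python
SPEED_OF_SOUND = 345.0  # m/s, the room-temperature value quoted in the text


def wavelength(frequency_hz, speed=SPEED_OF_SOUND):
    """Return the wavelength in meters using lambda = c / f."""
    return speed / frequency_hz


# Limits of human hearing, plus 440 Hz as an extra illustrative frequency.
for f in (20.0, 440.0, 20000.0):
    print(f"{f:8.0f} Hz -> {wavelength(f):.5f} m")
```

The two extremes give 17.25 meters and 0.01725 meters (about 17 millimeters), matching the values stated above.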

Sound Signals

Sounds are typically measured via transducers that convert air vibrations into electrical signals.
Electrical sound signals are continuous-time representations of air pressure variations.
These representations can subsequently be stored and/or processed using signal processing techniques.
If/when we analyze continuous-time signals in this course (and MUMT 307), we will make use of the time variable t. For example, a continuous-time sinusoid will be represented as:
$\displaystyle x(t) = A \cos (\omega_{0} t + \phi)$

Ideal Sinusoidal Signals

An ideal sinusoidal signal is represented as $x(t) = A \cos(\omega_{0}t + \phi) = A \cos(2 \pi f_{0} t + \phi)$ , where A is an amplitude scalar, $\omega_{0}$ is the frequency in radians per second, f₀ is the frequency in Hz, and $\phi$ is the phase offset of the sinusoid.

Figure 1: An ideal sinusoidal signal.
A sinusoid is the simplest example of a periodic signal.
The period, or duration of a single cycle, of a periodic waveform is given by the inverse of its frequency: T = 1 / f₀.
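As a small illustration of the relation T = 1 / f₀, the sketch below evaluates a sinusoid at a time t and again one period later. The function name `sinusoid` and the parameter values (440 Hz, amplitude 0.5, phase 0.3 radians) are arbitrary choices for this example.

```python
import math


def sinusoid(t, amplitude, f0, phase):
    """Evaluate x(t) = A cos(2 pi f0 t + phi) at time t (in seconds)."""
    return amplitude * math.cos(2.0 * math.pi * f0 * t + phase)


f0 = 440.0          # frequency in Hz (illustrative value)
period = 1.0 / f0   # T = 1 / f0, in seconds

t = 0.001
now = sinusoid(t, 0.5, f0, 0.3)
one_period_later = sinusoid(t + period, 0.5, f0, 0.3)

# The two values agree to within floating-point rounding.
print(period, now, one_period_later)
```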

Sound Sampling

For discrete-time processing and storage (using computers or other microprocessor devices), continuous-time signals must be sampled.
This process is represented mathematically by making the substitution t = n T_s in continuous-time expressions, where T_s is the sampling time interval or period and $-\infty < n < \infty$ (integers). A sampled sinusoid will then have the form:
$\displaystyle x[n] = x(nT_{s}) = A \cos(\omega_{0} n T_{s} + \phi) = A \cos(\hat{\omega}_{0} n + \phi)$
where $\hat{\omega}_{0} = \omega_{0} T_{s}$ is the normalized radian frequency (in radians per sample).
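The substitution t = n T_s can be sketched in code as follows. The function `sample_sinusoid` is a hypothetical helper for this example, and the 1000 Hz frequency and 8000 Hz sample rate are arbitrary values, chosen so that exactly eight samples span one period.

```python
import math


def sample_sinusoid(amplitude, f0, phase, fs, num_samples):
    """Return x[n] = A cos(omega_hat * n + phi) for n = 0 .. num_samples - 1."""
    # Normalized radian frequency in radians per sample: omega * T_s = 2 pi f0 / fs.
    omega_hat = 2.0 * math.pi * f0 / fs
    return [amplitude * math.cos(omega_hat * n + phase) for n in range(num_samples)]


# 1000 Hz sampled at 8000 Hz: omega_hat = pi / 4, so 8 samples cover one period.
x = sample_sinusoid(1.0, 1000.0, 0.0, 8000.0, 8)
print([round(v, 4) for v in x])
```

Note that only the ratio of the frequency to the sample rate appears in the result, so the same sequence of samples describes any sinusoid with the same normalized frequency.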
By Shannon's Sampling Theorem, a continuous-time signal x(t) can be exactly reconstructed from its samples x[n] = x(n T_s) if the samples are taken at a rate f_s = 1 / T_s that is greater than two times the highest frequency component in the signal.
In other words, we must obtain more than two samples per period for all frequency components in a signal in order to accurately represent that signal.
In order to satisfy this condition, signals are typically bandlimited or filtered before they are sampled (and after they are converted back to analog signals).
If the sample rate does not meet the condition outlined above, any frequency components in the signal that are greater than f_s / 2 will “alias”: they appear in the sampled signal at a lower frequency, between 0 and f_s / 2.
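Aliasing can be demonstrated numerically. In the sketch below, a 7000 Hz cosine sampled at 8000 Hz produces the same samples as a 1000 Hz cosine. The helper names `alias_frequency` and `sample_cosine` and the frequencies are assumptions made for this illustration, and the folding formula applies to real-valued sinusoids.

```python
import math


def alias_frequency(f, fs):
    """Apparent frequency (between 0 and fs/2) of a real sinusoid at f Hz sampled at fs Hz."""
    f_mod = f % fs
    return min(f_mod, fs - f_mod)


def sample_cosine(f, fs, num_samples):
    """Samples of cos(2 pi f n / fs) for n = 0 .. num_samples - 1."""
    return [math.cos(2.0 * math.pi * f * n / fs) for n in range(num_samples)]


fs = 8000.0
high = sample_cosine(7000.0, fs, 16)                      # above fs / 2
low = sample_cosine(alias_frequency(7000.0, fs), fs, 16)  # its alias at 1000 Hz

# The two sample sequences are indistinguishable up to rounding error.
max_difference = max(abs(a - b) for a, b in zip(high, low))
print(alias_frequency(7000.0, fs), max_difference)
```

Once the samples have been taken, nothing distinguishes the two components, which is why the bandlimiting filter has to be applied before sampling.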
The recent trend toward very high sample rates (96 kHz, 192 kHz) is based more on hardware implementation issues than an attempt to accurately represent frequency components beyond the normal range of the human auditory system.

Quantization

Numbers in a computer can be represented in a variety of different formats (8-bit, 16-bit, 32-bit integers or floating-point numbers).
The choice of a particular number format can have significant influence on the quality of sampled signals.
In general, each bit of precision (in a binary system) provides about 6 dB of dynamic range (a doubling of sound pressure). CD quality recordings use 16-bit integer formats with an approximate dynamic range of 96 dB.
Certain musical sounds may exceed a 96 dB dynamic range, so larger sample sizes (20- and 24-bit formats) may be justified.
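The 6 dB per bit rule follows from expressing the ratio between the largest and smallest representable amplitudes, 2 raised to the number of bits, in decibels. A minimal sketch, in which the function name `dynamic_range_db` is an illustrative choice:

```python
import math


def dynamic_range_db(bits):
    """Approximate dynamic range of a `bits`-bit integer format: 20 log10(2 ** bits)."""
    return 20.0 * math.log10(2.0 ** bits)


for bits in (8, 16, 20, 24):
    print(f"{bits:2d} bits -> {dynamic_range_db(bits):6.2f} dB")
```

Each bit contributes 20 log10(2), or about 6.02 dB, so 16 bits give roughly 96.3 dB and 24 bits roughly 144.5 dB.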
A small amount of noise, called dither, is sometimes injected in a lower-resolution signal to help suppress the audible effects of quantization noise at very low signal levels. The result is to convert audible signal-dependent errors into wide-band noise that is uncorrelated with the signal.
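The effect of dither can be illustrated with a very low-level signal. The sketch below rests on several assumptions that are not from the text: a rounding quantizer with a step size of 1, triangular-PDF dither spanning plus or minus one step (one common choice), and averaging over many independent trials purely as a way to expose the statistics. Without dither, a sinusoid whose peak is below half a step quantizes to silence. With dither, each quantized value is noisy, but its average tracks the input.

```python
import math
import random


def quantize(value, step):
    """Round value to the nearest multiple of step (uniform quantizer)."""
    return step * round(value / step)


def tpdf_dither(step, rng):
    """Triangular-PDF noise in (-step, +step); an assumed, commonly used dither."""
    return step * (rng.random() - rng.random())


rng = random.Random(0)  # fixed seed so the run is repeatable
step = 1.0

# One cycle of a sinusoid whose peak (0.4) is below half a quantization step.
signal = [0.4 * math.cos(2.0 * math.pi * n / 32) for n in range(32)]

# Without dither every sample rounds to zero: the signal disappears.
plain = [quantize(s, step) for s in signal]

# With dither, average many quantized trials per sample to expose the mean.
trials = 2000
averaged = [
    sum(quantize(s + tpdf_dither(step, rng), step) for _ in range(trials)) / trials
    for s in signal
]

worst_error = max(abs(a - s) for a, s in zip(averaged, signal))
print(all(v == 0 for v in plain), worst_error)
```

The averaging here is only a diagnostic. In practice a single dithered signal is stored, and the quantization error is heard as low-level wide-band noise rather than as distortion that follows the signal.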

Sound Spectra

By Fourier theory, any waveform can be represented by a summation of a (possibly infinite) number of sinusoids, each with a particular frequency, amplitude and phase.
A periodic waveform can be represented by a (possibly infinite) set of harmonically related sinusoids (whose frequencies are related by integer multiples).
Conversely, a sound composed of non-harmonically related sinusoids cannot be periodic.
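The link between harmonic relationships and periodicity can be checked numerically. The sketch below sums a few sinusoids and compares the result at time t with the result one fundamental period later. The helper `additive`, the 100 Hz fundamental, and the partial amplitudes and phases are arbitrary choices for this example. The check only shows that the inharmonic sum fails to repeat after 1 / f₀; it is not a proof that no period exists.

```python
import math


def additive(t, partials):
    """Sum of sinusoids; partials is a list of (frequency_hz, amplitude, phase)."""
    return sum(a * math.cos(2.0 * math.pi * f * t + p) for f, a, p in partials)


f0 = 100.0
T = 1.0 / f0  # period of the fundamental

# Frequencies at integer multiples of f0.
harmonic = [(f0, 1.0, 0.0), (2 * f0, 0.5, 0.3), (3 * f0, 0.25, 1.1)]
# Second partial at an irrational multiple of f0.
inharmonic = [(f0, 1.0, 0.0), (f0 * math.sqrt(2.0), 0.5, 0.3)]

times = [k * 0.0007 for k in range(20)]
harmonic_diff = max(abs(additive(t + T, harmonic) - additive(t, harmonic)) for t in times)
inharmonic_diff = max(abs(additive(t + T, inharmonic) - additive(t, inharmonic)) for t in times)

# The harmonic sum repeats after T (difference near zero); the inharmonic sum does not.
print(harmonic_diff, inharmonic_diff)
```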
The Fourier transform of a signal provides a “recipe” for recreating that signal in terms of sinusoidal components.
We can often learn much more about the “content” of a complex waveform by viewing its spectrum or frequency-domain representation instead of (or in addition to) its time-domain representation.
An ideal sinusoidal signal has a non-zero Fourier transform value at only a single frequency, f₀ (for a real-valued sinusoid, mirrored at −f₀).
An ideal white noise signal has a flat spectrum.

Discrete-Time Frequency Analysis

The Discrete Fourier Transform (DFT) can be used to calculate the spectra of discrete-time signals.
The DFT can be efficiently implemented using the Fast Fourier Transform (FFT) algorithm.
The DFT and its inverse (the IDFT) are lossless transformations: no data or information is lost in the transformation back and forth between the time and frequency domains.
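These properties can be explored with a small pure-Python sketch. It assumes the common DFT convention with a negative exponent in the forward transform and a 1/N factor in the inverse. The functions `dft`, `idft` and `fft` are written for this illustration (the FFT shown is a basic recursive radix-2 version that requires a power-of-two length); they are not taken from any particular library.

```python
import cmath
import math


def dft(x):
    """Direct O(N^2) DFT: X[k] = sum over n of x[n] * exp(-j 2 pi k n / N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]


def idft(X):
    """Inverse DFT: x[n] = (1/N) * sum over k of X[k] * exp(+j 2 pi k n / N)."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]


def fft(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    N = len(x)
    if N == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * N
    for k in range(N // 2):
        twiddle = cmath.exp(-2j * math.pi * k / N) * odd[k]
        out[k] = even[k] + twiddle
        out[k + N // 2] = even[k] - twiddle
    return out


# A cosine with exactly 3 cycles in N = 16 samples.
N = 16
cycles = 3
x = [math.cos(2.0 * math.pi * cycles * n / N) for n in range(N)]

X = dft(x)
# Bins with non-negligible magnitude: bin 3 and its mirror image, bin N - 3.
peak_bins = [k for k, value in enumerate(X) if abs(value) > 1e-9]

# DFT followed by IDFT recovers the signal up to rounding error.
x_back = [value.real for value in idft(X)]
round_trip_error = max(abs(a - b) for a, b in zip(x, x_back))

# The FFT computes the same values as the direct DFT, with far fewer operations.
fft_error = max(abs(a - b) for a, b in zip(X, fft(x)))

print(peak_bins, round_trip_error, fft_error)
```

The sinusoid lands in a single bin (plus its mirror) only because it completes a whole number of cycles within the analysis length. Other frequencies spread across neighboring bins.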