Assignment 2 - Audio Compression Techniques


Ambikairajah, E., A.G. Davis, and W.T.K. Wong. 1997. Auditory Masking and MPEG-1 Audio Compression. Electronics & Communication Engineering Journal, 9, (4): 165-175.
This paper describes the psychoacoustic phenomenon of auditory masking and its use in audio coders, where it allows quantisation noise to be allocated across the frequency subbands according to a masking function. The paper also reviews the MPEG-1 international standard for audio compression and briefly describes the psychoacoustic models that it uses.
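
As a rough illustration of the allocation idea summarized above (not the procedure from the paper or the MPEG-1 standard), the sketch below greedily gives each bit to the subband whose quantisation noise most exceeds its masking threshold. The function name, the example levels, and the rule of thumb that each extra bit lowers quantisation noise by about 6.02 dB are assumptions made for illustration.

```python
def allocate_bits(signal_db, mask_db, total_bits, db_per_bit=6.02):
    """Greedy allocation: each bit goes to the subband with the highest
    noise-to-mask ratio (NMR), assuming one bit lowers noise by db_per_bit."""
    n = len(signal_db)
    bits = [0] * n
    # Assumption: with zero bits, the noise is as loud as the signal itself.
    nmr = [s - m for s, m in zip(signal_db, mask_db)]
    for _ in range(total_bits):
        worst = max(range(n), key=lambda i: nmr[i])
        if nmr[worst] <= 0:
            break  # all noise already lies below the masking threshold
        bits[worst] += 1
        nmr[worst] -= db_per_bit
    return bits


# Hypothetical subband levels and masking thresholds, in dB.
print(allocate_bits([60, 40, 20], [30, 35, 25], total_bits=10))
```

In this example the third subband receives no bits at all, because its signal already lies below the masking threshold.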

Brandenburg, K. 1999. MP3 and AAC Explained. AES 17th International Conference on High Quality Audio Coding.
This paper, written by a researcher from the Fraunhofer Institute, explains several MPEG standards. In particular, the structure, principles and features of MPEG-1 Layer III and AAC are described in detail, and general information about the MPEG-4 and MPEG-7 formats is also provided. The paper discusses the factors that determine the quality of compressed audio, as well as some techniques that can be misused in MPEG encoding and decoding to the detriment of audio quality.

Brandenburg, K. and G. Stoll. 1994. ISO-MPEG-1 Audio: A Generic Standard for Coding of High-Quality Digital Audio. Journal of the Audio Engineering Society, 42, (10): 780-792.
This paper, co-authored by a researcher from the Fraunhofer Institute, is a general overview of the MPEG-1 standard. It provides a technical description of the standard as a whole, as well as more specific information on the compression techniques used in its different layers.

Dietz, M., Popp, H., Brandenburg, K., and R. Friedrich. 1996. Audio Compression for Network Transmission. Journal of the Audio Engineering Society, 44, (1-2): 58-72.
This article describes efficient audio compression techniques for transmitting high-quality audio signals over the Internet. In particular, it describes the MPEG-2 Layer 3 audio compression standard, the different implementations of its encoders and decoders, and the use of those implementations for network transmission.

Gersho, A. 1994. Advances in speech and audio compression. Proceedings of the IEEE, 82, (6): 900-918.
This article describes advances in the common techniques used in speech and audio compression, driven by emerging digital technology and by their implementation in diverse commercial applications. It describes the linear predictive coding (LPC) techniques and algorithms used in speech and wideband audio compression. It also describes common subband and transform coding methods which, combined with perceptual coding techniques, achieve reconstruction that is perceptually indistinguishable from the original at bit rates of 128 kbps per channel.
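
For readers unfamiliar with LPC, the following minimal sketch shows one textbook way of estimating predictor coefficients: the autocorrelation method solved with the Levinson-Durbin recursion. It is not taken from the article; the function names and the synthetic test signal are assumptions chosen for illustration.

```python
def autocorr(x, max_lag):
    """Autocorrelation of x for lags 0..max_lag."""
    return [sum(x[n] * x[n - lag] for n in range(lag, len(x)))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations for the given autocorrelation r.

    Returns (a, err), where x[n] is predicted as sum(a[i-1] * x[n-i])
    for i = 1..order and err is the remaining prediction error energy.
    """
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient for this order
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err


# Synthetic decaying signal in which each sample is 0.9 times the previous one.
x = [0.9 ** n for n in range(200)]
coeffs, err = levinson_durbin(autocorr(x, 2), order=2)
print(coeffs)  # first coefficient close to 0.9, second close to 0
```

A speech coder would run this on short windowed frames and transmit the quantised coefficients together with a description of the prediction residual.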

Hans, M., and R. Schafer. 2001. Lossless Compression of Digital Audio. IEEE Signal Processing Magazine, 18, (4): 21-32.
This article deals with lossless audio compression, which is implemented in several audio file formats used for music distribution over the Internet, DVD audio, digital audio archiving, and mixing. The paper presents a survey and classification of lossless audio compression algorithms. One important point it makes is that lossless audio coders have reached a limit in the compression they can achieve. The paper also describes AudioPak, a lossless audio coder with low algorithmic complexity and high performance compared to other lossless audio coders.
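
The general shape of a predictive lossless coder can be sketched as follows: predict each sample from the previous one, then entropy-code the small residuals. This is a toy illustration under stated assumptions, not AudioPak's actual design; the first-order predictor, the Rice parameter k=2, and all function names are chosen here for illustration only.

```python
def zigzag(v):
    """Map signed integers to non-negative ones: 0,-1,1,-2,2 -> 0,1,2,3,4."""
    return 2 * v if v >= 0 else -2 * v - 1


def unzigzag(u):
    return u // 2 if u % 2 == 0 else -(u + 1) // 2


def rice_encode(u, k):
    """Rice code: unary quotient, a '0' terminator, then a k-bit remainder."""
    q, r = u >> k, u & ((1 << k) - 1)
    rem = format(r, "0{}b".format(k)) if k else ""
    return "1" * q + "0" + rem


def encode(samples, k=2):
    prev, out = 0, []
    for s in samples:
        out.append(rice_encode(zigzag(s - prev), k))  # first-order residual
        prev = s
    return "".join(out)


def decode(bits, count, k=2):
    pos, prev, samples = 0, 0, []
    for _ in range(count):
        q = 0
        while bits[pos] == "1":
            q += 1
            pos += 1
        pos += 1  # skip the terminating "0"
        r = int(bits[pos:pos + k], 2) if k else 0
        pos += k
        prev += unzigzag((q << k) | r)
        samples.append(prev)
    return samples


pcm = [100, 102, 101, 99, 99, 103]  # hypothetical sample values
bits = encode(pcm)
assert decode(bits, len(pcm)) == pcm  # the round trip is exact
```

Because neighbouring audio samples tend to be similar, the residuals are small and take few bits, while decoding recovers every sample exactly.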

Jayant, N., J. Johnston and R. Safranek. 1993. Signal Compression Based on Models of Human Perception. Proceedings of the IEEE, 81, (10): 1385-1421.
This is a theoretical paper that describes the notion of perceptual coding and reviews the progress made in audio compression up to 1993 as a result of advances in classical coding theory, modeling of human perception, and digital signal processing. The perceptual coding techniques discussed in the article are the ones used in the MPEG-1 encoding standard.

Johnston, J. 1988. Transform Coding of Audio Signals Using Perceptual Noise Criteria. IEEE Journal on Selected Areas in Communications, 6, 314-323.
This article is interesting from a historical point of view, as it predates some of the popular audio compression standards (such as MPEG-1 Layer III). The paper describes an audio encoder that is designed using a psychoacoustically derived noise-masking threshold and tested with a set of mono audio signals sampled at 32 kHz. The work suggests that reconstruction indistinguishable from the original is achieved for those signals at 96 kbps.

Leslie, B., and M. Sandler. 1998. Audio Compression Using Wavelets. IEE Colloquium on Audio and Music Technology: The Challenge of Creative DSP.
This paper describes work on audio compression using wavelets as an alternative to polyphase filter banks for frequency band separation. It also gives a short overview of the MPEG audio coding standard used by the described wavelet encoder, and discusses wavelets and their use in audio compression from a more general point of view.

Noll, P. 1997. MPEG Digital Audio Coding. IEEE Signal Processing Magazine, 14, (5): 59-81.
The paper provides a detailed description of the main technologies and features of the MPEG-1 and MPEG-2 audio coders, concentrating on the advances in MPEG-2 and on compatibility between the two standards. As part of the MPEG-2 overview, MPEG-2 Advanced Audio Coding (AAC) is presented. The article also presents the MPEG-4 standard and discusses some typical applications of MPEG audio compression.

Painter, T. and A.S. Spanias. 2000. Perceptual Coding of Digital Audio. Proceedings of the IEEE, 88, (4): 451-513.
This is an extensive overview of the use of perceptual techniques in digital audio. The paper starts with a description of psychoacoustic principles, concentrating on a detailed overview of the MPEG psychoacoustic signal analysis model. It also discusses filter bank designs and describes the modified discrete cosine transform, a filter bank that has become extremely popular in perceptual audio coding. Next, additional lossless audio encoding techniques are described in detail. The paper also provides extensive overviews of the ISO/IEC MPEG family (-1, -2, -4), the Lucent Technologies PAC/EPAC/MPAC, the Dolby AC-2/AC-3, and the Sony ATRAC/SDDS algorithms. Subjective evaluation methodologies for audio quality are also mentioned.
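
To make the modified discrete cosine transform (MDCT) concrete, here is a minimal, unoptimized sketch of a windowed MDCT and its inverse with 50% overlap-add. It is not taken from the paper; the sine window, the 2/N scaling of the inverse, and all function names are assumptions chosen so that the overlapping frames reconstruct the input. The sketch also assumes the signal length is a multiple of N.

```python
import math


def _basis(n, k, N):
    """MDCT cosine kernel for time index n and frequency index k."""
    return math.cos(math.pi / N * (n + 0.5 + N / 2.0) * (k + 0.5))


def sine_window(N):
    """Length-2N sine window; satisfies w[n]^2 + w[n+N]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]


def mdct(frame):
    """2N input samples -> N coefficients."""
    N = len(frame) // 2
    return [sum(frame[n] * _basis(n, k, N) for n in range(2 * N))
            for k in range(N)]


def imdct(coeffs):
    """N coefficients -> 2N time-aliased samples."""
    N = len(coeffs)
    return [2.0 / N * sum(coeffs[k] * _basis(n, k, N) for k in range(N))
            for n in range(2 * N)]


def analysis_synthesis(signal, N):
    """Windowed MDCT then IMDCT with overlap-add; aliasing cancels out."""
    w = sine_window(N)
    padded = [0.0] * N + list(signal) + [0.0] * N
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * N + 1, N):
        frame = [padded[start + n] * w[n] for n in range(2 * N)]
        y = imdct(mdct(frame))
        for n in range(2 * N):
            out[start + n] += y[n] * w[n]
    return out[N:N + len(signal)]


x = [math.sin(0.3 * i) for i in range(16)]
y = analysis_synthesis(x, N=4)
print(max(abs(a - b) for a, b in zip(x, y)))  # tiny rounding error only
```

Each frame of 2N samples yields only N coefficients, so the transform is critically sampled even though the frames overlap by half; a real coder would quantise the coefficients between `mdct` and `imdct`.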

Pan, D. 1995. A Tutorial on MPEG/Audio Compression. IEEE Multimedia, 2, (2): 60-74.
This tutorial describes the theoretical principles behind MPEG audio compression, concentrating on how a lossy algorithm can achieve perceptually indistinguishable reconstruction by exploiting the perceptual properties of the human auditory system. The article also describes the generic principles of psychoacoustic modeling and additional audio compression techniques.

Pan, D. 1993. Digital Audio Compression. Digital Technical Journal, 5, (2): 28-40.
This paper provides a general description of the basic audio signal compression process and discusses some of the most popular audio compression techniques and algorithms. It also gives a good basic description of the capabilities of the different layers of the MPEG-1 audio compression standard and of the theoretical principles behind them.

Robinson, D., and M. Hawksford. 2000. Psychoacoustic models and non-linear human hearing. 109th AES Convention, preprint 5228.
This article discusses a distortion that is not present in the original audio signal but is produced by the human ear (through intermodulation of a spectral complex) as a result of the non-linearity of human hearing. The paper suggests that when psychoacoustic codecs remove masked components from an audio signal, they also remove the in-ear-generated distortion, and so the listening experience is modified. It proposes a method for quantifying, predicting and preserving the in-ear distortion in the audio signal.

Todd, C., G. Davidson, M. Davis, L. Fielder, B. Link, and S. Vernon. 1994. AC-3: Flexible Perceptual Coding for Audio Transmission and Storage. Presented at the 96th Convention of the Audio Engineering Society, Preprint 3796.
This paper describes Dolby AC-3, a flexible audio data compression technology that can encode a range of audio channel formats (from monophonic to 5.1) into a low-rate bit stream. A complete overview of AC-3 technology and compression techniques is presented. Among the AC-3 techniques and features described in the paper are the transmission of a variable-frequency-resolution spectral envelope and hybrid backward/forward adaptive bit allocation.