MUMT 611


Annotated Bibliography with Hyperlinks on Audio-based Music Similarity Analysis
Beinan Li
Created 10 March 2005

  • Foote, J. 1997. Content-based retrieval of music and audio. Multimedia Storage and Archiving Systems II, Proceedings of SPIE 3229: 138–47.

            In this paper, Foote demonstrates an audio content-based retrieval technique that is purely data-driven and does not depend on particular audio characteristics. Similarity is modeled by first building a sample-specific template, a histogram of MFCC feature vectors over the partitions of a primitive feature space, and then comparing the templates of different audio samples; the distance between templates serves as the similarity measure. A supervised tree-based vector quantizer generates the histogram. Several distance measures, including Euclidean and cosine distance, are used. Experiments on simple sounds and music clips are compared with the Muscle Fish system (Foote 1999), showing the effectiveness of the approach and its advantage on tasks based on music clips.
  • Foote, J. 1999. An overview of audio information retrieval. Multimedia Systems 7(1): 2–11.

            In this paper, different techniques of audio information retrieval concerning speech recognition and music information retrieval are introduced.
  • Foote, J., M. Cooper, and U. Nam. 2002. Audio retrieval by rhythmic similarity. Proceedings of the 3rd International Symposium on Musical Information Retrieval: 265–66.
            In this paper, a technique for audio retrieval by rhythmic similarity is introduced. This unsupervised technique is not restricted to particular musical genres or to music with particular rhythmic features. Similarity is modeled with a Beat Spectrum, based on the idea that the autocorrelation of spectral audio features over a certain time range can reveal rhythmic information. A novel technique called the Similarity Matrix is used to derive the Beat Spectrum and to visualize the audio structure. Experiments were run on different-tempo versions of the same music and on a small group of different music excerpts, using several distance measures; cosine distance and a new measure called Fourier Beat Spectral Coefficients yielded a precision of 96.7%. An application to automatic play-list generation is also introduced.
  • Logan, B., and A. Salomon. 2001. A music similarity function based on signal analysis. IEEE International Conference on Multimedia and Expo.
            In this paper, an unsupervised technique based on Foote’s histogram-like approach is introduced. Instead of trained tree-based quantization, which risks placing too much emphasis on a few local bins, statistical clustering is used to form higher-level features from MFCC-based low-level audio features. A probability-based distance measure, the Earth Mover’s Distance, is used. Evaluations of both objective and subjective relevance demonstrate the effectiveness of the technique. An experiment on robustness to data corruption is also reported, showing that the technique holds up well on corrupted data. A play-list system based on the technique is introduced.
  • Demo: Music Retrieval by Content. (accessed 10 March 2005).
            A web demo query system built on the technique of Foote 1997 is available at this web page.
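
The template comparison described in the Foote 1997 entry can be illustrated with a minimal sketch. It is not Foote's implementation: the supervised tree-based quantizer is replaced here by a plain nearest-codeword lookup, and the codebook and two-dimensional "frames" are made-up toy values standing in for MFCC vectors. All function names are hypothetical.

```python
import math

def quantize(vector, codebook):
    """Index of the nearest codeword (an assumed stand-in for the tree quantizer)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((v - c) ** 2 for v, c in zip(vector, codebook[i])),
    )

def histogram(frames, codebook):
    """Normalized count of frames per codebook bin: the clip's template."""
    counts = [0] * len(codebook)
    for frame in frames:
        counts[quantize(frame, codebook)] += 1
    return [count / len(frames) for count in counts]

def euclidean_distance(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def cosine_distance(p, q):
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return 1.0 - dot / norm

# Toy data (assumed): three codewords and three short "clips".
codebook = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
clip_a = [(0.1, 0.0), (0.9, 1.1), (1.0, 0.9), (2.1, 0.1)]
clip_b = [(0.0, 0.1), (1.1, 1.0), (0.9, 0.9), (1.9, 0.0)]
clip_c = [(2.0, 0.1), (1.9, -0.1), (2.2, 0.0), (0.1, 0.1)]

hist_a = histogram(clip_a, codebook)
hist_b = histogram(clip_b, codebook)
hist_c = histogram(clip_c, codebook)

# Smaller distance means more similar templates.
print("a vs b:", euclidean_distance(hist_a, hist_b), cosine_distance(hist_a, hist_b))
print("a vs c:", euclidean_distance(hist_a, hist_c), cosine_distance(hist_a, hist_c))
```

In this toy setup, clips a and b fall into the same bins in the same proportions, so their templates coincide, while clip c concentrates in a different bin and ends up farther away under both measures.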
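
The Similarity Matrix and Beat Spectrum mentioned in the Foote, Cooper, and Nam 2002 entry can be sketched in the same spirit. The sketch assumes cosine similarity between frames and assumes that the beat spectrum at a given lag is the average of the matrix along the corresponding diagonal, which is one simple way to expose periodicity and may differ from the paper's exact formulation. The frames are toy two-dimensional vectors, not real spectral features, and the function names are hypothetical.

```python
import math

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def similarity_matrix(frames):
    """S[i][j] = similarity between feature frames i and j."""
    n = len(frames)
    return [[cosine_similarity(frames[i], frames[j]) for j in range(n)]
            for i in range(n)]

def beat_spectrum(matrix, max_lag):
    """Average similarity along each diagonal, for lags 0 .. max_lag - 1."""
    n = len(matrix)
    return [
        sum(matrix[i][i + lag] for i in range(n - lag)) / (n - lag)
        for lag in range(max_lag)
    ]

# Toy signal (assumed): a four-frame pattern repeated four times.
pattern = [(1.0, 0.0), (0.0, 1.0), (0.0, 1.0), (0.0, 1.0)]
frames = pattern * 4

S = similarity_matrix(frames)
spectrum = beat_spectrum(S, max_lag=9)

# Strongest non-zero lag: the estimated repetition period, in frames.
beat_lag = max(range(1, 9), key=lambda lag: spectrum[lag])
print("beat spectrum:", [round(value, 3) for value in spectrum])
print("estimated period (frames):", beat_lag)
```

Because the toy pattern repeats every four frames, the averaged diagonals are highest at lags that are multiples of four; comparing such spectra between two recordings (for example with cosine distance) is the kind of rhythmic comparison the entry describes.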

