MUMT 621: Music Information Acquisition, Preservation, and Retrieval [course page]
27 March 2012 :: Slide-based Presentation IV
Hannah Robertson [home page]
Self-similarity matrices and dynamic time warping
- Berndt, D. J., and J. Clifford. 1994. Using dynamic time warping to find patterns in time series. AAAI-94 Workshop on Knowledge Discovery in Databases: 359-70.
This paper was one of the first to apply the dynamic time warping (DTW) method, previously used in speech recognition, to general time-series data in databases.
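As an illustration, the core DTW recurrence can be sketched in a few lines of Python. This is a minimal sketch on scalar sequences with an absolute-difference local cost, not the paper's implementation:

```python
def dtw_distance(a, b):
    """Return the minimal cumulative alignment cost between sequences a and b."""
    n, m = len(a), len(b)
    INF = float("inf")
    # D[i][j] = cost of the best warping path aligning a[:i] with b[:j]
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])          # local distance
            D[i][j] = cost + min(D[i - 1][j],        # insertion
                                 D[i][j - 1],        # deletion
                                 D[i - 1][j - 1])    # match
    return D[n][m]
```

For example, `dtw_distance([1, 2, 3], [1, 2, 2, 3])` is 0.0, since warping lets the repeated 2 align at no cost; a plain Euclidean comparison would have no such flexibility.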
- Cooper, M., and J. Foote. 2002. Automatic music summarization via similarity analysis. In Proceedings of the 3rd International Society for Music Information Retrieval Conference: 81-5.
This paper describes and implements a column-summing method that uses self-similarity matrices to find the most representative thumbnail excerpts of audio files from various genres.
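The column-summing idea can be sketched roughly as follows; the feature vectors and the cosine similarity used here are illustrative assumptions, not the paper's exact features or parameterization:

```python
def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = sum(x * x for x in u) ** 0.5
    nv = sum(y * y for y in v) ** 0.5
    return dot / (nu * nv) if nu and nv else 0.0

def best_thumbnail(frames, length):
    """Return the start index of the `length`-frame window most similar
    to the recording as a whole, by summing columns of the similarity matrix."""
    n = len(frames)
    # Self-similarity matrix over all frame pairs
    S = [[cosine(frames[i], frames[j]) for j in range(n)] for i in range(n)]
    # Each column sum measures how similar that frame is to the whole piece
    col_sums = [sum(S[i][j] for i in range(n)) for j in range(n)]
    scores = [sum(col_sums[s:s + length]) for s in range(n - length + 1)]
    return max(range(len(scores)), key=scores.__getitem__)
```

The window whose frames are, in aggregate, most similar to everything else is taken as the summary excerpt.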
- Dixon, S., and G. Widmer. 2005. MATCH: A music alignment tool chest. In Proceedings of the 6th International Conference on Music Information Retrieval: 492-7.
In this paper, the MATCH toolkit for aligning different audio recordings of the same piece is presented, and the DTW alignment method is discussed.
- Foote, J. 2000. Automatic audio segmentation using a measure of audio novelty. IEEE International Conference on Multimedia and Expo: 452-5.
This paper introduced self-similarity into the world of MIR as a tool for "automatically locating points of significant change in music or audio, by analyzing local self-similarity." It is discussed as a tool for finding both individual notes and the boundaries between musical sections, such as verse/chorus boundaries and speech/music transitions. After stepping through the creation of a self-similarity matrix, it also discusses how such matrices can be used to determine novelty at each time frame in the audio.
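The novelty computation can be sketched as sliding a checkerboard kernel along the diagonal of the self-similarity matrix; section boundaries show up as peaks. This is a simplified version with an unweighted kernel (Foote's paper also tapers the kernel with a Gaussian), and the matrix `S` is assumed precomputed:

```python
def novelty(S, half):
    """Checkerboard-kernel novelty score at each diagonal position of matrix S.

    `half` is the kernel half-width; positions too close to the edge score 0.
    """
    n = len(S)
    scores = [0.0] * n
    for t in range(half, n - half):
        score = 0.0
        for i in range(-half, half):
            for j in range(-half, half):
                # Same-sign quadrants (within-section similarity) count positively,
                # cross quadrants (between-section similarity) negatively.
                sign = 1.0 if (i < 0) == (j < 0) else -1.0
                score += sign * S[t + i][t + j]
        scores[t] = score
    return scores
```

On a block-diagonal similarity matrix (two homogeneous sections), the score peaks exactly at the section boundary.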
- Foote, J., and M. Cooper. 2001. Visualizing musical structure and rhythm via self-similarity. In Proceedings of the International Conference on Computer Music: 419-22.
This paper reiterates many of the points Foote made in his early introduction of self-similarity matrices to the field of MIR (Foote 2000), and adds information about using self-similarity matrices to visualize musical rhythm, find beats and rhythm structure, and compute a beat spectrogram. A short but really interesting read, especially for those interested in beat finding.
- Goto, M. 2003. A chorus-section detecting method for musical audio signals. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing 5: 437-40.
This paper describes a method for finding both the beginning and end of every repeated chorus section ('hook') in popular music audio files, despite key modulations, by investigating self-similarity matrices of chroma vectors.
- Hu, N., R. B. Dannenberg, and G. Tzanetakis. 2003. Polyphonic audio matching and alignment for music retrieval. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics: 185-8.
In this paper, a self-similarity and DTW-based method for matching audio to symbolic music data without first performing transcription is presented.
- Izumitani, T., and K. Kashino. 2008. A robust musical audio search method based on diagonal dynamic programming matching of self-similarity matrices. In Proceedings of the 9th International Society for Music Information Retrieval Conference: 609-13.
This paper applies DTW and self-similarity matrices to the problem of musical audio search with key variations, based on the fact that "[self-similarity matrices are] insensitive to key variations because the relationship between two time points within a musical audio signal tends to be kept when the music is played in a different key." In addition, DTW is built on top of an earlier similarity matrix-based system to improve detection accuracy when significant tempo variations are present. This algorithm was tested, along with two others, on audio files generated from RWC Music Database MIDI files.
- Kelly, C., M. Gainza, D. Dorran, and E. Coyle. 2010. Audio thumbnail generation of Irish traditional music. Irish Signals and Systems Conference.
In this paper, part-similarity matrices are created from chroma vectors computed at set accented beats in Irish traditional music, in order to identify optimal thumbnails of each tune. These part-similarity matrices are self-similarity matrices whose frame lengths are set to four bars, reflecting the repetition structure of traditional Irish music.
- Kelly, C., M. Gainza, D. Dorran, and E. Coyle. 2010. Locating tune changes and providing a semantic labelling of sets of Irish traditional tunes. In Proceedings of the 11th International Society for Music Information Retrieval Conference: 129-34.
In this paper, part-similarity matrices as described above (Kelly et al. 2010) are created from chroma vectors computed at set accented beats in Irish traditional music, in order to determine the start of each tune in a multi-tune audio file.
- Martin, B., M. Robine, and P. Hanna. 2009. Musical structure retrieval by aligning self-similarity matrices. In Proceedings of the 10th International Society for Music Information Retrieval Conference: 483-8.
In this paper, symbolic user queries are converted into self-similarity matrices and then compared with self-similarity matrices created from a database of pop music audio files to determine degree of similarity and possible matches.
- Müller, M., and M. Clausen. 2007. Transposition-invariant self-similarity matrices. In Proceedings of the 8th International Society for Music Information Retrieval Conference: 47-50.
This paper introduces and demonstrates the power of transposition-invariant self-similarity matrices for identifying repetitive audio structure irrespective of key change. In contrast to previous work, their algorithm handles all 12 possible cyclic chroma shifts in a single step, rather than calculating each individually.
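The underlying idea of transposition invariance can be sketched by comparing chroma vectors under all 12 cyclic shifts and keeping the best match. This is an illustrative per-pair sketch, not the paper's efficient single-step computation; the chroma templates and Euclidean distance are assumptions:

```python
def shift(v, k):
    """Cyclically rotate a 12-dimensional chroma vector up by k semitones."""
    return v[-k:] + v[:-k] if k else list(v)

def ti_distance(u, v):
    """Minimum Euclidean distance between u and all 12 transpositions of v."""
    best = float("inf")
    for k in range(12):
        w = shift(v, k)
        d = sum((a - b) ** 2 for a, b in zip(u, w)) ** 0.5
        best = min(best, d)
    return best
```

A C-major chroma template and the same template transposed up two semitones (D major) yield a transposition-invariant distance of zero, even though their plain Euclidean distance is large.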
- Müller, M., and F. Kurth. 2007. Towards structural analysis of audio recordings in the presence of musical variations. EURASIP Journal on Advances in Signal Processing: 1-19.
This paper discusses how to extract repetitive musical segments even when large variations in musical parameters (e.g. tempo, timbre, articulation) are present, by comparing regions of possible similarity after implementing strategically simulated tempo changes. This method makes use of two simultaneous window sizes to incorporate note-based chroma vector information as well as larger-scale tempo, articulation, and note execution information. While the method used isn't referred to as DTW, it follows the same steps.
- Ness, S. 2009. Content-aware visualizations of audio data in diverse contexts. PhD Thesis, University of Victoria.
This is a fascinating thesis on visualizing different types of complex audio data in various user-manipulated ways, and is worth a read in general. In terms of DTW and similarity, a brief discussion of similarity matrices and DTW on page 23 includes a couple of illustrative DTW curve overlays on self-similarity matrices.
- Pikrakis, A., S. Theodoridis, and D. Kamarotos. 2003. Recognition of isolated musical patterns using context dependent dynamic time warping. IEEE Transactions on Speech and Audio Processing: 175-83.
Greek traditional music has a different structure compared to much Western equal-tempered music. In this paper, context-dependent DTW is used to better recognize isolated musical patterns within Greek folk music.