MUMT 621: Music Information Acquisition, Preservation, and Retrieval [course page]
13 March 2012 :: Slide-based Presentation III
Hannah Robertson [home page]
- Ellis, D. and G. Poliner. 2006. Classification-based melody transcription. Machine Learning 65:439-56. doi:10.1007/s10994-006-8373-9
This is an informative paper on how to apply clustering techniques and hidden Markov models (HMMs) to classify polyphonic audio and MIDI into melody lines.
- Isikhan, C. and G. Ozcan. 2008. A survey of melody extraction techniques for music information retrieval. In Proceedings of the Fourth Conference on Interdisciplinary Musicology (CIM08).
This paper introduces melody extraction in general and summarizes the state of (primarily symbolic) melody extraction as of 2008. It also implements and compares several methods. Despite the stilted and sometimes vague use of English, this article does a good job of explaining several variations on the skyline method of melody extraction. Six of these skyline methods are tested on 23 MIDI files, and the results are made available in an appendix. The bibliography is particularly useful for finding additional literature on symbolic melody extraction.
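As a rough illustration of the skyline idea mentioned above, here is a minimal sketch in Python. It is not taken from Isikhan and Ozcan's paper: it assumes the most basic variant (keep the highest pitch at each onset, then clip each kept note so it ends by the next kept onset), and the `Note` tuple layout and the `skyline` function name are my own hypothetical choices.

```python
from typing import Dict, List, Tuple

# Hypothetical note representation: (onset, duration, MIDI pitch).
Note = Tuple[float, float, int]


def skyline(notes: List[Note]) -> List[Note]:
    """Basic skyline sketch: highest pitch per onset, clipped to avoid overlap."""
    # Step 1: among notes that share an onset, keep only the highest pitch.
    highest: Dict[float, Note] = {}
    for onset, duration, pitch in notes:
        if onset not in highest or pitch > highest[onset][2]:
            highest[onset] = (onset, duration, pitch)
    kept = [highest[t] for t in sorted(highest)]

    # Step 2: shorten each kept note so it ends no later than the next kept onset.
    melody: List[Note] = []
    for i, (onset, duration, pitch) in enumerate(kept):
        if i + 1 < len(kept):
            duration = min(duration, kept[i + 1][0] - onset)
        melody.append((onset, duration, pitch))
    return melody


# Made-up example: a held high note over chords, then a new top note.
example = [(0.0, 3.0, 72), (0.0, 2.0, 60), (1.0, 1.0, 67), (1.0, 2.0, 48), (2.0, 1.0, 74)]
print(skyline(example))
```

The sketch makes the usual weakness of the basic variant visible: the highest note always wins, so a melody that dips below the accompaniment is lost, which is presumably what the variations compared in the paper try to address.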
- Joo, S., S. Park, S. Jo, and C. Yoo. 2011. Melody extraction based on harmonic coded structure. In Proceedings of the 12th International Society for Music Information Retrieval Conference 227-232.
In this paper, melody is extracted from polyphonic audio using harmonic coded structure-based pitch-candidate estimation and musically-informed pitch-sequence identification. The method was designed to combat the polyphonic extraction problems of accompaniment interference and octave mismatch. After identifying possible melodies by considering relationships among pitch candidates, a best melody is chosen based on interval, vibrato, and note duration rules. The algorithms are reported as performing either as well as or better than the other algorithms submitted to MIREX 2009.
- Marolt, M. 2004. Gaussian mixture models for extraction of melodic lines from audio recordings. In Proceedings of the 5th International Conference on Music Information Retrieval.
In this paper, Gaussian mixture models are twice applied in order to find and then group melodic lines from audio recordings. First, expectation-maximization (EM) is used to find all melodic fragments in a recording, by looking for regions with strong and stable pitch. The dominance, pitch, loudness, pitch stability, and onset steepness of these fragments are then used to cluster them by source: vocal, backup-vocal, bass, noise, etc. Presented results show success at grouping the lead melodic line, but less so the lesser lines (including noise).
- Paiva, R., T. Mendes, and A. Cardoso. 2004. An auditory model based approach for melody detection in polyphonic musical recordings. In Lecture Notes in Computer Science - Computer Music Modeling and Retrieval: Second International Symposium 21-40.
This paper focuses on an auditory scene analysis method for identifying lines in polyphonic audio.
- Paiva, R. 2007. An approach for melody extraction from polyphonic audio: Using perceptual principles and melodic smoothness. The Journal of the Acoustical Society of America 122:2962-9.
This paper focuses extensively on note extraction from polyphonic recordings before grouping these notes into melodies.
- Poliner, G., D. Ellis, A. Ehmann, E. Gomez, S. Streich, and B. Ong. 2007. Melody transcription from music audio: Approaches and evaluation. IEEE Transactions on Audio, Speech, and Language Processing 15.
This paper provides a good literature review for melody extraction from audio. It points out that melody is a "musicological concept based on the judgment of human listeners," and therefore there is not necessarily one unique melody within each recording. It discusses why melody extraction is interesting, summarizes the strategies of the MIREX submissions in 2004 and 2005, and discusses the evaluation data set and methodology for these MIREX competitions. It finishes with a discussion of which aspects of each extraction system work well for the test data. This paper is a particularly good, comprehensive review.
- Rao, V. and P. Rao. 2008. Vocal melody detection in the presence of pitched accompaniment using harmonic matching methods. Proceedings of the 11th International Conference on Digital Audio Effects (DAFx-08).
This paper focuses on audio F0 estimation using harmonic matching, which is designed for and works especially well on vocal music. It is a very comprehensive article that includes some good graphs showing vocal harmonics.
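To make the general idea of harmonic matching concrete, here is a small sketch in Python. It is not Rao and Rao's algorithm: it assumes a simple harmonic-sum score in which each F0 candidate is rewarded for spectral peaks that lie near its harmonics. The function names, the `1/h` weighting, the 3% tolerance, and the peak values are all assumptions of mine for illustration.

```python
from typing import List, Tuple

# Hypothetical spectral peak representation: (frequency in Hz, linear amplitude).
Peak = Tuple[float, float]


def harmonic_match_score(f0: float, peaks: List[Peak],
                         n_harmonics: int = 5, tolerance: float = 0.03) -> float:
    """Sum the amplitudes of peaks lying near the first few harmonics of f0."""
    score = 0.0
    for h in range(1, n_harmonics + 1):
        target = h * f0
        # Peaks within a relative tolerance of the h-th harmonic count as matches.
        near = [amp for freq, amp in peaks if abs(freq - target) <= tolerance * target]
        if near:
            # Assumed 1/h weighting: favors candidates whose low harmonics are
            # present, which discourages picking a subharmonic of the true F0.
            score += max(near) / h
    return score


def estimate_f0(peaks: List[Peak], candidates: List[float]) -> float:
    """Return the candidate F0 with the highest harmonic match score."""
    return max(candidates, key=lambda f0: harmonic_match_score(f0, peaks))


# Made-up frame: a voice at 220 Hz with four harmonics plus one accompaniment peak at 330 Hz.
peaks = [(220.0, 1.0), (440.0, 0.8), (660.0, 0.6), (880.0, 0.4), (330.0, 0.9)]
candidates = [110.0, 165.0, 220.0, 330.0, 440.0]
print(estimate_f0(peaks, candidates))
```

In this toy frame the 220 Hz candidate wins because it explains four peaks, while the 330 Hz accompaniment candidate only explains two.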
- Rao, V. and P. Rao. 2008. Melody extraction using harmonic matching. MIREX Audio Melody Extraction Contest Abstracts.
This brief paper describes how the above Rao and Rao (2008) algorithm is implemented in the context of the MIREX competition.
- Rao, V. and P. Rao. 2009. Improving polyphonic melody extraction by dynamic programming based dual F0 tracking. In Proceedings of the 12th International Conference on Digital Audio Effects.
This paper discusses an algorithm that attempts to deal with tracking multiple F0 paths when the pitched accompaniment is at a comparable strength to the main singing voice. It discusses an optimal path finding method through an F0 candidate space. This algorithm is applied to three categories of music: those with one pitched sound always present, those where no pitched sound may be present, and those in which F0 collisions/crossovers might occur.
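The path-finding step can be sketched with ordinary dynamic programming. The Python below is a simplified, single-path illustration and not the dual F0 tracker described in the paper: it assumes every frame has at least one (F0, salience) candidate, and the cost function (negative salience plus a penalty on pitch jumps measured in octaves), the `jump_weight` parameter, and the example values are hypothetical.

```python
import math
from typing import List, Tuple

# Hypothetical candidate representation: (F0 in Hz, salience between 0 and 1).
Candidate = Tuple[float, float]


def best_f0_path(frames: List[List[Candidate]], jump_weight: float = 2.0) -> List[float]:
    """Find the minimum-cost F0 path through per-frame candidates.

    Assumes a non-empty list of frames, each with at least one candidate.
    """
    # cost[i]: cheapest total cost of any path ending at candidate i of the current frame.
    cost = [-sal for _, sal in frames[0]]
    back: List[List[int]] = []  # back[t][i]: best predecessor of candidate i in frame t + 1
    for prev, cur in zip(frames, frames[1:]):
        new_cost, pointers = [], []
        for f0, sal in cur:
            # Transition cost grows with the size of the pitch jump in octaves.
            options = [cost[i] + jump_weight * abs(math.log2(f0 / pf0))
                       for i, (pf0, _) in enumerate(prev)]
            best = min(range(len(options)), key=options.__getitem__)
            new_cost.append(options[best] - sal)
            pointers.append(best)
        cost = new_cost
        back.append(pointers)

    # Trace back from the cheapest final candidate.
    idx = min(range(len(cost)), key=cost.__getitem__)
    path = [idx]
    for pointers in reversed(back):
        idx = pointers[idx]
        path.append(idx)
    path.reverse()
    return [frames[t][i][0] for t, i in enumerate(path)]


# Made-up example: the accompaniment (about 330 Hz) is briefly more salient in the middle frame.
frames = [
    [(220.0, 0.9), (330.0, 0.5)],
    [(222.0, 0.6), (331.0, 0.7)],
    [(224.0, 0.9), (329.0, 0.5)],
]
print(best_f0_path(frames))
```

Picking the most salient candidate frame by frame would jump to 331 Hz in the middle frame; the jump penalty keeps the path on the voice. Tracking two such paths jointly, as the paper does, is what allows it to handle accompaniment of comparable strength.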
- Ryynänen, M. and A. Klapuri. 2008. Automatic transcription of melody, bass line, and chords in polyphonic music. Computer Music Journal 32:72-86.
This paper provides a comprehensive description of many different aspects of polyphonic transcription, including melody and chord extraction. The presented melody extraction method makes use of HMMs.
- Salamon, J. and E. Gómez. 2009. A chroma-based salience function for melody and bass line estimation from music audio signals. Sound and Music Computing 331-6.
Despite the misleading title, this paper actually focuses on chroma-based polyphonic pitch detection rather than melody extraction. I'm including this paper in this bibliography because the title makes it seem so pertinent to this topic that perhaps others will appreciate the warning before they get too excited! In terms of melody and bass line selection, the authors say, "we identify two important steps that would have to be added to our approach to give a complete system" and "voicing detection should be applied to determine when the melody is present." While the paper presents an extremely successful method for determining the pitch class of notes in the melody, it doesn't go so far as to actually pull out the melody F0s, and so is more related to the polyphonic transcription aspect of the melody extraction task than the melody extraction itself.