Bibliography Listing: Singer Identification

Berenzweig, A., D. P. W. Ellis, and S. Lawrence. 2002. Using voice segments to improve artist classification of music. In Proceedings of the AES 22nd International Conference on Virtual Synthetic and Entertainment Audio.

Uses neural networks to perform singer identification. Mainly interesting because it showed quantitatively that distinguishing between vocal and non-vocal parts before performing the singer similarity analysis improves classification performance.

Fujihara, H., T. Kitahara, M. Goto, K. Komatani, T. Ogata, and H. G. Okuno. 2005. Singer identification based on accompaniment sound reduction and reliable frame selection. In Proceedings of the International Conference on Music Information Retrieval.

Very clear paper, interesting mainly on two points. First, a pre-processing stage is added to the vocal/non-vocal detection (done with a GMM and a hypothesis test): the harmonic components present in the sound are extracted, based on Goto's PreFEst algorithm for bass line extraction, and this harmonic content is then resynthesized by additive synthesis. Second, different features are evaluated in order to determine the best feature choice. Singers are classified using GMMs.
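The additive resynthesis step mentioned above can be sketched as follows. This is only a minimal illustration, not the procedure from the paper: it assumes the F0 track and per-harmonic amplitudes have already been estimated, and the function name, sample rate, and hop size are hypothetical choices.

```python
import numpy as np

def additive_resynthesis(f0_hz, harmonic_amps, sample_rate=16000, hop=160):
    """Resynthesize a harmonic signal as a sum of sinusoids.

    f0_hz:         shape (n_frames,), fundamental frequency per frame in Hz
    harmonic_amps: shape (n_frames, n_harmonics), amplitude of each harmonic
    (Both inputs are assumed to come from an earlier estimation stage.)
    """
    f0_hz = np.asarray(f0_hz, dtype=float)
    harmonic_amps = np.asarray(harmonic_amps, dtype=float)
    n_frames, n_harmonics = harmonic_amps.shape
    n_samples = n_frames * hop

    # Interpolate the frame-rate F0 track up to the sample rate.
    frame_times = np.arange(n_frames) * hop
    sample_times = np.arange(n_samples)
    f0 = np.interp(sample_times, frame_times, f0_hz)

    # Integrate the instantaneous frequency to get the fundamental's phase.
    phase = 2.0 * np.pi * np.cumsum(f0) / sample_rate

    out = np.zeros(n_samples)
    for h in range(1, n_harmonics + 1):
        amp = np.interp(sample_times, frame_times, harmonic_amps[:, h - 1])
        # Silence any harmonic that would exceed the Nyquist frequency.
        amp = np.where(h * f0 < sample_rate / 2.0, amp, 0.0)
        out += amp * np.sin(h * phase)
    return out
```

Calling it with a constant 200 Hz track and three harmonics gives a signal whose strongest spectral peak sits at 200 Hz.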

Kim, Y. E., and B. Whitman. 2002. Singer identification in popular music recordings using voice coding features. In Proceedings of the International Conference on Music Information Retrieval.

The main interest lies in the attempt to use a measure of the harmonicity of the signal to separate vocal from non-vocal portions of the sound. Nevertheless, the results obtained for singer classification (using GMMs and SVMs) are very low, possibly because of the poor performance of the vocal/non-vocal segmentation mechanism.

Liu, C. C., and C. S. Huang. 2002. A singer identification technique for content-based classification of MP3 music objects. In Proceedings of the 11th International Conference on Information and Knowledge Management.

An interesting work based on phoneme recognition using k-NN classifiers. This is quite a different approach from that of the other papers, which extract features such as pitch or harmonicity and feed them into a GMM. It performed quite well: 80% recognition. It illustrates the potential of k-NN classifiers.
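For readers unfamiliar with k-NN, a generic sketch of the classifier follows. It is not the paper's system: the feature vectors, labels, and function name are hypothetical, and plain Euclidean distance with majority voting is assumed.

```python
import numpy as np
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Label a query vector by majority vote among its k nearest neighbours."""
    train_x = np.asarray(train_x, dtype=float)
    query = np.asarray(query, dtype=float)
    # Euclidean distance from the query to every training vector.
    dists = np.linalg.norm(train_x - query, axis=1)
    # Indices of the k closest training vectors.
    nearest = np.argsort(dists)[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]
```

The whole training set is kept and searched at query time, so there is no model to fit, at the cost of one distance computation per stored example.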

Mesaros, A., and J. Astola. 2005. The mel-frequency cepstral coefficients in the context of singer identification. In Proceedings of the International Conference on Music Information Retrieval.

Not directly concerned with the identification process but rather with the feature extraction part. In particular, it explains why and how MFCCs have been used in speaker recognition and singer recognition.

Tsai, W. H., and H. M. Wang. 2004. Automatic detection and tracking of target singer in multi-singer music recordings. In Proceedings of the 2004 IEEE International Conference on Acoustics, Speech and Signal Processing, vol. 4, 221–224.

This article presents a classification method based on a model of the solo voice. It assumes that the instrumental portions of a song and the accompaniment of the singer are very similar. Thus, after discriminating between vocal and non-vocal segments of a song (using GMMs), a model of the voice (GMM) is derived from an a priori model of the accompaniment (GMM). Singer tracking within a song is also considered.
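The GMM-based vocal/non-vocal discrimination can be illustrated with a generic log-likelihood comparison. This sketch is an assumption about the general technique, not the authors' exact procedure: the GMM parameters are taken as given (no training is shown), covariances are assumed diagonal, and the function names and threshold are hypothetical.

```python
import numpy as np

def gmm_log_likelihood(frames, weights, means, variances):
    """Per-frame log-likelihood under a diagonal-covariance GMM.

    frames: (n_frames, dim); weights: (n_comp,);
    means and variances: (n_comp, dim).
    """
    frames = np.asarray(frames, dtype=float)
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)

    diff = frames[:, None, :] - means[None, :, :]
    # Log of the Gaussian normalisation constant for each component.
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)
    # Log of the exponential term for each frame/component pair.
    log_exp = -0.5 * np.sum(diff ** 2 / variances[None, :, :], axis=2)
    log_comp = np.log(weights)[None, :] + log_norm[None, :] + log_exp
    # Log-sum-exp over components, stabilised by subtracting the maximum.
    m = np.max(log_comp, axis=1, keepdims=True)
    return (m + np.log(np.sum(np.exp(log_comp - m), axis=1, keepdims=True)))[:, 0]

def is_vocal(frames, vocal_gmm, nonvocal_gmm, threshold=0.0):
    """Flag frames whose log-likelihood ratio favours the vocal model."""
    llr = (gmm_log_likelihood(frames, *vocal_gmm)
           - gmm_log_likelihood(frames, *nonvocal_gmm))
    return llr > threshold
```

Each model is passed as a `(weights, means, variances)` tuple; raising the threshold trades missed vocal frames for fewer false alarms.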

Tsai, W. H., and H. M. Wang. 2006. Automatic singer recognition of popular music recordings via estimation and modeling of solo vocal signals. IEEE Transactions on Audio, Speech and Language Processing 14:330–41.

Very clear text on which I based my work for this presentation. It gives an overview of past research on singer identification and sets out clear directions for the field. It presents the same method as Tsai and Wang (2004). I would recommend this article as a starting point for someone interested in singer identification.

Tsai, W. H., H. M. Wang, D. Rogers, S. S. Cheng, and H. M. Yu. 2003. Blind clustering of popular music recordings based on singer voice characteristics. In Proceedings of the International Conference on Music Information Retrieval.

Earlier work by Tsai et al. that first introduces the idea of deducing a model of the voice from that of the background. See Tsai and Wang (2006) for a more complete presentation.

Zhang, T. 2003. Automatic singer identification. In Proceedings of the 2003 International Conference on Multimedia and Expo, vol. 1, 33–6.

Another example of the use of GMMs for singer classification. A main difference lies in the way the vocal/non-vocal distinction is made: the start of the singer's voice is detected using different features (ZCR, spectral flux, a harmonicity measure, etc.) and then a fixed length of the song is extracted and studied. In other words, only the first words of the song are studied to perform recognition (using GMMs).
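Two of the features named above, zero-crossing rate and spectral flux, are simple enough to sketch. The definitions below are common textbook forms and may differ from those used in the paper; the frame length, hop size, Hann window, and function names are assumptions.

```python
import numpy as np

def frame_signal(x, frame_len=512, hop=256):
    """Slice a 1-D signal into overlapping frames, shape (n_frames, frame_len)."""
    x = np.asarray(x, dtype=float)
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def zero_crossing_rate(frames):
    """Fraction of adjacent sample pairs in each frame that change sign."""
    signs = np.signbit(frames)
    return np.mean(signs[:, 1:] != signs[:, :-1], axis=1)

def spectral_flux(frames):
    """Euclidean distance between magnitude spectra of consecutive frames."""
    window = np.hanning(frames.shape[1])
    mags = np.abs(np.fft.rfft(frames * window, axis=1))
    diff = np.diff(mags, axis=0)
    flux = np.sqrt(np.sum(diff ** 2, axis=1))
    # The first frame has no predecessor, so its flux is set to zero.
    return np.concatenate([[0.0], flux])
```

A stationary tone yields near-zero flux, while the frame where a sound starts produces a clear peak, which is what makes the feature useful for locating an onset.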