Rhythmic Transcription of MIDI Signals

PowerPoint Presentation

Written Summary (pdf)

Annotated Bibliography

All hyperlinks were accessed on February 17, 2005.

Allen, P. and R. Dannenberg. 1990. Tracking musical beats in real time. In Proceedings of the International Computer Music Conference 1990: 140–3.

This paper describes improvements to the beat-tracking method presented in an ICMC 1987 paper by Dannenberg and Bernard Mont-Reynaud. Specifically, the authors use a beam search through possible rhythmic interpretations, each comprising estimated period and phase variables, to find the one that best fits the incoming performance information (a rough sketch of this idea follows below). Unfortunately, the paper does not state whether that information is in MIDI form, nor does it report experimental results. It remains notable for its early date and for Dannenberg's involvement, as he was an early proponent of beat-tracking systems.
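
As a rough illustration only, the beam-search idea could be sketched as follows. Each hypothesis is a (cost, period, last-beat-time) triple; the branching rule, the fixed non-beat penalty, and the candidate periods are assumptions made here for clarity, not details taken from the paper.

```python
def beam_track(onsets, candidate_periods, beam_width=8, skip_penalty=0.1):
    """Toy beam search over rhythmic interpretations of a list of onset times.

    At each onset every surviving hypothesis branches two ways: either the
    onset is the next beat (its timing error is added to the cost) or it
    falls between beats (a fixed penalty is added). Only the `beam_width`
    lowest-cost hypotheses are kept.
    """
    # each hypothesis: (accumulated cost, period in seconds, time of last beat)
    beam = [(0.0, p, onsets[0]) for p in candidate_periods]
    for t in onsets[1:]:
        children = []
        for cost, period, last_beat in beam:
            # interpretation 1: this onset is the next beat
            timing_error = abs((t - last_beat) - period)
            children.append((cost + timing_error, period, t))
            # interpretation 2: this onset lies between beats
            children.append((cost + skip_penalty, period, last_beat))
        beam = sorted(children)[:beam_width]   # prune to the best hypotheses
    return min(beam)   # lowest-cost interpretation

# e.g. beam_track([0.0, 0.5, 1.02, 1.49], candidate_periods=[0.4, 0.5, 0.6])
```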

Cemgil, A., B. Kappen, P. Desain, and H. Honing. 2000. On tempo tracking: Tempogram representation and Kalman filtering. Journal of New Music Research 29: 259–73.

This paper is important and useful for several reasons. Desain and Honing are known for their "connectionist" approach to rhythm interpretation, outlined in a Computer Music Journal paper in the 1980s; curiously, this paper mentions that there has never been an experimental implementation of that model, including in the system described here. The paper gives a thorough account of a stochastic tempo tracker: tempo is modelled as a state-space dynamical system in which the true tempo is a hidden variable, and a Kalman filter applied to a wavelet-like representation called a tempogram estimates it (a simplified Kalman-filter sketch follows this entry). The paper is also notable for its training/test set of 108 MIDI piano performances of the Beatles songs Yesterday and Michelle, played by 4 jazz professionals, 4 classical performers, and 4 amateurs. The system captures the tempo of the test performances with a success rate upwards of 90%, and the performances are available on the internet for testing purposes (see below).

A draft version is available online at staff.science.uva.nl/~cemgil/papers/cemgil-tt.pdf.
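
The Kalman-filtering idea can be illustrated, very roughly, with the sketch below. The two-element state (next beat time and beat period), the noise values, and the assumption that every onset is a noisy observation of a beat are simplifications made here; they are not the state model or tempogram computation used in the paper.

```python
import numpy as np

def kalman_tempo_track(onsets, period0=0.5, q=1e-4, r=1e-3):
    """Toy Kalman filter over a hidden state [next beat time, beat period].

    Each observed onset is treated as a noisy observation of the next beat
    time; the filter then predicts forward by one period.
    """
    x = np.array([onsets[0], period0])        # state: [beat time, period]
    P = np.eye(2) * 0.1                       # state covariance
    F = np.array([[1.0, 1.0], [0.0, 1.0]])    # beat_{k+1} = beat_k + period
    H = np.array([[1.0, 0.0]])                # only beat times are observed
    Q = np.eye(2) * q                         # process noise (tempo drift)
    R = np.array([[r]])                       # observation noise

    beats = [x[0]]
    for t in onsets[1:]:
        # predict the next beat
        x = F @ x
        P = F @ P @ F.T + Q
        # update with the observed onset time
        y = np.array([t]) - H @ x
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + (K @ y).ravel()
        P = (np.eye(2) - K @ H) @ P
        beats.append(x[0])
    return beats, x[1]   # estimated beat times and final period (seconds)

# e.g. kalman_tempo_track([0.0, 0.52, 1.01, 1.49])
```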

Dixon, S. 2001. Automatic extraction of tempo and beat from expressive performances. Journal of New Music Research 30 (1): 39–58.

This paper outlines an offline system for extracting tempo and beat from audio or MIDI signals. It begins with a thorough survey of background work in rhythmic perception models, previous beat- and tempo-tracking systems for audio and MIDI signals, and general issues in tempo and beat tracking. Dixon's system has a tempo induction stage and a beat tracking stage. Interestingly, it ignores amplitude onsets that fall within 50 ms of larger onsets, since the larger onsets are more salient for establishing rhythm; he notes that while this is sufficient for tempo tracking, it is not appropriate for transcription systems, which must try to capture every note onset. Onsets close to each other are clustered, and the differences between consecutive clusters are used to induce tempo (a rough sketch of this step follows this entry). This produces a series of tempo hypotheses, from which the beat tracking module chooses the one that best fits the performance. Dixon also considers musical salience as a factor, looking at pitch and dynamics within the music. The system is tested on a variety of Western popular and classical audio and MIDI examples and performs above 70% in its worst case of beat tracking.

A draft version is available online at www.ai.univie.ac.at/~simon/pub/2001/jnmr.pdf.
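
Following the description above, the tempo-induction step might be sketched roughly as below. The 50 ms merge window is the figure mentioned in the annotation; the bin width, the use of only consecutive cluster differences, and the simple voting scheme are assumptions made here, not Dixon's actual clustering algorithm.

```python
def tempo_hypotheses(onsets, merge_window=0.05, bin_width=0.02):
    """Very rough sketch of tempo induction from clustered onsets.

    1. Merge onsets closer than `merge_window` seconds into clusters.
    2. Collect intervals between consecutive clusters.
    3. Histogram the intervals; each bin is a tempo hypothesis.
    """
    # 1. cluster onsets that fall within the merge window
    clusters = [onsets[0]]
    for t in onsets[1:]:
        if t - clusters[-1] < merge_window:
            clusters[-1] = (clusters[-1] + t) / 2   # keep the cluster centre
        else:
            clusters.append(t)

    # 2. inter-cluster intervals
    iois = [b - a for a, b in zip(clusters, clusters[1:])]

    # 3. vote for candidate beat periods in coarse bins
    votes = {}
    for ioi in iois:
        period = round(ioi / bin_width) * bin_width
        votes[period] = votes.get(period, 0) + 1

    # hypotheses sorted by support, as (period in seconds, BPM, votes)
    return sorted(((p, 60.0 / p, v) for p, v in votes.items() if p > 0),
                  key=lambda h: -h[2])
```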

Meudic, B. 2002. A causal algorithm for beat-tracking. In Proceedings of the Conference on Understanding and Creating Music.

This paper outlines a real-time implementation of Dixon's algorithm using markers, i.e. weightings assigned to the elements of a sequence according to a given property. In line with Dixon's idea of musical salience, beats are marked with respect to four properties calculated from the MIDI signal: short and long inter-onset intervals, pitches and note densities on a given beat, and note durations (a toy marker computation is sketched after this entry). A list of possible beats (hypotheses) is maintained and updated with each new beat to help determine the period; phase values are also determined, and at each beat the 'best beat' is chosen from the list. Although categorised as a real-time algorithm because of its causal nature, it appears quite computationally intensive, and no experimental results are given. The paper ends with a discussion of evaluation criteria, briefly touching on human rhythmic styles or 'feels' as evidence against using a score when evaluating such systems.

Available online at recherche.ircam.fr/equipes/repmus/meudic/beat-tracking.pdf.
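
As a purely illustrative toy, marker weighting in the spirit described above could look like the sketch below; the three properties used, the event representation, and the weights are assumptions, not Meudic's actual marker definitions.

```python
def mark_events(events, w_ioi=1.0, w_dur=0.5, w_density=0.5):
    """Assign each event a salience 'marker' as a weighted sum of simple
    properties: inter-onset interval to the next event, note duration, and
    the number of simultaneous pitches.

    `events` is a list of dicts with 'onset', 'duration', and 'pitches'
    keys (a hypothetical representation of the MIDI stream).
    """
    marks = []
    for i, e in enumerate(events):
        ioi = events[i + 1]["onset"] - e["onset"] if i + 1 < len(events) else 0.0
        salience = (w_ioi * ioi
                    + w_dur * e["duration"]
                    + w_density * len(e["pitches"]))
        marks.append(salience)
    return marks

# e.g. mark_events([{"onset": 0.0, "duration": 0.4, "pitches": [60, 64]},
#                   {"onset": 0.5, "duration": 0.2, "pitches": [62]}])
```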

Pardo, B. 2004. Tempo tracking with a single oscillator. In Proceedings of the International Conference on Music Information Retrieval 2004.

This paper describes an oscillator-based approach to tempo tracking rather than a probabilistic one. Tempo is estimated through heuristics rather than stochastic approximation, though still in terms of a period and a phase (a toy adaptive-oscillator sketch follows this entry). The oscillator design is compared to Cemgil et al.'s by using the same corpus and evaluation metrics; the results show it to be within a single standard deviation of the Cemgil design.

Available online at www.iua.upf.es/mtg/ismir2004/review/CRFILES/paper206-c4befce599e3eda209e6595065e02cf6.pdf.
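
A minimal adaptive-oscillator sketch in the spirit of the single-oscillator idea above: the coupling gains and the simple correction rule are assumptions made here, not Pardo's actual heuristics.

```python
def oscillator_track(onsets, period0=0.5, alpha=0.3, beta=0.1):
    """Toy single-oscillator tempo tracker.

    The oscillator keeps a period and a phase (the time of the next
    expected beat). Each onset near an expected beat pulls the phase and
    period toward it; `alpha` and `beta` are coupling gains.
    """
    period = period0
    next_beat = onsets[0] + period
    beats = [onsets[0]]
    for t in onsets[1:]:
        # emit ticks until the oscillator catches up with the onset
        while next_beat < t - period / 2:
            beats.append(next_beat)
            next_beat += period
        error = t - next_beat          # distance from the onset to the tick
        if abs(error) < period / 2:    # only nearby onsets adjust the clock
            next_beat += alpha * error
            period += beta * error
    return beats, period   # beat ticks emitted so far and final period

# e.g. oscillator_track([0.0, 0.52, 1.01, 1.49])
```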

Raphael, C. 2002. A hybrid graphical model for rhythmic parsing. Artificial Intelligence 137: 217–38.

Raphael also takes a probabilistic approach to rhythm and tempo estimation. His system differs from Cemgil's in that it estimates rhythm and tempo simultaneously by computing a globally optimal solution for the proposed graphical model, rather than relying on the tempo approximation output by a Kalman filter. Raphael uses a quantisation scheme derived from the scores of the test pieces, with divisions based on the measure of each piece (a toy grid-quantisation sketch follows this entry). The system is trained and tested on MIDI performances of Western classical music.

Available online at xavier.informatics.indiana.edu/~craphael/papers/ai_rhythmic_parsing.pdf.
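
The measure-based quantisation grid mentioned above can be illustrated with the toy below; the choice of subdivisions, the fixed measure length, and treating each onset independently are simplifications made here, not the joint rhythm-and-tempo inference the paper actually performs.

```python
def quantise_to_grid(onsets, measure_len=2.0, divisions=16):
    """Snap performed onset times (seconds) to the nearest grid position,
    expressed as (measure index, fraction of the measure).

    `measure_len` is an assumed fixed measure duration; the real system
    estimates tempo jointly with the score positions.
    """
    step = measure_len / divisions
    positions = []
    for t in onsets:
        grid_index = round(t / step)          # nearest grid point overall
        measure = grid_index // divisions     # which measure it falls in
        beat_frac = (grid_index % divisions) / divisions
        positions.append((measure, beat_frac))
    return positions

# e.g. quantise_to_grid([0.0, 0.49, 1.02, 1.51])
#   -> [(0, 0.0), (0, 0.25), (0, 0.5), (0, 0.75)]
```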

Takeda, H., T. Nishimoto, and S. Sagayama. 2003. Automatic rhythm transcription from multiphonic MIDI signals. In Proceedings of the International Conference on Music Information Retrieval 2003.

This paper outlines a probabilistic approach to rhythm transcription. Hidden Markov models trained on existing scores and performances "estimate rhythm from the IOIs (inter-onset intervals) of the given MIDI using tempo invariant feature parameters" (a sketch of such tempo-invariant features follows this entry). As in Raphael's work, estimated tempo distributions are modelled as Gaussian mixtures, and as in Cemgil's, an Expectation-Maximization algorithm is used to estimate the true tempo.

Available online at ismir2003.ismir.net/papers/Takeda.PDF.
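
One way to picture tempo-invariant IOI features is the ratio of successive inter-onset intervals, sketched below. The candidate ratio classes and the nearest-neighbour snapping are assumptions for illustration, whereas the paper decodes whole sequences with trained HMMs.

```python
import math

# Candidate score-duration ratios between successive IOIs (an assumed set,
# not the one used in the paper): 2.0 means the second interval is twice
# as long as the first, 0.5 means half as long, and so on.
RATIO_CLASSES = [0.25, 1 / 3, 0.5, 2 / 3, 1.0, 1.5, 2.0, 3.0, 4.0]

def ioi_ratio_features(onsets):
    """Tempo-invariant features: ratios of successive inter-onset intervals.

    Scaling all onset times by a constant (i.e. changing the tempo) leaves
    these ratios unchanged, which is what makes them tempo invariant.
    """
    iois = [b - a for a, b in zip(onsets, onsets[1:])]
    return [b / a for a, b in zip(iois, iois[1:]) if a > 0]

def quantise_ratios(onsets):
    """Snap each IOI ratio to the nearest candidate class (log distance)."""
    return [min(RATIO_CLASSES, key=lambda c: abs(math.log(r / c)))
            for r in ioi_ratio_features(onsets)]

# e.g. quantise_ratios([0.0, 0.5, 1.0, 1.25]) -> [1.0, 0.5]
```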

Testing Data

Cemgil - www.nici.kun.nl/mmm/archives/index.php?archive=beatles

- 12 pianists: 4 jazz professionals, 4 classical professionals, and 4 amateurs

- arrangements of 2 Beatles songs: Yesterday, Michelle

- 3 fast, 3 slow, and 3 normal versions per song, per musician, for a total of 108 performances