Dynamic Programming

Annotated Bibliography

Adams, N., M. Bartsch, J. Shifrin, and G.Wakefield. 2004. Time series alignment for music information retrieval. In Proceedings of the International Conference on Music Information Retrieval: 303–10.

This paper uses Dynamic Time Warping (DTW) to evaluate the usefulness of three time series representations for humming queries, namely by note, by contour, and by pitch histogram. Note representation was found to allow for the fastest retrievals, contour proved most robust, and histograms displayed a mix of both properties.

A copy of this article can be found here (March 31st, 2005).

Bellman, R. 1957. Dynamic Programming. Princeton: Princeton University Press.

This is the introductory text by the inventor of dynamic programming. The text describes the mathematical foundations of this technique of dealing with multi-stage decision processes. Included is an analysis of both deterministic and stochastic processes, and how dynamic programming can deal with them.

Chang, C. 1972. Dynamic programming as applied to feature subset selection in a pattern recognition system. In Proceedings of the ACM annual conference 1: 94–103.

This paper proposes two dynamic programming procedures for selecting a subset of optimal features. The author provides the mathematical specifics of each procedure, as well as comparisons with exhaustive search methods, and step-by-step examples of each.

Guo, A., and H. Siegelman. 2004. Time-warped longest common subsequence algorithm for music retrieval. In Proceedings of the International Conference on Music Information Retrieval: 258–61.

This ariticle describes a new algorithm for database queries by humming. The authors note that a humming query might be faster or slower than the original tune. The Dynamic Time Warp (DTW) algorithm addresses the expansion/contraction aspect of the search. The DTW matches the start and end points of the query to the song start and end points. However, a humming query might only contain a portion of the song, in which might occur further pitch distortions, deletions, and insertions. The authors propose a string pattern matching algorithm, namely the Longest Common Subsequence, to provide a stronger similarity measure when using partially correct queries. The paper describes the combination of these algorithms on humming queries for a database of monophonic folk songs.

A copy of this paper can be found here. (March 31st 2005)

Heo, S., M. Suzuki, A. Ito, and S. Makino. 2003. Three dimensional continuous DP algorithm for multiple pitch candidates in music information retrieval system. In Proceedings of the International Conference on Music Information Retrieval.

This short paper outlines the use of Paulus and Klapuri's (2002) DTW framework as used for humming queries. The authors modify it such that multiple pitch candidates are considered, along with a duration and confidence measure.

A copy of this paper can be found here. (March 31st 2005)

Nishimura, T., H. Hashiguchi, J. Takita, J. Zhang, M. Goto, and R. Oka. 2001. Music signal spotting retrieval by a humming query using start frame feature dependent continuous dynamic programming. In Proceedings of the International Conference on Music Information Retrieval.

This paper outlines the changes to a continuous dynamic programming algorithm first developed by the authors. The new algorithm reduces the dimensionality of the search space. The domain is query-by-humming song retrieval. The results show the older algorithm to be more robust than the new algorithm, although it is much less efficient.

A copy of this paper is available here. (March 31st 2005)

Paulus, J., and A. Klapuri. 2002. Measuring the similarity of rhythmic patterns. In Proceedings of the International Conference on Music Information Retrieval.

This paper outlines the use of the DTW algorithm for measuring rhythmic similarity. The authors use a set of feature vectors from one segmented pattern against the set of another. The feature vector includes the natural log of the mean square energy, as well as the spectral centroid, of adjacent 50% overlapping audio frames.

A copy of this paper is available here. (March 31st 2005)

Rabiner, L., and B-H. Huang. 1993. Fundamentals of Speech Recognition. Prentice Hall: New Jersey.

This seminal text on speech recognition contains an explanantion and mathematical implementation the use of DTW in the time-alignment of spoken utterances. It outlines use of endpoint, monotonicity, local and global contraints. This form of the DTW is used extensively in MIR.

Raphael, C. 2002. A hybrid graphical model for rhythmic parsing. Artificial Intelligence 137: 217–38.

The author of this paper uses dynamic programming to allow for the efficient computation of a Gaussian mixture model of tempo and rhythm.

Dynamic Programming

Powerpoint Presentation

Written Summary

Annotated Bibliography