Blackboard Systems for Polyphonic Music Transcription Annotated Bibliography

Blackboard Algorithms for Polyphonic Music Transcription
Annotated Bibliography with Hyperlinks
Rebecca Fiebrink
Created 13 February 2005

Bello, J. 2003. Towards the automated analysis of simple polyphonic music: A knowledge-based approach. PhD dissertation, University of London.

Bello's thesis builds on previous systems by Bello et al., Martin, Kashino, and others. The system employs a blackboard architecture, but it does not include the neural network chord recognizer or the fuzzy inference system seen in earlier papers. Like Martin's system, it employs global scheduling; like Kashino's, it uses temporal processing in addition to bottom-up and top-down processing. Information about single-instrument waveforms is used as an alternative to strict frequency-domain analysis.

This work, being a thesis, is not recommended in its entirety for casual reading. However, it seems to be organized well enough that one can find specific information on the implementation. As a bonus, Section 2.2 provides a great overview of existing blackboard systems, and it might suggest additional reading useful to blackboard newcomers.

Available online http://www.elec.qmul.ac.uk/research/thesis/Bello2003-thesis.pdf (Accessed 16 February 2005)

Bello, J., G. Monti, and M. Sandler. 2000. Techniques for automatic music transcription. Proceedings of the international symposium on music information retrieval.

This is Bello, Monti, and Sandler's first paper to present a blackboard polyphonic transcription system. (It is described in the second half of the paper.) Their system is based on Martin's 1996 work. A significant variation from Martin is the introduction of a neural network-based chord recognizer. This new knowledge source identifies the probable existence of a chord in the input (as opposed to a single note). The data hierarchy is simplified relative to Martin's. The system transcribes piano music with three or fewer consecutive notes and, like Martin's project, it has trouble identifying octave intervals. There is no quantitative analysis of the system's performance. This is a reasonably straightforward and readable work.

Available online http://ciir.cs.umass.edu/music2000/papers/bello_paper.pdf (Accessed 16 February 2005)

Bello, J., and M. Sandler. 2000. Blackboard system and top-down processing for the transcription of simple polyphonic music. Proceedings of the COST G-6 conference on digital audio effects.

This paper describes the same system as Bello, Monti, and Sandler, 2000, and most of its content is the same as in that paper. However, it adds new diagrams, and its "Conclusions and Next Steps" section is more fully fleshed out.

Available online http://www.elec.qmul.ac.uk/staffinfo/juan/documents/BelloDAFx00.pdf (Accessed 16 February 2005)

Ellis, D. 1996. Prediction-driven computational auditory scene analysis. PhD dissertation, Massachusetts Institute of Technology.

This thesis is the culmination of Ellis' work in auditory scene analysis at MIT. Previous systems by Ellis employed perceptual knowledge in computational auditory scene analysis. This system explicitly incorporates a blackboard architecture. Its goal is to distinguish among ambient sounds, rather than to transcribe polyphonic music, but I have included it here because Ellis' work influenced other researchers such as Martin and Bello.

Available online http://web.media.mit.edu/~dpwe/pdcasa/pdcasa.pdf (Accessed 16 February 2005)

Godsmark, D., and G. Brown. 1999. A blackboard architecture for computational auditory scene analysis. Speech Communication 27:351–366.

Using perceptually-informed preprocessing, Godsmark and Brown's blackboard system performs well on polyphonic, multi-timbral transcription tasks. The paper focuses on hierarchical grouping principles in perceptual sound organization. While the system is evaluated with musical signals, it approaches the problem from the perspective of auditory scene analysis in general.

This is a long and dense paper, and I recommend reading it after Martin's and Kashino's.

Available online http://www.dcs.shef.ac.uk/~guy/pdf/spcom99.pdf (Accessed 16 February 2005)

Hainsworth, S. 2001. Analysis of musical audio for polyphonic transcription. First year Ph.D. report, University of Cambridge.

This handy report briefly outlines the history of polyphonic transcription up to 2001. (It does not present original work.) It makes a useful reference, and might suggest further reading on any aspect of transcription. I especially like that there are often a few words regarding how one researcher's work relates to others'.

Available online http://www-sigproc.eng.cam.ac.uk/~swh21/ (Accessed 16 February 2005)

Kashino, K., K. Nakadai, T. Kinoshita, and H. Tanaka. 1995a. Application of Bayesian probability network to music scene analysis. Proceedings of the computational auditory scene analysis workshop, international joint conferences on artificial intelligence, 21–26.

This paper describes the same OPTIMA system as Kashino et al. 1995b. Much of this paper is a word-for-word duplication of the other, so as a description of the implementation it mainly serves as a complementary resource for someone wishing to understand the system details. Importantly, this paper supplies a broader range of evaluation results, including results for inputs that contain octave and fifth relations. It is apparent that note recognition is significantly hampered by the presence of these intervals, a result not clear from the 1995b paper.
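The difficulty with octaves and fifths noted above follows from simple harmonic arithmetic, which the following minimal sketch illustrates. It is not taken from Kashino's paper: it assumes ideal harmonic tones and just-intonation ratios (2:1 for the octave, 3:2 for the fifth), and the function names are hypothetical.

```python
from fractions import Fraction


def harmonics(f0, count):
    """Return the first `count` harmonic frequencies of an ideal harmonic tone."""
    return {f0 * k for k in range(1, count + 1)}


def overlap_fraction(lower_f0, ratio, count=12):
    """Fraction of the upper note's first `count` partials that coincide
    with a harmonic of the lower note.

    Assumption: both notes are perfectly harmonic and the interval is an
    exact frequency ratio (just intonation), so exact arithmetic applies.
    """
    upper = harmonics(lower_f0 * ratio, count)
    # Give the lower note enough harmonics to reach the upper note's top partial.
    top = max(upper)
    lower = harmonics(lower_f0, int(top / lower_f0) + 1)
    return Fraction(len(upper & lower), len(upper))


a3 = Fraction(220)  # illustrative lower note, in Hz
octave = overlap_fraction(a3, Fraction(2))
fifth = overlap_fraction(a3, Fraction(3, 2))
major_third = overlap_fraction(a3, Fraction(5, 4))
```

Under these assumptions every partial of the upper note of an octave coincides with a harmonic of the lower note, and half of the partials do so for a fifth, compared with a quarter for a major third. A frequency-domain front end therefore sees little or no independent evidence for the upper note.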

———. 1995b. Organization of hierarchical perceptual sounds: Music scene analysis with autonomous processing modules and a quantitative information integration mechanism. Proceedings of the computational auditory scene analysis workshop, international joint conferences on artificial intelligence, 158–64.

This paper does not explicitly involve blackboard algorithms or polyphonic transcription. However, the goal is the construction of a music scene analysis system that involves the recognition of rhythm, chords, and source-separated musical notes. In effect, the output is a transcription of multiple-instrument, multiple-voice music. The computational system involved (called "OPTIMA" by the authors) resembles Martin's blackboard system in that it employs perceptually-informed top-down and bottom-up knowledge sources. The method differs from Martin's blackboard system in that, rather than using a global control method, it uses a Bayesian probability network to propagate information about three perceptual layers of hypotheses throughout the system. The system's results were between 30% and 100% correct depending on the evaluation method, but the evaluation details are on the sparse side. It is notable that test inputs were synthesized from MIDI data, not real acoustic performances.

The work of Kashino (see 1995a referenced on this page, as well) was done concurrently with that of Martin, and many similarities exist between the two systems, but the two groups seem to have worked independently. Kashino's papers explicitly tie the representation of sounds employed in the computational system to a hierarchical model of sound perception derived from Bregman. Kashino also addresses sound source (i.e. instrument) identification. The details of this paper are difficult to understand for someone not familiar with Bayesian networks. However, it's possible to gain a basic appreciation for Kashino's approach, and a better idea of how blackboards and related systems work.

Klapuri, A. 2001. Means of integrating audio content analysis algorithms. Proceedings of the 110th convention of the audio engineering society.

Klapuri demonstrates the flexibility of the blackboard architecture in incorporating existing audio content analysis algorithms. Polyphonic transcription is outlined as a system goal, but details on system performance are not present. This discussion-oriented paper is most useful in highlighting the advantages of the blackboard architecture in general applications.

Available online http://www.cs.tut.fi/sgn/arg/music/aes2001_klapuri.pdf (Accessed 16 February 2005)

Martin, K. 1996a. A blackboard system for automatic transcription of simple polyphonic music. MIT Media Laboratory Perceptual Computing Section Technical Report No. 385.

This paper offers an indispensable introduction to how a blackboard system might function in polyphonic music transcription. Martin's project seems to be the first working blackboard polyphonic transcription system. The project has a limited scope (piano renditions of Bach chorales) and outstanding problems (notably octave errors), and it relies on an artificial (not perceptually-informed) hierarchy of sound objects. However, the explanation of the system implementation is quite clear, and a lengthy example transcription is very helpful to someone wishing to understand how blackboard systems work. No quantitative evaluation of system performance is offered, but shortcomings are discussed.

Available online http://xenia.media.mit.edu/~kdm/research/papers/kdm-TR385.pdf (Accessed 16 February 2005)

———. 1996b. Automatic transcription of simple polyphonic music: Robust front end processing. MIT Media Laboratory Perceptual Computing Section Technical Report No. 399.

Martin's second paper outlines a modification of his original system to include a front-end employing correlation-based analysis rather than sinusoidal (STFT) analysis. A goal of this project is to surmount some of the difficulties of the original system, particularly with respect to octave identification errors, without limiting transcription to a known set of instruments. While Martin indicates that the system shows promise, its performance reveals remaining shortcomings.

Available online http://xenia.media.mit.edu/~kdm/research/papers/kdm-TR399.pdf (Accessed 16 February 2005)

Monti, G., and M. Sandler. 2002. Automatic polyphonic piano note extraction using fuzzy logic in a blackboard system. Proceedings of the 5th international conference on digital audio effects, 39–44.

This paper presents an evolution of the system from Bello, Monti, and Sandler's 2000 papers. The neural network chord identifier has been removed, the data hierarchy has been slightly modified, a front end has been developed to incorporate knowledge of psychoacoustic properties, and a Fuzzy Inference System knowledge source has been included to suggest new candidate notes. A quantitative analysis reveals a success rate of 45% according to the Dixon formula (explained in the paper), with 74% accuracy on transcribed notes. (It is not demonstrated that this system is any better than the 2000 system.) Octave errors still appear to be a problem, but a much bigger problem is that notes are skipped entirely and left out of the transcription. This is overall a readable paper, and the inclusion of a fuzzy inference system seems interesting, but it does not provide a very detailed discussion of the analysis results.

Available online http://www.unibw-hamburg.de/EWEB/ANT/dafx2002/papers/DAFX02_Monti_Sandler_polyphonic_piano_extraction.pdf (Accessed 16 February 2005)
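To show how a 45% score and 74% accuracy on transcribed notes can coexist, here is a small sketch. It assumes the Dixon formula has the commonly cited form N / (N + FP + FN), where N counts correctly transcribed notes, FP spurious notes, and FN missed notes, and it assumes that "accuracy on transcribed notes" means N / (N + FP). The note counts are invented for illustration and are not taken from Monti and Sandler's paper.

```python
def dixon_score(correct, false_positives, false_negatives):
    """Assumed form of the Dixon measure: N / (N + FP + FN)."""
    return correct / (correct + false_positives + false_negatives)


def transcribed_accuracy(correct, false_positives):
    """Assumed meaning of 'accuracy on transcribed notes': N / (N + FP)."""
    return correct / (correct + false_positives)


# Hypothetical counts, chosen only to reproduce the reported percentages.
correct, false_positives, false_negatives = 45, 16, 39

score = dixon_score(correct, false_positives, false_negatives)
accuracy = transcribed_accuracy(correct, false_positives)
print(f"Dixon score: {score:.2f}")
print(f"Accuracy on transcribed notes: {accuracy:.2f}")
```

With these made-up counts, missed notes outnumber spurious ones by more than two to one. That is the kind of imbalance the annotation describes: omitted notes lower the overall score far more than wrong notes do.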

Nii, H. 1986. The blackboard model of problem solving and the evolution of blackboard architectures. AI Magazine 7 (2):38–53.

Nii's paper doesn't address music at all, and it doesn't "introduce" the blackboard model as a new technology, per se. However, it does outline systems already in existence by 1986 and put them under an umbrella of "blackboard systems," which is defined in the paper. This seems to be a good starting point for anyone wanting more depth regarding blackboard systems, how they work, and variations among them, beyond what is supplied in the background sections of the papers specifically on musical systems.

Available online http://www.aaai.org/Library/Magazine/Vol07/07-02/vol07-02.html (Accessed 16 February 2005)
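For readers new to the idea, the blackboard model can be sketched as a shared data structure, a set of independent knowledge sources, and a control loop. The toy below is only an illustration: it does not reproduce Nii's examples or any system listed on this page, and the class names, the two-level note and chord hierarchy, and the 0.03 harmonicity tolerance are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Hypothesis:
    level: str                  # "partial", "note", or "chord"
    freq: float                 # frequency in Hz (lowest note for a chord)
    support: list = field(default_factory=list)
    explained: bool = False     # set once a higher-level hypothesis uses it


class Blackboard:
    """Shared workspace that every knowledge source reads and writes."""

    def __init__(self):
        self.hyps = []

    def post(self, hyp):
        self.hyps.append(hyp)

    def at(self, level):
        return [h for h in self.hyps if h.level == level]


class NoteProposer:
    """Bottom-up source: treat the lowest unexplained partial as a fundamental."""

    def _pending(self, bb):
        unexplained = [p for p in bb.at("partial") if not p.explained]
        return sorted(unexplained, key=lambda p: p.freq)

    def can_run(self, bb):
        return bool(self._pending(bb))

    def run(self, bb):
        pending = self._pending(bb)
        f0 = pending[0].freq
        # Collect unexplained partials lying near integer multiples of f0.
        members = [p for p in pending
                   if abs(p.freq / f0 - round(p.freq / f0)) < 0.03]
        for p in members:
            p.explained = True
        bb.post(Hypothesis("note", f0, support=members))


class ChordProposer:
    """Higher-level source: group all current notes into one chord hypothesis."""

    def can_run(self, bb):
        notes = bb.at("note")
        covered = any(len(c.support) == len(notes) for c in bb.at("chord"))
        return len(notes) >= 2 and not covered

    def run(self, bb):
        notes = bb.at("note")
        bb.post(Hypothesis("chord", min(n.freq for n in notes), support=notes))


def run_blackboard(bb, sources, max_steps=100):
    """Control loop: run the first source whose precondition holds."""
    for _ in range(max_steps):
        ready = [ks for ks in sources if ks.can_run(bb)]
        if not ready:
            break
        ready[0].run(bb)
    return bb


bb = Blackboard()
for f in (220.0, 330.0, 440.0, 660.0, 990.0):  # made-up spectral peaks
    bb.post(Hypothesis("partial", f))
run_blackboard(bb, [NoteProposer(), ChordProposer()])
note_freqs = [n.freq for n in bb.at("note")]
chord_count = len(bb.at("chord"))
```

Here the control strategy is a fixed priority order, which is the simplest possible choice. The 660 Hz partial is claimed entirely by the 220 Hz note although it also fits 330 Hz; resolving shared evidence of this kind is one of the problems the transcription systems above have to address.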