Blackboard Algorithms for Polyphonic Music Transcription
Annotated Bibliography with Hyperlinks
Rebecca Fiebrink
Created 13 February 2005
Bello, J. 2003. Towards the automated analysis of simple polyphonic music: A
knowledge-based approach. PhD dissertation, University of London.
Bello's thesis builds on previous systems by Bello et al., Martin,
Kashino, and others. The system here employs a blackboard architecture,
but it does not include the neural network chord recognizer or the fuzzy
inference system seen in earlier papers. Like Martin's system, it
employs global scheduling; like Kashino's, it uses temporal processing
in addition to bottom-up and top-down processing. Information regarding single
instrument waveforms is used as an alternative to strict frequency
domain analysis.
This work, being a thesis, is not recommended in its entirety for casual reading. However, it is organized well enough that one can find specific information on the implementation. As a bonus, Section 2.2 provides a great overview of existing blackboard systems, and it might suggest additional reading useful to blackboard newcomers. Available online http://www.elec.qmul.ac.uk/research/thesis/Bello2003-thesis.pdf (Accessed 16 February 2005)
Bello, J., G. Monti, and M. Sandler. 2000. Techniques for automatic music transcription. Proceedings of the international symposium on music information retrieval.
This is Bello, Monti, and
Sandler's first paper to present a blackboard polyphonic transcription
system. (It is described in the second half of the paper.) Their system
is based on Martin's 1996 work. A significant variation from Martin is
the introduction of a neural network-based chord recognizer. This new
knowledge source identifies the probable existence of a chord in the
input (as opposed to a single note). The data hierarchy is simplified
from Martin's work. The system transcribes piano music with three or
fewer consecutive notes, and it, like Martin's project, encounters
problems identifying octave intervals. There is no quantitative analysis
of the system performance. This is a reasonably straightforward and
readable work.
Available online http://ciir.cs.umass.edu/music2000/papers/bello_paper.pdf (Accessed 16 February 2005)
Bello, J., and M. Sandler. 2000. Blackboard system and top-down processing for the transcription of simple polyphonic music. Proceedings of the COST G-6 conference on digital audio effects.
This paper describes
the same system as Bello, Monti, and Sandler, 2000, and most of the
content is the same as in that paper. However, new diagrams are present, and
the "Conclusions and Next Steps" section is more fully
fleshed out.
Available online http://www.elec.qmul.ac.uk/staffinfo/juan/documents/BelloDAFx00.pdf (Accessed 16 February 2005)
Ellis, D. 1996. Prediction-driven
computational auditory scene analysis. PhD dissertation, Massachusetts Institute
of Technology.
This thesis is the
culmination of Ellis' work in auditory scene analysis at MIT. Previous
systems by Ellis employed perceptual knowledge in computational auditory
scene analysis. This system explicitly incorporates a blackboard
architecture. Its goal is to distinguish among ambient sounds, rather
than to transcribe polyphonic music, but I have included it here because
Ellis' work influenced other researchers such as Martin and Bello.
Available online http://web.media.mit.edu/~dpwe/pdcasa/pdcasa.pdf (Accessed 16 February 2005)
Godsmark, D., and G. Brown. 1999.
A blackboard architecture for computational auditory scene analysis. Speech
Communication 27:351–366.
Using
perceptually-informed preprocessing, Godsmark and Brown's blackboard
system performs well on polyphonic, multi-timbral transcription tasks. A
focus is on hierarchical grouping principles in perceptual sound
organization. While the system is evaluated with musical signals, it
approaches the problem from the perspective of auditory scene analysis
in general.
This is a long and dense paper, and I recommend reading it after Martin's and Kashino's. Available online http://www.dcs.shef.ac.uk/~guy/pdf/spcom99.pdf (Accessed 16 February 2005)
Hainsworth, S. 2001. Analysis of musical audio for polyphonic transcription. First year Ph.D. report, University of Cambridge.
This handy report
briefly outlines the history of polyphonic transcription up to 2001. (It
does not present original work.) It makes a useful reference, and might
suggest further reading on any aspect of transcription. I especially
like that there are often a few words regarding how one researcher's
work relates to others'.
Available online http://www-sigproc.eng.cam.ac.uk/~swh21/ (Accessed 16 February 2005)
Kashino, K., K. Nakadai, T. Kinoshita, and H. Tanaka. 1995a. Application of Bayesian probability network to music scene analysis. Proceedings of the computational auditory scene analysis workshop, international joint conferences on artificial intelligence, 21–26.
This paper describes the same OPTIMA system as Kashino et al. 1995b. Much of this paper duplicates the other word for word, so with respect to system implementation it serves as a complementary resource for someone wishing to understand the details. Importantly, this paper supplies a broader range of evaluation results, including inputs that contain octave and fifth relations. It is apparent that note recognition is significantly hampered by the presence of these intervals, a result not clear from the 1995b paper.
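One common explanation for why octaves and fifths are hard is that the two notes share many partials. The sketch below only illustrates that overlap for ideal harmonic tones; it is not taken from Kashino et al. The function name `shared_partials`, the 12-partial limit, and the just-intonation 3:2 fifth are all assumptions made for illustration.

```python
from fractions import Fraction


def shared_partials(ratio, n_partials=12):
    """Return (lower_index, upper_index) pairs of coinciding harmonics.

    `ratio` is the upper fundamental divided by the lower one, given
    exactly (2 for an octave, 3/2 for a just-intonation perfect fifth).
    Only the first `n_partials` harmonics of each note are considered.
    """
    ratio = Fraction(ratio)
    pairs = []
    for upper in range(1, n_partials + 1):
        # Harmonic number of the lower note that lands on the same frequency.
        lower = upper * ratio
        if lower.denominator == 1 and lower <= n_partials:
            pairs.append((int(lower), upper))
    return pairs


octave = shared_partials(2)
fifth = shared_partials(Fraction(3, 2))
print("octave:", octave)
print("fifth: ", fifth)
```

Under these assumptions, every partial of the upper note in an octave coincides with an even partial of the lower note, so the upper note contributes no frequency of its own. In a fifth, every second partial of the upper note coincides with every third partial of the lower one.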
———. 1995b. Organization of hierarchical perceptual sounds: Music scene analysis with autonomous processing modules and a quantitative information integration mechanism. Proceedings of the computational auditory scene analysis workshop, international joint conferences on artificial intelligence, 158–64.
This paper does not explicitly
involve blackboard algorithms or polyphonic transcription. However, the
goal is the construction of a music scene analysis system that involves
the recognition of rhythm, chords, and source-separated musical notes.
In effect, the output is a transcription of multiple-instrument,
multiple-voice music. The computational system involved (called
"OPTIMA" by the authors) resembles Martin's blackboard system
in that it employs perceptually-informed top-down and bottom-up
knowledge sources. The method differs from Martin's blackboard system in
that, rather than using a global control method, it uses a Bayesian
probability network to propagate information about three perceptual
layers of hypotheses throughout the system. The system scored between
30% and 100% correct, depending on the evaluation method, but the evaluation
details are on the sparse side. It is notable that test inputs were
synthesized from MIDI data, not real acoustic performances.
The work of Kashino (see also 1995a on this page) was done concurrently with that of Martin, and many similarities exist between the two systems, but the two groups seem to have worked independently. Kashino's papers explicitly tie the representation of sounds employed in the computational system to a hierarchical model of sound perception derived from Bregman. Kashino also addresses sound source (i.e. instrument) identification. The details of this paper are difficult to understand for someone not familiar with Bayesian networks. However, it's possible to gain a basic appreciation for Kashino's approach, and a better idea of how blackboards and related systems work.
Klapuri demonstrates
the flexibility of the blackboard architecture in incorporating existing
audio content and analysis algorithms. Polyphonic transcription is
outlined as a system goal, but details on system performance are not
present. This discussion-oriented paper is most useful in highlighting
the advantages of the blackboard architecture in general applications.
Available online http://www.cs.tut.fi/sgn/arg/music/aes2001_klapuri.pdf (Accessed 16 February 2005)
Martin, K. 1996a. A blackboard system for automatic transcription of simple polyphonic music. MIT Media Laboratory Perceptual Computing Section Technical Report No. 385.
This paper offers an
indispensable introduction to how a blackboard system might function in
polyphonic music transcription. Martin's project seems to be the first
working blackboard polyphonic transcription system. The project has a
limited scope (piano renditions of Bach chorales) and outstanding
problems (notably octave errors), and it relies on an artificial (not
perceptually-informed) hierarchy of sound objects. However, the
explanation of the system implementation is quite clear, and a lengthy
example transcription is very helpful to someone wishing to understand
how blackboard systems work. No quantitative evaluation of system
performance is offered, but shortcomings are discussed.
Available online http://xenia.media.mit.edu/~kdm/research/papers/kdm-TR385.pdf (Accessed 16 February 2005)
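For readers new to the idea, the general shape of a blackboard system can be sketched in a few lines. This is a toy illustration of the architecture only (a shared hypothesis store, independent knowledge sources, and a global scheduler that runs the highest-rated source at each step). It is not Martin's implementation: the level names, the two knowledge sources, the ratings, and the 3% frequency tolerance are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Hypothesis:
    level: str    # hypothetical level names: "partial" or "note"
    value: float  # frequency in Hz
    support: List["Hypothesis"] = field(default_factory=list)


@dataclass
class Blackboard:
    hypotheses: List[Hypothesis] = field(default_factory=list)

    def at(self, level):
        return [h for h in self.hypotheses if h.level == level]


@dataclass
class KnowledgeSource:
    name: str
    rating: Callable[[Blackboard], float]  # 0 means "nothing to do"
    action: Callable[[Blackboard], None]


def unexplained(bb):
    """Partials not yet claimed as support by any note hypothesis."""
    used = {id(p) for n in bb.at("note") for p in n.support}
    return [p for p in bb.at("partial") if id(p) not in used]


def harmonics_for(bb, note, tol=0.03):
    """Unexplained partials lying near an integer multiple of the note."""
    found = []
    for p in unexplained(bb):
        k = round(p.value / note.value)
        if k >= 1 and abs(p.value - k * note.value) <= tol * p.value:
            found.append(p)
    return found


def propose_note(bb):
    # Bottom-up: treat the lowest unexplained partial as a fundamental.
    lowest = min(unexplained(bb), key=lambda p: p.value)
    bb.hypotheses.append(Hypothesis("note", lowest.value, [lowest]))


def gather_harmonics(bb):
    # Top-down: let existing notes claim partials that fit their series.
    for note in bb.at("note"):
        note.support.extend(harmonics_for(bb, note))


sources = [
    KnowledgeSource(
        "gather_harmonics",
        lambda bb: 2.0 if any(harmonics_for(bb, n) for n in bb.at("note")) else 0.0,
        gather_harmonics,
    ),
    KnowledgeSource(
        "propose_note",
        lambda bb: 1.0 if unexplained(bb) else 0.0,
        propose_note,
    ),
]


def run(bb, sources, max_steps=50):
    """Global scheduler: run the highest-rated source until none applies."""
    for _ in range(max_steps):
        best = max(sources, key=lambda ks: ks.rating(bb))
        if best.rating(bb) <= 0:
            break
        best.action(bb)
    return bb


partials = (220.0, 277.0, 440.0, 554.0, 660.0, 831.0)
bb = Blackboard([Hypothesis("partial", f) for f in partials])
result = run(bb, sources)
for note in result.at("note"):
    print(note.value, [p.value for p in note.support])
```

With these made-up partials the scheduler alternates between proposing a note and letting it claim its harmonics, ending with two note hypotheses. A real system would add many more knowledge sources and hypothesis levels, which is where the difficulties discussed in these papers arise.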
———. 1996b. Automatic transcription of simple polyphonic music: Robust front end processing. MIT Media Laboratory Perceptual Computing Section Technical Report No. 399.
Martin's second paper outlines a
modification of his original system to include a front-end employing
correlation-based analysis rather than sinusoidal (STFT) analysis. A
goal of this project is to surmount some of the difficulties of the
original system, particularly with respect to octave identification
errors, without limiting transcription to a known set of instruments.
While Martin indicates that the system shows promise, its performance
reveals remaining shortcomings.
Available online http://xenia.media.mit.edu/~kdm/research/papers/kdm-TR399.pdf (Accessed 16 February 2005)
Monti, G., and M. Sandler. 2002. Automatic polyphonic piano note extraction using fuzzy logic in a blackboard system. Proceedings of the 5th international conference on digital audio effects, 39–44.
This paper presents an evolution
of the system from Bello, Monti, and Sandler's 2000 papers. The neural
network chord identifier has been done away with, the data hierarchy has
been slightly modified, a front end has been developed to incorporate
knowledge of psychoacoustic properties, and a Fuzzy Inference System
knowledge source has been included to suggest new candidate notes. A
quantitative analysis reveals a success rate of 45% according to the Dixon
formula (explained in the paper), with 74% accuracy on transcribed
notes. (It is not demonstrated that this system is any better than the
2000 system.) Octave errors still appear to be a problem, but a much
bigger problem is that notes are skipped and omitted from the
transcription altogether. This is overall a readable paper, and the
inclusion of a fuzzy inference system seems interesting, but the paper
does not provide a very detailed discussion of the analysis results.
Available online http://www.unibw-hamburg.de/EWEB/ANT/dafx2002/papers/
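The two percentages are easier to compare with the score written out. The sketch below assumes the form commonly attributed to Dixon, correct / (correct + false positives + false negatives). That form, the reading of "accuracy on transcribed notes" as precision, and the toy counts are all assumptions made here, not values taken from Monti and Sandler's paper.

```python
def dixon_score(correct, false_pos, false_neg):
    """Overall score: penalizes both spurious and missed notes."""
    return correct / (correct + false_pos + false_neg)


def precision(correct, false_pos):
    """Fraction of transcribed notes that are correct."""
    return correct / (correct + false_pos)


def recall(correct, false_neg):
    """Fraction of notes in the input that were transcribed."""
    return correct / (correct + false_neg)


# Toy counts, chosen only for illustration.
correct, false_pos, false_neg = 6, 2, 4
print("score:    ", dixon_score(correct, false_pos, false_neg))
print("precision:", precision(correct, false_pos))
print("recall:   ", recall(correct, false_neg))
```

Under these assumptions, an overall score well below the precision indicates that missed notes outnumber spurious ones, which matches the observation that skipped notes are the larger problem.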
Nii, H. 1986. The blackboard model of problem solving and the evolution of
blackboard architectures. AI Magazine 7 (2):38–53.
Nii's paper doesn't
address music at all, and it doesn't "introduce" the
blackboard model as a new technology, per se. However, it does outline
systems already in existence by 1986 and places them under the umbrella of
"blackboard systems," a term defined in the paper. This
seems to be a good starting point for anyone wanting more depth
regarding blackboard systems, how they work, and variations among them,
beyond what is supplied in the background sections of the papers
specifically on musical systems.
Available online http://www.aaai.org/Library/Magazine/Vol07/07-02/vol07-02.html (Accessed 16 February 2005)