Bulletin of The American Society for Information Science

Vol. 26, No. 5

June/July 2000

Access to Music Information: The State of the Art

by J. Stephen Downie

This article is a synopsis of my contribution to The Sound of Information: Auditory Browsing and Audio Information Retrieval (SIG/VIS), a panel moderated by Abby Goodrum. The presentation date, November 1, 1999, is significant, for on that date I had been in possession of my recently conferred Ph.D. for exactly 9 days. At the panel session, I fell prey to the classic problem that afflicts most recent doctoral graduates: I went on, ad infinitum, about my thesis and never did get around to addressing any of the larger issues. This being the case, I welcome this opportunity to share with you both an outline of my thesis work and an overview of the community-building efforts that are currently being undertaken by myself and others interested in the burgeoning field of Music Information Retrieval (MIR) research and development. My thesis, entitled Evaluating a Simple Approach to Music Information Retrieval: Conceiving Melodic N-grams as Text, will soon be available for downloading at http://mir.lis.uiuc.edu/thesis.

Background

Before the advent of digital computers, the principal method of accessing music information was the thematic catalogue. Scholars and musicians have consulted these printed volumes for over a thousand years. In them, they have found fragments of musical works called incipits that represent the beginnings of a work or significant parts (i.e., themes). According to Barry S. Brook, these incipits have taken on various forms including "conventional notes, neumes, tablatures, numbers, letters or computer codes." The amount of information conveyed by incipits can vary depending on their representation. Sometimes incipits have been verbatim extracts from a musical score and contain pitch, harmonic, rhythmic, editorial, textual and timbral (i.e., sound "colour") information. Other times, the authors of thematic catalogues have seen fit to greatly reduce the amount of information presented by representing only select aspects of a melody, usually pitch names (e.g., Barlow and Morgenstern's Dictionary of Musical Themes). Information has been further reduced by representing incipits, not through the use of the pitches, but by intervals (i.e., the distance between notes). Denys Parsons' Directory of Tunes and Musical Themes is the best exemplar of this interval-only approach. Notwithstanding the method of representation used, the one thing that makes all thematic catalogues special is that they attempt to give users the ability to access music information on its own terms; that is, thematic catalogues provide the ability to answer music queries which have been framed musically.
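
To make the difference between pitch-based and interval-based incipits concrete, here is a minimal sketch in Python. It is an illustration only and is not taken from any of the catalogues named above: the MIDI note numbers, the function names and the U/D/R letters (up, down, repeat) are assumptions chosen for the example.

```python
def intervals(pitches):
    """Signed semitone distance between each pair of consecutive notes."""
    return [b - a for a, b in zip(pitches, pitches[1:])]


def contour(pitches):
    """Reduce a melody to direction only: U (up), D (down), R (repeat)."""
    return "".join(
        "U" if step > 0 else "D" if step < 0 else "R"
        for step in intervals(pitches)
    )


# Opening of "Twinkle, Twinkle, Little Star" (C C G G A A G),
# written as MIDI note numbers. The encoding is an assumption of this sketch.
twinkle = [60, 60, 67, 67, 69, 69, 67]

print(intervals(twinkle))  # [0, 7, 0, 2, 0, -2]
print(contour(twinkle))    # RURURD

# Transposing the tune changes every pitch but none of the intervals,
# which is why an interval-only incipit matches a tune in any key.
transposed = [p + 5 for p in twinkle]
print(contour(transposed) == contour(twinkle))  # True
```

Each step of the reduction discards information: the pitch list fixes the key, the interval list does not, and the contour string keeps only the rise and fall of the melody.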

Automating access to music information using digital computers has intrigued musicologists, computer scientists, librarians and music lovers alike. Each has had his/her own purpose in mind and thus there seem to be as many approaches to developing Music Information Retrieval (MIR) systems as there are users. A half-hour perusal of the back issues of Computing in Musicology

http://musedata.stanford.edu/publications/cm/index.html

will bring this fact to the fore. Some have designed complex suites of computer tools to analyze all the varied facets of music information. One such suite is David Huron's Humdrum

http://dactyl.som.ohio-state.edu/Humdrum/

Others have tried to automate the thematic catalogue by including incipit or thematic extracts as part of a bibliographic record. The Répertoire International Des Sources Musicales database is a case in point

www.rism.harvard.edu/rism/

Another interesting example of the thematic approach is Huron and Kornstaedt's Themefinder

http://musedata.stanford.edu/databases/themefinder/

Still others have explored the idea of using sophisticated approximate string matching techniques. The New Zealand Digital Library's Meldex

www.nzdl.org/cgi-bin/gw?c=meldex&a=page&p=coltitle

is a good example. Prechelt and Typke's Tuneserver

www.ipd.ira.uka.de/tuneserver/

based upon the work of Denys Parsons, also utilizes approximate string matching techniques. Despite the variety of approaches taken, all extant MIR systems are united by the fact that each has some kind of significant shortcoming. The more powerful analytic systems can be very difficult to use, incipit and thematic indexes both leave out large amounts of music that might be of interest, and approximate string matches can be computationally expensive without necessarily giving better results.

Summarizing My Thesis Research

Taking my cue from those printed thematic catalogues that have reduced the amount of music information represented (e.g., Keller and Rabson's National Tune Index), I developed, and then evaluated, an MIR system based upon the intervals found within the melodies of a collection of 9354 folk songs. I postulated that there is enough information contained within an interval-only representation of monophonic melodies that effective retrieval of music information could be achieved. I extended the thematic catalogue model by affording access to musical expressions found anywhere within a melody (i.e., a "full-text" approach). To achieve this extension, I fragmented the melodies into length-n subsections called n-grams. The length of these n-grams, and the degree to which I precisely represented the intervals, were variables analyzed in the thesis.
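
The fragmentation step can be sketched in a few lines of Python. This is a hedged illustration rather than the code used in the thesis: the MIDI pitch encoding, the underscore separator, the function names and the two levels of interval precision (exact semitones versus direction only) are all assumptions made for the example.

```python
def intervals(pitches):
    """Signed semitone distance between each pair of consecutive notes."""
    return [b - a for a, b in zip(pitches, pitches[1:])]


def classify(interval, precise=True):
    """Encode one interval as a token.

    precise=True keeps the exact signed size (e.g. "+7");
    precise=False keeps only the direction (U, D or R).
    """
    if precise:
        return f"{interval:+d}"
    return "U" if interval > 0 else "D" if interval < 0 else "R"


def melodic_ngrams(pitches, n, precise=True):
    """Slide a window of n intervals along the melody, one 'word' per position."""
    tokens = [classify(step, precise) for step in intervals(pitches)]
    return ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical seven-note melody as MIDI note numbers (six intervals).
melody = [60, 60, 67, 67, 69, 69, 67]

print(melodic_ngrams(melody, 4))
# ['+0_+7_+0_+2', '+7_+0_+2_+0', '+0_+2_+0_-2']

print(melodic_ngrams(melody, 4, precise=False))
# ['R_U_R_U', 'U_R_U_R', 'R_U_R_D']
```

Because the window slides one interval at a time, every position in the melody starts a word, which is what allows a query to match material from the middle of a song and not only its incipit.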

N-grams form discrete units of melodic information much in the same manner as words are discrete units of language. Thus, I came to consider them musical words. This implied that, for the purposes of music information retrieval, I could treat them as "real words" and thereby apply traditional text-based information retrieval techniques. I examined the validity of my musical word concept from two interrelated viewpoints. First, a variety of informetric analyses were conducted to examine the ways in which the information properties of musical words and "real words" are similar or different. Second, I constructed a collection of musical word databases using Salton's famous text-based SMART information retrieval system, with tf * idf used as the "term" weighting metric. Melodic strings were extracted from the songs in the database (30 Incipit, 30 Random). These strings were used as the basis of the queries and were subjected to the various treatments. The five treatments were designed to answer the following questions:

  • Does the length of the musical word n-gram make a significant difference?
  • Does the length of the query make a significant difference?
  • Does it really matter how precisely one represents a melody's rise and fall?
  • Does the introduction of a simulated query error gum up the works?
  • Does it matter significantly whether a query represents an incipit or some other part of the melody?
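
The idea of weighting musical words by tf * idf and ranking songs against a query can be sketched as follows. This is a toy illustration in plain Python, not the SMART system and not its exact weighting formula: the three songs, their contour words, the function names and the use of cosine similarity for ranking are all assumptions made for the example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build a tf * idf vector for every document.

    docs maps a song name to its list of musical words.
    Returns (vectors, idf) so queries can reuse the same idf values.
    """
    n_docs = len(docs)
    doc_freq = Counter(w for words in docs.values() for w in set(words))
    idf = {w: math.log(n_docs / df) for w, df in doc_freq.items()}
    vectors = {
        name: {w: tf * idf[w] for w, tf in Counter(words).items()}
        for name, words in docs.items()
    }
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(w, 0.0) for w, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank(query_words, vectors, idf):
    """Return song names ordered from best to worst match."""
    query = {w: tf * idf.get(w, 0.0) for w, tf in Counter(query_words).items()}
    scores = {name: cosine(query, vec) for name, vec in vectors.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Invented toy collection: each song is a list of 3-interval contour words.
songs = {
    "song_a": ["RUR", "URU", "RUR", "URD"],
    "song_b": ["UUD", "UDD", "DDU"],
    "song_c": ["RUR", "UDD", "DDR"],
}

vectors, idf = tfidf_vectors(songs)
print(rank(["URU", "RUR"], vectors, idf))  # ['song_a', 'song_c', 'song_b']
```

The word "URU" occurs in only one song and so carries more weight than "RUR", which occurs in two. That is the same behaviour tf * idf gives rare and common words in text retrieval.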

The experimental model was complex, and the performance results were evaluated using the traditional text-based normalized precision and normalized recall measures.
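
For readers unfamiliar with these measures, the sketch below computes them in their standard textbook form. For a collection of N documents in which the n relevant ones are retrieved at ranks r_1 ... r_n, normalized recall is 1 - (sum of r_i - sum of i) / (n(N - n)) and normalized precision is 1 - (sum of log r_i - sum of log i) / log(N! / ((N - n)! n!)), with i running from 1 to n. The function names and the example ranks are assumptions, and the code is not drawn from the thesis.

```python
import math


def normalized_recall(ranks, collection_size):
    """Normalized recall for relevant documents found at the given ranks.

    Assumes at least one relevant document and at least one
    non-relevant document (otherwise the denominator is zero).
    """
    n = len(ranks)
    ideal = sum(range(1, n + 1))  # ranks 1..n is the best possible outcome
    return 1 - (sum(ranks) - ideal) / (n * (collection_size - n))


def normalized_precision(ranks, collection_size):
    """Normalized precision: like normalized recall, but on log ranks."""
    n = len(ranks)
    actual = sum(math.log(r) for r in ranks)
    ideal = sum(math.log(i) for i in range(1, n + 1))
    worst = math.log(math.comb(collection_size, n))  # log(N! / ((N-n)! n!))
    return 1 - (actual - ideal) / worst


# One relevant song in a collection of 9354, as in a known-item search.
N = 9354
print(normalized_recall([1], N), normalized_precision([1], N))  # 1.0 1.0
print(normalized_recall([N], N), normalized_precision([N], N))  # 0.0 0.0

# The target song retrieved third (hypothetical rank).
print(round(normalized_recall([3], N), 4))
print(round(normalized_precision([3], N), 4))
```

Both measures equal 1 when every relevant item is ranked first and 0 when every relevant item is ranked last. Normalized precision works on the logarithm of the ranks, so it penalizes a slip near the top of the list far more heavily than normalized recall does.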

The Bulletin is not the appropriate forum for a full discussion of the findings of the experiments and evaluations, but please allow me to present my "cocktail party" summarization of the results. On the informetric front, I discovered a set of compelling similarities between my musical word databases and some of the standard IR test collections. On the retrieval front, the normalized precision scores were surprisingly good. For instance, precision scores in the range 97-99% were achieved for some of the test databases. Even when a simulated error was present, some test configurations returned a respectable 82% precision. Taken together, these findings strongly suggest that traditional text retrieval approaches should be explored more thoroughly as a possible means of accessing music information. This implies that already existing text retrieval systems, such as library OPACs and the WWW search engines, might someday also become our MIR systems of the future.

Moving Toward an MIR Research Community

During the course of my dissertation work, I was struck by the paucity of literature concerning MIR issues. The recently published Melodic Comparison: Concepts, Procedures, and Applications

http://musedata.stanford.edu/publications/cm/idx11.html

notwithstanding, the corpus of formal literature on MIR system development and evaluation was dismayingly sparse and widely scattered. It appeared to me that the various MIR research teams were operating in isolated autonomy and thus the literature was merely reflecting this state of affairs. Feeling isolated myself, I yearned for the creation of a true MIR research community - one similar to the text IR community - where ideas and techniques could be shared in a spirit of common benefit. To alleviate my sense of isolation and to foster a framework for MIR research cooperation, I proposed, and had accepted, one of the first, if not the first, full-day workshops ever held in North America devoted exclusively to MIR issues. The Exploratory Workshop on Music Information Retrieval was held as part of the ACM SIGIR '99 Annual Conference, at Berkeley, California, August 19, 1999.

MIR researchers from such diverse locations as France, Canada, New Zealand, United States, England, Norway and Italy convened at the workshop to inform one another about their widely varied approaches to the MIR problem. A heartening aspect of the MIR workshop was the multidisciplinary backgrounds of the participants: library science, musicology, industry, computer science and engineering. A "mini-proceedings" of presentation abstracts was compiled. This slim, but informative, volume contains contact information and an overview of the sundry methods being employed by the workshop's participants

www.lis.uiuc.edu/~jdownie/mir_papers/sigir99_wshop_proc.pdf

I had hoped that the Exploratory Workshop would contribute to the formation of a more cohesive MIR research and evaluation program by affording all interested parties the chance to share their insights and achievements. I had also dreamt that, perhaps, one day, MIR researchers and stakeholders would look back upon this workshop as the birthplace of a new ACM or ASIS SIG, one replete with peer-reviewed publications, annual meetings and TREC-like competitions. Fortunately, it appears as though my dreams might yet come true!

The connections established at the SIGIR '99 workshop led to my acquaintance with Don Byrd, a research scientist working with Bruce Croft at the University of Massachusetts' Center for Intelligent Information Retrieval. Together with Matthew Dovey and Tim Crawford, both of King's College London, Don and Bruce were recently awarded significant NSF/JISC funding for their OMRAS (Online Music Retrieval and Search) project. Over lunch, Don and I decided that we should join forces to move the creation of a coherent MIR research program forward. Upon his return to Massachusetts, Don set out to petition NSF for special conference funding. His efforts were rewarded and Don secured funds to mount the upcoming International Symposium on Music Information Retrieval, which will be held October 23-25, 2000, in Plymouth, MA. Our deepest desire is that the Symposium will be able to bring together all the principal MIR stakeholders from around the world currently investigating the gamut of MIR issues. Marvin Minsky, of MIT, has agreed to be the symposium's keynote speaker.

The symposium will feature a mix of invited and submitted research papers and demonstrations. The areas of MIR research include, but are not limited to, the following:

  • Estimating similarity of melodies and polyphonic music
  • Music representation and indexing
  • Problems of recognizing music optically and/or via audio
  • Routing and filtering for music
  • Building up music databases
  • Evaluation of music-IR systems
  • Intellectual property rights issues
  • User interfaces for music IR
  • Issues related to musical styles and genres
  • Language modeling for music
  • User needs and expectations

We also are looking into publishing the proceedings of the symposium as a monograph. Altogether, the symposium should play a significant role in the realization of a vibrant and mutually supportive MIR research community. As program chair, I invite all interested parties to examine the symposium's "Call for Participation"

http://ciir.cs.umass.edu/music2000/

Concluding Comments: A Prognostication and a Lighthearted Plea

As Yogi Berra is claimed to have said, "Making predictions is difficult, especially about the future." However, I feel safe in predicting with absolute, unreserved and wholehearted certainty that we will see interest in MIR research and development issues increase exponentially over the next few years. As you may know, the explosive growth in the creation, use and trading of MP3 audio files has been nothing short of astounding. This phenomenon has led to the situation where, according to many Internet search services, the search for MP3 files has supplanted the request for sex-related websites as the most frequently occurring query. Currently, MP3 files are only accessible via simple text-based metadata representations of artist, title and lyric contents. I believe that the prospect of outrageously huge dot-com-style profits will soon act as a clarion call to academic and industry researchers alike to undertake MIR research and development projects with the greatest of haste. So, if you, Dear Reader, have access to research funding, and you have an interest in MIR issues, I would appreciate hearing from you - the sooner, the better!

Even if you find yourself currently lacking in financial resources, but find MIR issues intriguing, please feel free to contact me at jdownie@uiuc.edu. After all, being filthy rich is not the only reason for exploring the fascinating world of MIR research and development issues.

Acknowledgments

The author began his MIR research in 1993 under the supervision of the late Dr. Jean Tague-Sutcliffe, to whom he now dedicates this article. Dr. Downie's Ph.D. work was completed under the supervision of Dr. Michael Nelson, to whom he wishes to express his gratitude. Dr. Bernd Frohmann must also be acknowledged here as being instrumental in bringing together the support necessary to enable the successful completion and defense of the dissertation.

J. Stephen Downie holds a BA (Music), an MLIS and a Ph.D., all earned at the University of Western Ontario, London, Ontario, Canada. Dr. Downie is currently an assistant professor at the Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign.


© 2000, American Society for Information Science