Search engine technologies

Google search string:

Witten "search engine" Inverted "Managing Gigabytes" "Natural Language" algorithms

Current content-based IR (information retrieval) systems usually use either full-text matching or keyword spotting. A major problem with these systems is overmatching: they are biased toward high recall at the expense of precision. To improve both, matching mechanisms should exploit more information in the text, such as the semantic class of keywords and sentence meaning. Current NLP (natural language processing) techniques, however, are not mature enough to pick up such information accurately. The GDA tags should be a great help in improving NLP accuracy, which will in turn improve IR.
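
The overmatching problem described above can be made concrete with a small sketch. Everything in it is an assumption for illustration: the toy documents, the query, the relevance judgments, and the helper names keyword_match and precision_recall are hypothetical and do not come from any real system. It shows naive keyword spotting retrieving every relevant document (high recall) while also pulling in an irrelevant one (lower precision).

```python
def keyword_match(query, docs):
    """Naive keyword spotting: return ids of docs containing any query term."""
    terms = set(query.lower().split())
    return {doc_id for doc_id, text in docs.items()
            if terms & set(text.lower().split())}


def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved; recall = hits / relevant."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical toy collection (made up for this example).
DOCS = {
    1: "jaguar speed in the wild",
    2: "jaguar car dealership prices",
    3: "stock market report",
    4: "jaguar habitat and diet",
}

# Assumed relevance judgments: the user wants the animal, not the car.
RELEVANT = {1, 4}

retrieved = keyword_match("jaguar", DOCS)
precision, recall = precision_recall(retrieved, RELEVANT)
print(sorted(retrieved), round(precision, 2), recall)
```

Here the matcher cannot tell the animal from the car, so document 2 is retrieved alongside the two relevant ones. Knowing the semantic class of the keyword, which is the kind of information the paragraph above argues for, is what would let a matcher reject it.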

Search engines:

Themefinder (Stanford)
New Zealand Digital Music Library


Overview books & articles:
The New Zealand Digital Library MELody inDEX (D-Lib 1997)
Online Music Recognition and Searching (OMRAS)
In Search of a Lost Melody
Access to Music Information: The State of the Art (Downie)

Bainbridge, D., Nevill-Manning, C.G., Witten, I.H., Smith, L.A., & McNab, R.J. (1999) "Towards a Digital Library of Popular Music" Proc. Digital Libraries 1999, Fox, E.A. and Rowe, N. (Eds.) 161--169.

Agosti, M., Bombi, F., Melucci, M., and Mian, G. (1999). Towards a digital library for the Venetian music of the eighteenth century. In Anderson, J., Deegan, M., Ross, S., and Harold, S., editors, Digital Content, Digital Methods. Office for Humanities Communication. In press.

W. B. Frakes and R. Baeza-Yates, eds., Information Retrieval: Data Structures and Algorithms, Englewood Cliffs, N.J.: Prentice-Hall.

T. Dao. An indexing model for structured documents to support queries on content, structure and attributes. In Proc. of the IEEE Forum on Research and Technology Advances in Digital Libraries, pages 88--97, California, USA, April 1998.