Current content-based IR (information retrieval) systems usually rely on either full-text matching or keyword spotting. A major problem with these systems is overmatching: they are biased toward high recall at the expense of precision. To improve both, matching mechanisms should exploit more of the information in the text, such as the semantic classes of keywords and the meanings of sentences. Current NLP (natural language processing) techniques, however, are not yet mature enough to extract such information accurately. GDA tags should be of great help in improving NLP accuracy, which will in turn improve IR.
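To make the overmatching problem concrete, here is a minimal sketch that contrasts plain keyword spotting with matching that also checks the keyword's semantic class. The `Token` type, the sense labels, the toy documents, and the functions `keyword_match` and `sense_match` are all hypothetical illustrations; they do not reproduce the GDA tag format or any particular IR system. The sense labels simply stand in for the kind of information that annotation could supply.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    word: str
    sense: str  # hypothetical semantic-class label, assumed to come from annotation


# Toy corpus (hypothetical data): "bank" appears in two different senses.
DOCS = {
    "d1": [Token("bank", "finance"), Token("loan", "finance")],
    "d2": [Token("river", "geography"), Token("bank", "geography")],
    "d3": [Token("bank", "finance"), Token("rate", "finance")],
}


def keyword_match(word):
    """Keyword spotting: retrieve every document containing the word form."""
    return {d for d, toks in DOCS.items() if any(t.word == word for t in toks)}


def sense_match(word, sense):
    """Retrieve only documents where the word occurs with the requested semantic class."""
    return {d for d, toks in DOCS.items() if Token(word, sense) in toks}


def precision_recall(retrieved, relevant):
    """Standard set-based precision and recall."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# The user is looking for the financial sense of "bank".
relevant = {"d1", "d3"}
print(precision_recall(keyword_match("bank"), relevant))           # precision 2/3, recall 1.0
print(precision_recall(sense_match("bank", "finance"), relevant))  # (1.0, 1.0)
```

Keyword spotting retrieves the river-bank document as well, so recall stays at 1.0 while precision drops to 2/3; checking the semantic class removes that false match without losing any relevant document.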

