Toward realtime recognition of acoustic musical instruments

Miller Puckette's fiddle~ object and an exemplar-based learning system are used to create a realtime timbre recognizer. The exemplars of timbral features of orchestral instrument sounds were created using the McGill Master Samples CD library.

In this experiment, the dynamically changing spectra are quantified by the average velocities of three spectral parameters: the centroid, the standard deviation, and the skewness. The mean and the standard deviation of these three parameters, the integral of the spectrum, and the fundamental frequency are also calculated. These features are stored in the database of an exemplar-based learning system built on a k-nearest neighbor classifier. The system is enhanced by a genetic algorithm, which searches for the set of feature weights that gives the best recognition rate.
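The spectral parameters named above can be illustrated with a minimal sketch. It treats a magnitude spectrum as a distribution over frequency and computes its first three moments for one analysis frame. The function names, the `bin_hz` parameter, and in particular the definition of "average velocity" as the mean absolute frame-to-frame change are assumptions made for illustration; the text does not specify how the velocity was computed.

```python
import math

def spectral_moments(magnitudes, bin_hz):
    """Return (centroid, std_dev, skewness) of one magnitude spectrum.

    magnitudes: magnitude per frequency bin for a single frame.
    bin_hz: frequency spacing between bins (hypothetical parameter).
    """
    total = sum(magnitudes)
    if total == 0:
        return 0.0, 0.0, 0.0  # silent frame: no meaningful moments
    freqs = [i * bin_hz for i in range(len(magnitudes))]
    centroid = sum(f * m for f, m in zip(freqs, magnitudes)) / total
    variance = sum((f - centroid) ** 2 * m
                   for f, m in zip(freqs, magnitudes)) / total
    std_dev = math.sqrt(variance)
    if std_dev == 0:
        return centroid, 0.0, 0.0  # all energy in one bin
    skewness = sum((f - centroid) ** 3 * m
                   for f, m in zip(freqs, magnitudes)) / (total * std_dev ** 3)
    return centroid, std_dev, skewness

def average_velocity(values):
    """Mean absolute change per frame of one parameter track.

    Assumed definition; the original measure may differ.
    """
    if len(values) < 2:
        return 0.0
    steps = [abs(b - a) for a, b in zip(values, values[1:])]
    return sum(steps) / len(steps)
```

Applying `spectral_moments` to every frame gives three parameter tracks, and `average_velocity` then reduces each track to a single number describing how quickly that parameter moves over the analyzed portion of the sound.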

The FFT window size in the fiddle~ object was set to 1024 samples with a hop size of 512 samples. For each sound, 12800 samples, or about 300 ms, were analyzed. There is a trade-off between the total number of samples analyzed, and therefore the delay before a result is available, and the recognition rate.
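The arithmetic behind these figures can be checked with a short sketch. The sampling rate of 44.1 kHz is an assumption (the CD standard; the text does not state it), and counting only complete windows is also an assumption about how the frames are laid out.

```python
def frame_count(total_samples, window, hop):
    """Number of complete analysis windows that fit in total_samples."""
    if total_samples < window:
        return 0
    return 1 + (total_samples - window) // hop

SAMPLE_RATE = 44100  # Hz; assumed, not stated in the text
WINDOW, HOP, TOTAL = 1024, 512, 12800

frames = frame_count(TOTAL, WINDOW, HOP)
duration_ms = 1000 * TOTAL / SAMPLE_RATE

print(frames)                 # number of overlapping frames analyzed
print(round(duration_ms, 1))  # analysis delay in milliseconds
```

Under these assumptions the 12800 samples yield 24 overlapping frames and a delay of roughly 290 ms, consistent with the "about 300 ms" quoted above.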

Although the genetic algorithm needs a considerable amount of time to determine the set of weights, this search is done in advance. The k-nearest neighbor classification itself takes a negligible amount of time and can be performed as soon as the required number of samples has been processed.
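A minimal sketch of a weighted k-nearest neighbor classifier shows why the classification step is cheap once the weights are fixed. The weighted Euclidean distance, the majority vote, and all names below are illustrative assumptions rather than the system's actual implementation; the weights would come from the genetic algorithm, which is not shown here.

```python
import math
from collections import Counter

def weighted_distance(a, b, weights):
    """Euclidean distance with one weight per feature (assumed metric)."""
    return math.sqrt(sum(w * (x - y) ** 2
                         for w, x, y in zip(weights, a, b)))

def knn_classify(query, exemplars, weights, k=3):
    """Label a feature vector by majority vote among its k nearest exemplars.

    exemplars: list of (feature_vector, label) pairs from the database.
    Ties in the vote go to the label whose exemplar is nearest.
    """
    nearest = sorted(
        exemplars,
        key=lambda ex: weighted_distance(query, ex[0], weights),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

Each query costs one distance computation per stored exemplar plus a sort, which is small next to the repeated evaluations a genetic algorithm needs while searching for weights. Setting a weight to zero removes that feature from the distance entirely, which is how a weight search can discard uninformative features.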

Compared to a previous experiment that used only the steady-state portion of the sounds, the current system achieves a 10-20% increase in the recognition rate. For example, the recognition rate for the trumpet, the clarinet, and the violin is 92% using the dynamic spectra, compared to 80% using the steady-state spectrum.