Annotated Bibliography
For classification, the Fisher Linear Discriminant (FLD, closely related to LDA) is used, and its performance is compared with that of k-NN; the results are promising. For FLD, the best results are obtained with the MFCC and Voice2White features, with reported error probabilities of 4.09% and 4.91%. The results also show that very good performance can be obtained using only a single feature, which helps reduce the computational cost in real-time applications.
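To make the classifier concrete, here is a minimal sketch of a two-class Fisher Linear Discriminant on 2-D feature vectors. It uses the textbook formulation (projection direction w = Sw^-1 (m1 - m0), threshold midway between the projected class means), which is an assumption and not necessarily the exact setup of the paper. The function names and the toy data are hypothetical.

```python
def mean_vec(rows):
    """Component-wise mean of a list of 2-D points."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]

def scatter(rows, mu):
    """2x2 scatter matrix of one class around its mean."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - mu[0], r[1] - mu[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]

def fit_fld(class0, class1):
    """Return (w, threshold) for a two-class Fisher discriminant."""
    m0, m1 = mean_vec(class0), mean_vec(class1)
    s0, s1 = scatter(class0, m0), scatter(class1, m1)
    # Within-class scatter Sw = S0 + S1, inverted in closed form (2x2 case).
    sw = [[s0[i][j] + s1[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    w = [dot(inv[0], diff), dot(inv[1], diff)]
    # Assumed decision rule: threshold halfway between projected means.
    threshold = 0.5 * (dot(w, m0) + dot(w, m1))
    return w, threshold

def predict(w, threshold, x):
    """Label 1 if the projection of x exceeds the threshold, else 0."""
    return 1 if dot(w, x) > threshold else 0

# Hypothetical toy data standing in for two audio classes.
class0 = [(1.0, 2.0), (2.0, 1.0), (1.5, 1.5), (2.0, 2.5)]
class1 = [(5.0, 6.0), (6.0, 5.0), (5.5, 5.5), (6.0, 6.5)]
w, threshold = fit_fld(class0, class1)
```

Once fitted, classifying a frame costs one dot product and one comparison, which is part of why a discriminant of this kind suits real-time use.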
Feature sets based on timbral texture (usable in real time) include spectral centroid, spectral rolloff, spectral flux, time-domain zero crossings, Mel-frequency cepstral coefficients (MFCC), and a low-energy feature, computed over analysis and texture windows, resulting in a 19-dimensional feature vector.
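As an illustration of four of these timbral features, the following sketch computes them for a single analysis frame. The definitions are the commonly used ones (magnitude-weighted mean bin for the centroid, an assumed 85% cumulative-magnitude point for the rolloff, squared difference of normalized magnitudes for the flux) and may differ in detail from those in the paper. The function names are hypothetical, and a naive DFT stands in for an FFT to keep the example dependency-free.

```python
import cmath
import math

def zero_crossings(frame):
    """Count sign changes between consecutive time-domain samples."""
    signs = [1 if s >= 0 else 0 for s in frame]
    return sum(abs(a - b) for a, b in zip(signs[1:], signs[:-1]))

def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (fine for a short demo frame)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]

def spectral_centroid(mag):
    """Magnitude-weighted mean bin index."""
    return sum(k * m for k, m in enumerate(mag)) / sum(mag)

def spectral_rolloff(mag, fraction=0.85):
    """Lowest bin below which `fraction` of the total magnitude lies."""
    target = fraction * sum(mag)
    running = 0.0
    for k, m in enumerate(mag):
        running += m
        if running >= target:
            return k
    return len(mag) - 1

def spectral_flux(mag, prev_mag):
    """Squared difference between successive normalized spectra."""
    def normalize(v):
        total = sum(v)
        return [x / total for x in v] if total else list(v)
    a, b = normalize(mag), normalize(prev_mag)
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Demo frame: a pure sinusoid centered on DFT bin 4 of a 64-sample frame.
frame = [math.sin(2 * math.pi * 4 * t / 64) for t in range(64)]
mag = magnitude_spectrum(frame)
centroid = spectral_centroid(mag)
rolloff = spectral_rolloff(mag)
```

For the pure tone in the demo, both the centroid and the rolloff fall on the tone's own bin, which is a quick sanity check on the definitions.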
Feature sets based on rhythmic content (the beat histogram) are computed through full-wave rectification, low-pass filtering, downsampling, mean removal, enhanced autocorrelation, peak detection, and histogram calculation, which together yield the beat histogram features. Feature sets based on pitch content (the pitch histogram) are also useful for automatic music genre classification.
Classification:
Standard statistical pattern recognition classifiers, such as a Gaussian classifier, Gaussian mixture models (GMM), and k-NN, were used to evaluate the feature sets.
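Of the classifiers named here, k-NN is the simplest to sketch. The example below is a generic majority-vote k-NN with Euclidean distance, not the paper's evaluation code; the function name, the 2-D feature vectors, and the genre labels are made up for illustration.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Majority label among the k training points closest to `query`.

    `train` is a list of (feature_vector, label) pairs.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical 2-D feature vectors with made-up genre labels.
train = [
    ((0.0, 0.0), "classical"), ((0.0, 1.0), "classical"), ((1.0, 0.0), "classical"),
    ((5.0, 5.0), "rock"), ((5.0, 6.0), "rock"), ((6.0, 5.0), "rock"),
]
```

Calling `knn_predict(train, (0.5, 0.5))` votes among the three nearest training vectors and returns their majority label.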
Results: Classification accuracies of 61% (non-real-time) and 44% (real-time) were reported.
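The rhythmic-content steps listed in this entry can be sketched on a toy signal as follows. This is a simplified illustration only: it uses a one-pole low-pass filter and plain autocorrelation (not the enhanced autocorrelation named above), and it returns a single dominant period instead of accumulating peaks into a histogram. The function name, parameter values, and click-track input are all assumptions.

```python
def beat_period(signal, alpha=0.5, step=4, min_lag=2, max_lag=50):
    """Estimate the dominant beat period of `signal`, in samples."""
    # 1. Full-wave rectification.
    rectified = [abs(s) for s in signal]
    # 2. One-pole low-pass filter (assumed form) to obtain an envelope.
    smoothed, prev = [], 0.0
    for s in rectified:
        prev = (1 - alpha) * s + alpha * prev
        smoothed.append(prev)
    # 3. Downsampling: keep every `step`-th sample.
    envelope = smoothed[::step]
    # 4. Mean removal.
    mean = sum(envelope) / len(envelope)
    centered = [e - mean for e in envelope]
    # 5. Plain autocorrelation (stand-in for enhanced autocorrelation).
    def autocorr(lag):
        return sum(a * b for a, b in zip(centered, centered[lag:]))
    # 6. Peak detection: the lag with the largest autocorrelation.
    best_lag = max(range(min_lag, max_lag + 1), key=autocorr)
    # Convert the lag back to the original sampling grid.
    return best_lag * step

# Hypothetical click track: one impulse every 40 samples.
clicks = [1.0 if t % 40 == 0 else 0.0 for t in range(400)]
period = beat_period(clicks)
```

A beat histogram would repeat the peak-picking step over many windows and accumulate the detected periods into bins.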
1, Feature extraction: Modified Low Energy Ratio (MLER), which introduces a coefficient into the evaluation of the low-energy feature.
2, Classifier: 1-D Bayes MAP classifier
3, Classification refinement: a context-based "post-decision method" that exploits the relevance of neighboring clips
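Steps 1 and 3 above might look roughly like the sketch below. It assumes that the low-energy ratio is the fraction of frames whose energy falls below a coefficient times the mean frame energy, and that the post-decision step is a majority vote over neighboring clips. Both are plausible readings of the summary, not confirmed details of the paper, and the function names, coefficient values, and labels are hypothetical.

```python
from collections import Counter

def modified_low_energy_ratio(frame_energies, coeff=1.0):
    """Fraction of frames with energy below coeff * mean energy (assumed form)."""
    mean_energy = sum(frame_energies) / len(frame_energies)
    threshold = coeff * mean_energy
    return sum(1 for e in frame_energies if e < threshold) / len(frame_energies)

def smooth_decisions(labels, radius=1):
    """Relabel each clip by majority vote over its neighbors (assumed rule).

    A clip keeps its own label unless another label has strictly more votes.
    """
    smoothed = []
    for i, own in enumerate(labels):
        window = labels[max(0, i - radius): i + radius + 1]
        counts = Counter(window)
        best, best_votes = counts.most_common(1)[0]
        smoothed.append(best if best_votes > counts[own] else own)
    return smoothed

# Hypothetical per-frame energies and per-clip decisions.
energies = [1.0, 1.0, 1.0, 9.0]
decisions = ["music", "music", "speech", "music", "music"]
```

In this toy input, the isolated "speech" decision is surrounded by "music" clips, so the vote relabels it; that is the kind of correction a context-based refinement aims for.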
On-line Resources
Many other good tutorials about machine learning.