Classifier Ensembles: A Practical Overview by Cory McKay

9.0 Concluding remarks

A number of theoretical arguments have been proposed that claim to demonstrate the general superiority of ensemble classification over classification with a single classifier. Many of these arguments are still debated in the pattern recognition community, however. A possible exception is Kleinberg's stochastic discrimination theory (2000), which appears to be the only generally uncontested model of ensemble classification with any significant degree of generality. In fact, the literature is sharply divided: at one extreme, some claim that classifier ensembles are a complete waste of time; at the other, some effectively suggest that ensembles are always better than single classifiers. Readers interested in a more theoretical analysis than the one presented here may wish to consult Kuncheva's book (2004) as a starting point, particularly Chapter 10.

The experimental evidence available is also somewhat divided. Many experiments show no significant performance benefit to using ensembles compared to using the single best classifier. Others show significant increases in performance when ensembles are used.

In either case, one must retain some scepticism when reading quantitative comparisons of competing classification schemes. A method that works very well in one particular experiment will not necessarily work in another. Published comparisons between one's own system and other systems can also involve questionable practices, since authors are often predisposed to tune and optimize their own systems more vigorously than those of their rivals. George Nagy satirizes this tendency with great eloquence:

Comparison of Classification Accuracies
Comparisons against algorithms proposed by others are distasteful and should be avoided. When this is not possible, the following Theorem of Ethical Data Selection may prove useful. Theorem: There exists a set of data for which a candidate algorithm is superior to any given rival algorithm. This set may be constructed by omitting from the test set any pattern which is misclassified by the candidate algorithm.

Replication of Experiments
Since pattern recognition is a mature discipline, the replication of experiments on new data by independent research groups, a fetish in the physical and biological sciences, is unnecessary. Concentrate instead on the accumulation of novel, universally applicable algorithms. Casey’s Caution: Do not ever make your experimental data available to others; someone may find an obvious solution that you missed.

(Nagy 1993)

In general, it seems that classifier ensembles can work very well in certain cases, but not in others. The arguments presented in Section 2 do make a convincing case for using ensembles, as does the proven success of algorithms such as AdaBoost. It is therefore not surprising that many influential researchers, such as Josef Kittler (2000), continue to emphasize the value of classifier ensembles. However, there are certainly cases where a single classifier is a better choice than an ensemble, and there are many cases where ensembles designed without a proper understanding of both the application domain and the field of classifier combination will perform poorly. So, although classifier ensembles can prove to be very powerful, one should be cautious about taking on the added complexity they bring unless one clearly understands why and how an ensemble should improve performance in the problem at hand.
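To make the point that ensembles help in some cases but not in others more concrete, the following is a minimal sketch (not taken from the original text) of majority voting under strongly idealized assumptions: a two-class problem, an odd number n of base classifiers, and base classifiers that each have the same accuracy p and err independently of one another. The function name `majority_vote_accuracy` is hypothetical. Under these assumptions, three classifiers that are each 70% accurate give a majority-vote accuracy of 78.4%, while three classifiers that are each 40% accurate give only 35.2%, which is worse than any single member.

```python
from math import comb


def majority_vote_accuracy(p: float, n: int) -> float:
    """Probability that a majority vote of n classifiers is correct.

    Idealized assumptions (illustrative only): two classes, n is odd,
    and each classifier is independently correct with probability p.
    """
    if n < 1 or n % 2 == 0:
        raise ValueError("n must be a positive odd integer")
    k_min = n // 2 + 1  # smallest number of correct votes that forms a majority
    # Sum the binomial probabilities of exactly k correct votes for every majority k.
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


if __name__ == "__main__":
    # Compare reasonably accurate base classifiers (p = 0.7) with weak ones (p = 0.4).
    for p in (0.7, 0.4):
        for n in (1, 3, 5):
            print(f"p={p}, n={n}: {majority_vote_accuracy(p, n):.3f}")
```

The independence assumption is what drives the gain: real base classifiers tend to make correlated errors, which shrinks any benefit, and an ensemble of identical classifiers gains nothing at all, since every vote agrees with the single classifier.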

Last modified: April 18, 2005.