Classifier Ensembles: A Practical Overview by Cory McKay

8.0 Classifier diversity

The diversity of an ensemble’s component classifiers is one of the most important factors influencing its success. An ensemble of identical classifiers clearly offers no advantage over any one of its components. Similarly, an ensemble whose component classifiers typically make errors on the same input patterns is not advantageous either. Ensembles work best when the component classifiers each tend to misclassify different types of input patterns. Provided that there are enough component classifiers and that each has a reasonably high success rate, such diversity can enable the ensemble as a whole to perform better than any of its individual components by averaging out misclassifications. Diversity is important enough that it can be worth weakening individual classifiers if the result is an increase in diversity, as demonstrated by the AdaBoost algorithm.
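
The averaging-out effect can be made concrete with a small calculation. The sketch below assumes a two-class problem, an odd number of classifiers combined by simple majority vote, and fully independent errors, which is an idealization that real ensembles only approximate. The function name `majority_vote_accuracy` is illustrative and not part of any library.

```python
from math import comb


def majority_vote_accuracy(n_classifiers: int, p_correct: float) -> float:
    """Probability that a strict majority of the classifiers is correct.

    Assumes (for illustration only) a two-class problem in which every
    classifier is correct with the same probability and errs independently
    of the others.
    """
    if n_classifiers < 1 or n_classifiers % 2 == 0:
        raise ValueError("use a positive, odd number of classifiers to avoid ties")
    if not 0.0 <= p_correct <= 1.0:
        raise ValueError("p_correct must be a probability")

    smallest_majority = n_classifiers // 2 + 1
    # Sum the binomial probabilities of every outcome in which at least
    # `smallest_majority` classifiers give the correct answer.
    return sum(
        comb(n_classifiers, k)
        * p_correct ** k
        * (1.0 - p_correct) ** (n_classifiers - k)
        for k in range(smallest_majority, n_classifiers + 1)
    )


if __name__ == "__main__":
    for n in (1, 3, 5, 11):
        print(n, round(majority_vote_accuracy(n, 0.7), 4))
```

Under these assumptions, three classifiers that are each correct 70% of the time give a majority vote that is correct about 78.4% of the time, and five give about 83.7%. If each classifier is correct only 40% of the time, the vote of three drops to about 35.2%, which illustrates why each component needs a reasonably high success rate. With identical classifiers the errors are perfectly correlated and the vote stays at 70% regardless of ensemble size.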

An examination of the successful bagging and boosting methods described in Section 7 reveals that their goal is essentially to ensure classifier diversity. There are a number of general approaches that one can take in order to help increase diversity:

Use classifiers that have significantly different learning algorithms that make different assumptions about the universe of possible input patterns. For example, combinations of parametric and non-parametric classifiers may complement each other well.
Vary the parameters of the algorithms. For example, neural networks with different numbers of hidden units could be used, as could k-NN classifiers with different values of k.
If only one or a few classification algorithms are available or appropriate, then non-deterministic and unstable algorithms (those whose output changes substantially in response to small changes in the training data or initial conditions) are likely to be the most suitable for use in ensembles.
Vary the features and/or training samples that each classifier is exposed to. This is only appropriate if there are a sufficient number of features and training samples available.
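
As a minimal sketch of the last approach, the code below gives each classifier its own bootstrap sample of the training patterns (drawn with replacement, as in bagging) together with a random subset of the features. The function name `make_training_views`, its parameters and the choice of random feature subsets are assumptions made for illustration; the code only produces index lists and leaves the choice of classifier open.

```python
import random


def make_training_views(n_samples, n_features, n_classifiers,
                        features_per_classifier, seed=0):
    """Return one (sample_indices, feature_indices) pair per classifier.

    Hypothetical helper: sample indices are drawn with replacement
    (a bootstrap sample), feature indices without replacement.
    """
    if features_per_classifier > n_features:
        raise ValueError("cannot select more features than are available")

    rng = random.Random(seed)  # seeded so that the views are reproducible
    views = []
    for _ in range(n_classifiers):
        # Bootstrap sample: some patterns repeat, others are left out.
        sample_indices = [rng.randrange(n_samples) for _ in range(n_samples)]
        # Random feature subset: each classifier sees different features.
        feature_indices = sorted(
            rng.sample(range(n_features), features_per_classifier)
        )
        views.append((sample_indices, feature_indices))
    return views


if __name__ == "__main__":
    for samples, features in make_training_views(10, 6, 3, 4, seed=42):
        print("samples:", samples, "features:", features)
```

Each classifier would then be trained only on the rows and columns named in its view. With few training samples or few features the views overlap heavily or become too small to train on, which is why this approach needs a sufficient supply of both.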

Unfortunately, there is no single measure of diversity that can be universally applied to classifier ensembles. Although statistical tools such as correlation, the Q statistic and inter-rater agreement can be used in some cases, they do not apply to all of the many possible complex ensemble architectures and approaches. Furthermore, a change in a particular diversity measurement does not reliably predict a corresponding change in ensemble performance, which is ultimately what matters. In general, one must rely on common sense, intuition, experience and, perhaps most importantly, knowledge of each specific application domain and an understanding of the classification techniques being used.
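
For a pair of classifiers evaluated on the same test patterns, the Q statistic mentioned above can be computed from four counts: the number of patterns that both classify correctly (N11), that only the first classifies correctly (N10), that only the second classifies correctly (N01), and that both misclassify (N00). The sketch below uses the usual definition Q = (N11·N00 − N01·N10) / (N11·N00 + N01·N10) and also reports the simpler disagreement rate. The function names and the representation of results as lists of booleans are assumptions made for illustration.

```python
def pairwise_counts(correct_a, correct_b):
    """Count joint outcomes for two classifiers on the same patterns.

    Each argument is a sequence of truthy/falsy values, one per test
    pattern, saying whether that classifier labelled the pattern correctly.
    Returns (n11, n10, n01, n00).
    """
    if len(correct_a) != len(correct_b):
        raise ValueError("both classifiers must be scored on the same patterns")
    n11 = n10 = n01 = n00 = 0
    for a, b in zip(correct_a, correct_b):
        if a and b:
            n11 += 1      # both correct
        elif a:
            n10 += 1      # only the first correct
        elif b:
            n01 += 1      # only the second correct
        else:
            n00 += 1      # both wrong
    return n11, n10, n01, n00


def q_statistic(correct_a, correct_b):
    """Q statistic: +1 for identical behaviour, near 0 for independence."""
    n11, n10, n01, n00 = pairwise_counts(correct_a, correct_b)
    denominator = n11 * n00 + n01 * n10
    if denominator == 0:
        raise ValueError("Q is undefined for these counts")
    return (n11 * n00 - n01 * n10) / denominator


def disagreement(correct_a, correct_b):
    """Fraction of patterns on which exactly one classifier is correct."""
    n11, n10, n01, n00 = pairwise_counts(correct_a, correct_b)
    total = n11 + n10 + n01 + n00
    if total == 0:
        raise ValueError("no patterns to compare")
    return (n10 + n01) / total


if __name__ == "__main__":
    a = [1, 1, 1, 0, 0, 1, 0, 1]
    b = [1, 0, 1, 1, 0, 1, 1, 0]
    print(q_statistic(a, b), disagreement(a, b))
```

Q ranges from −1 to 1. Values near 1 mean that the two classifiers tend to succeed and fail on the same patterns, while values near zero or below suggest that their errors fall on different patterns. As noted above, a favourable value does not by itself guarantee better ensemble performance.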

Of course, this does not mean that diversity measures, or other statistical means of predicting ensemble performance, are not worth studying. Such research could well lead to theoretical results of great value that would help to simplify the currently somewhat ambiguous field of applied classifier ensembles.

An applet has been implemented that demonstrates the role of diversity in classifier ensembles. This applet can be found in Section 11.

Last modified: April 18, 2005.