The diversity of an ensemble’s component classifiers is one of the most important factors influencing its success. It is obvious that an ensemble of identical classifiers will offer no advantage over any one of the component classifiers. Similarly, an ensemble of classifiers where each component classifier typically makes errors on the same input patterns is not advantageous either. Ensembles work best when the component classifiers each tend to make misclassifications on different types of input patterns. As long as there are a sufficient number of component classifiers and each component has a reasonably high success rate, such classifier diversity can potentially enable ensembles as a whole to perform better than any of their individual components by averaging out misclassifications. Diversity is important enough that it can be worth weakening individual classifiers if the result is an increase in diversity, as demonstrated by the AdaBoost algorithm.
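The following sketch illustrates the point about coinciding and non-coinciding errors. It assumes a two-class problem combined by simple majority vote, which is only one of several possible combining rules. Each component is described by its "oracle" output: a list recording whether it classified each of six hypothetical test patterns correctly. The function names and data are illustrative only.

```python
from typing import List


def accuracy(correct: List[bool]) -> float:
    """Fraction of patterns a single classifier labels correctly."""
    return sum(correct) / len(correct)


def majority_vote_accuracy(members: List[List[bool]]) -> float:
    """Accuracy of a majority vote on a two-class problem.

    members[i][j] is True when classifier i labels pattern j correctly.
    With only two classes, the vote is correct exactly when more than
    half of the members are correct on that pattern.
    """
    n_members = len(members)
    n_patterns = len(members[0])
    wins = 0
    for j in range(n_patterns):
        votes_correct = sum(m[j] for m in members)
        if votes_correct > n_members / 2:
            wins += 1
    return wins / n_patterns


T, F = True, False
# Each member is right on 4 of 6 patterns, but errs on different ones.
diverse = [
    [F, F, T, T, T, T],
    [T, T, F, F, T, T],
    [T, T, T, T, F, F],
]
# Three copies of the same classifier: every error coincides.
identical = [diverse[0]] * 3

print([round(accuracy(m), 3) for m in diverse])  # [0.667, 0.667, 0.667]
print(majority_vote_accuracy(identical))         # 0.6666666666666666
print(majority_vote_accuracy(diverse))           # 1.0
```

All six classifiers have the same individual accuracy, yet only the ensemble whose members err on different patterns improves on them. With identical members the vote simply reproduces the single classifier's mistakes.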
An examination of the successful bagging and boosting methods described in Section 7 reveals that their goal is essentially to ensure classifier diversity. There are a number of general approaches that one can take in order to help increase diversity:
Unfortunately, there is no single measure of diversity that can be universally applied to classifier ensembles. Although statistical tools such as correlation, the Q statistic and interrater agreement can be used in some cases, they are not universally applicable to the many possible complex ensemble architectures and approaches. Furthermore, a given diversity measurement does not reliably predict the corresponding change in ensemble performance, which is ultimately what matters. In general, one must rely on common sense, intuition, experience and, perhaps most importantly, knowledge of each specific application domain and an understanding of the classification techniques being used.
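As a rough illustration of two of the pairwise measures mentioned above, the sketch below computes the Q statistic and the correlation coefficient for a pair of classifiers from their oracle (correct/incorrect) outputs, using the commonly cited contingency-count definitions. Here N11 counts patterns both classifiers get right, N00 patterns both get wrong, and N10 and N01 patterns only one of them gets right. The helper names and data are hypothetical, interrater agreement is not shown, and ensemble-level diversity is approximated here by simply averaging Q over all pairs.

```python
import math
from itertools import combinations
from typing import List, Tuple


def counts(a: List[bool], b: List[bool]) -> Tuple[int, int, int, int]:
    """Return (N11, N10, N01, N00) for two oracle-output vectors.

    N10 counts patterns where a is correct and b is wrong, and so on.
    """
    n11 = sum(x and y for x, y in zip(a, b))
    n10 = sum(x and not y for x, y in zip(a, b))
    n01 = sum((not x) and y for x, y in zip(a, b))
    n00 = sum((not x) and (not y) for x, y in zip(a, b))
    return n11, n10, n01, n00


def q_statistic(a: List[bool], b: List[bool]) -> float:
    """Pairwise Q: +1 when errors coincide, negative when they avoid each other."""
    n11, n10, n01, n00 = counts(a, b)
    denom = n11 * n00 + n01 * n10
    if denom == 0:
        return math.nan  # undefined, e.g. if a classifier is always right or always wrong
    return (n11 * n00 - n01 * n10) / denom


def correlation(a: List[bool], b: List[bool]) -> float:
    """Correlation coefficient between the two oracle outputs."""
    n11, n10, n01, n00 = counts(a, b)
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        return math.nan
    return (n11 * n00 - n01 * n10) / denom


def mean_pairwise_q(members: List[List[bool]]) -> float:
    """Average Q over all pairs of ensemble members."""
    pairs = list(combinations(members, 2))
    return sum(q_statistic(a, b) for a, b in pairs) / len(pairs)


T, F = True, False
c1 = [F, F, T, T, T, T]
c2 = [T, T, F, F, T, T]
c3 = [T, T, T, T, F, F]

print(q_statistic(c1, c1), correlation(c1, c1))  # 1.0 1.0
print(q_statistic(c1, c2), correlation(c1, c2))  # -1.0 -0.5
print(mean_pairwise_q([c1, c2, c3]))             # -1.0
```

Note that Q and the correlation coefficient can disagree in magnitude for the same pair, and both are undefined in degenerate cases. This is consistent with the point above that no single number reliably captures diversity or predicts ensemble accuracy.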
Of course, this does not mean that one should stop studying notions such as diversity, or other statistical ways of predicting the performance of ensembles. Such research could lead to valuable theoretical results that would help bring clarity to the currently somewhat ambiguous field of applied classifier ensembles.
An applet has been implemented that demonstrates the role of diversity in classifier ensembles. This applet can be found in Section 11.
Last modified: April 18, 2005.