As discussed in Section 4.0, classifier fusion involves all classifiers in an ensemble in each classification decision. This section provides an overview of how the outputs of the component classifiers can be fused into a single output for the ensemble. Although the output of individual classifiers, as well as of ensembles as a whole, can take a variety of forms (e.g. a class label, a ranked list of possible class labels or a certainty score for each class), the core philosophy of most of the techniques discussed below can be applied to any of them. Methods whose output consists of a single class label are emphasized here, as they represent the general case.
Unweighted voting operates much the same way as human voting supposedly does in political elections, which is to say that each classifier has one equal vote, and the most popular classification is the one chosen by the ensemble as a whole. The most basic variant is the plurality vote, where the class with the most votes wins. There are several variations on the plurality vote. For example, if a component classifier can output a certainty score for each category, then this classifier may be permitted to divide its single vote among a number of classes, in proportion to its preference for each.
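To make the mechanics concrete, the following is a minimal Python sketch of the plurality vote and of the certainty-score variant described above. The function names and data formats are illustrative assumptions, not taken from the original text:

```python
from collections import Counter

def plurality_vote(labels):
    """Plurality vote: the class predicted by the most classifiers wins."""
    return Counter(labels).most_common(1)[0][0]

def soft_plurality_vote(score_dicts):
    """Variant where each classifier splits its single vote among classes
    in proportion to its certainty scores (each dict maps class -> score)."""
    totals = Counter()
    for scores in score_dicts:
        norm = sum(scores.values())
        for cls, score in scores.items():
            # Each classifier contributes exactly one vote in total.
            totals[cls] += score / norm
    return totals.most_common(1)[0][0]
```

For example, `plurality_vote(["a", "b", "a"])` returns `"a"`, and a classifier that is 60% certain of class `"a"` contributes 0.6 of its vote to `"a"` and 0.4 to the alternatives.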
Other voting variants require a minimum number of votes in favour of one option before a final decision is reached (e.g. a simple majority (50% + 1), a 2/3 majority, etc.). If the classifiers' votes are split between too many classes for the required majority to be reached, then a run-off vote can be performed: the least popular class or classes are dropped as candidates, and the classifiers perform a new classification among the reduced set of candidate classes. This approach can work well when there are many candidate classes, since a simple plurality vote can be of questionable value if a class wins with, say, only 10% of the votes simply because the remaining votes are spread even more thinly among the many alternatives.
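The run-off procedure can be sketched as follows, under the simplifying assumption that each classifier can rank all candidate classes, so that "re-voting" means choosing its highest-ranked class still in contention. The names and structure here are illustrative, not from the original text:

```python
from collections import Counter

def runoff_vote(rankings, threshold=0.5):
    """Run-off voting sketch. Each classifier supplies a ranked list of all
    classes (most to least preferred). In each round, a classifier votes for
    its highest-ranked class still in contention; the least popular classes
    are dropped until one class exceeds the required fraction of votes
    (a simple majority by default)."""
    candidates = {c for ranking in rankings for c in ranking}
    while True:
        votes = Counter(next(c for c in ranking if c in candidates)
                        for ranking in rankings)
        top, count = votes.most_common(1)[0]
        if count / len(rankings) > threshold:
            return top
        fewest = min(votes.values())
        remaining = {c for c in votes if votes[c] > fewest}
        if not remaining:
            return top  # all remaining classes tied; break the tie arbitrarily
        candidates = remaining  # classes with no votes are also dropped
```

With five classifiers split 2/1/2 among three classes, no class has a majority in the first round; the weakest class is dropped and its supporter's vote transfers in the second round.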
Although the theoretical upper and lower performance bounds of the plurality vote are well understood (Matan 1996), other types of voting remain to be more formally analyzed.
Weighted voting is similar to the way that shareholders in corporations vote, which is to say that each voter can have a different amount of influence on the final outcome. One approach to assigning weights is to train each classifier individually and have each classifier evaluate a test set individually. The weight assigned to the vote of each classifier is then based on its performance on the test set. Assuming that the classifiers are independent and that the test set results accurately reflect the effectiveness of each classifier, neither of which is always the case, this approach is likely to work better than simple majority voting.
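The weighting scheme described above can be sketched as follows, with each classifier weighted by its accuracy on a held-out test set. The function names, and the choice of raw accuracy as the weight, are illustrative assumptions rather than the only possibility:

```python
from collections import Counter

def accuracy_weights(predictions_per_clf, true_labels):
    """Weight each classifier by its classification accuracy on a test set.
    predictions_per_clf is one list of predicted labels per classifier."""
    return [
        sum(pred == true for pred, true in zip(preds, true_labels)) / len(true_labels)
        for preds in predictions_per_clf
    ]

def weighted_vote(labels, weights):
    """Each classifier's vote counts in proportion to its weight."""
    totals = Counter()
    for label, weight in zip(labels, weights):
        totals[label] += weight
    return totals.most_common(1)[0][0]
```

For instance, a single highly accurate classifier (weight 0.9) can outvote two weaker classifiers (weights 0.3 and 0.4) that agree with each other, something impossible under unweighted voting.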
There are a variety of other more complicated and less popular fusion methods that can be used. There are too many to describe in detail here, but some of the most common ones are:
Given the number of publications on different classifier fusion methods, it is clearly tempting to neglect Ho's advice (2002), presented in Section 2.0, by concentrating research on a wide variety of sophisticated fusion methods. In practice, however, the naïve application of such techniques can result in little or no improvement over simpler approaches.
This is illustrated in Figure 5.1, taken from an experiment performed by Kuncheva (2004). She trained nine single-hidden-layer backpropagation neural networks on the UCI Pima Indian Diabetes data set using ten-fold cross-validation, and compared the performance of the single best network to that of the ensemble as a whole under a variety of fusion methods.
Figure 5.1: Results from an experiment with the UCI Pima Indian Diabetes database comparing the effectiveness of different classifier fusion methods. The closed circles represent mean testing set classification success rates, and the open circles represent mean training set success rates. (Kuncheva 2004, p. 145).
As can be seen in Figure 5.1, there was very little difference between any of the fusion techniques. Indeed, none of them showed a statistically significant improvement over the single best classifier when evaluated on the testing data. Kuncheva provides this example to show that an excessive emphasis on developing and using ever more sophisticated fusion techniques has little if any benefit unless one intelligently considers the specificities of a particular problem and chooses a solution based on that understanding and on a fundamental understanding of how classifiers and ensembles work. In this particular case, Kuncheva suggests that the ensemble's lack of significant improvement over the single best classifier was simply due to insufficient diversity among the component classifiers (see Section 8 for more on diversity).
Last modified: April 18, 2005.