4.0 Overview of classifier combination techniques

Most classifier combination, or aggregation, methods fall into one of two general types: classifier fusion, in which all of the component classifiers contribute to every decision, and classifier selection, in which a single classifier judged most competent for each input pattern is chosen to make that decision.

It is also not unusual to find hybrids of these two approaches. Weighted fusion, for example, is one common hybrid method that involves all classifiers in a decision, as is typically done in classifier fusion, but also dynamically assigns a weighting to each classifier based on each input pattern, an approach that is inspired by classifier selection.

Some classifier combination methodologies require training in addition to the training of the component classifiers, and some do not. The majority vote combiner, for example, needs no additional training, but the weighted average combiner does (see Section 5). Other approaches, such as AdaBoost (see Section 7.2), develop the combiner simultaneously with the training of the individual classifiers.
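The contrast between a combiner that needs no training and one that does can be made concrete with a minimal sketch in Python (function names are the author's illustrations, not from the cited sections):

```python
from collections import Counter

def majority_vote(labels):
    # Untrained combiner: the label predicted by the most component
    # classifiers wins; no parameters need to be learned.
    return Counter(labels).most_common(1)[0][0]

def weighted_average(prob_vectors, weights):
    # Trained combiner: the per-classifier weights must be learned on
    # training data before this function can be used (see Section 5).
    total = sum(weights)
    n_classes = len(prob_vectors[0])
    return [sum(w * p[c] for w, p in zip(weights, prob_vectors)) / total
            for c in range(n_classes)]

# Three classifiers vote on one input pattern:
print(majority_vote([1, 0, 1]))  # -> 1

# Three classifiers' class-probability vectors, fused with learned weights:
probs = [[0.6, 0.4], [0.2, 0.8], [0.7, 0.3]]
print(weighted_average(probs, [0.5, 0.3, 0.2]))
```

Note that the majority vote combiner can be applied immediately to any ensemble, whereas the weighted average combiner is useless until its weights have been fitted somewhere.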

Trainable combiners can be split into two groups: “implicitly data-dependent” models and “explicitly data-dependent” models. In the former case, the fusion parameters are trained once and do not vary from one input pattern to another. Explicitly data-dependent models, in contrast, use fusion parameters that vary with each individual input pattern.
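The distinction can be sketched as follows (the names are hypothetical, and in practice the weight-producing function in the explicit case would itself be trained):

```python
def fuse_implicit(prob_vectors, weights):
    # Implicitly data-dependent: the weights were fit once on training
    # data and are the same for every input pattern.
    total = sum(weights)
    return [sum(w * p[c] for w, p in zip(weights, prob_vectors)) / total
            for c in range(len(prob_vectors[0]))]

def fuse_explicit(x, prob_vectors, weight_fn):
    # Explicitly data-dependent: a trained weight_fn recomputes the
    # classifier weights for each individual input pattern x.
    return fuse_implicit(prob_vectors, weight_fn(x))

# Toy weight function: trust classifier 0 for negative inputs,
# classifier 1 otherwise (purely illustrative).
weight_fn = lambda x: [0.9, 0.1] if x < 0 else [0.1, 0.9]
probs = [[1.0, 0.0], [0.0, 1.0]]
print(fuse_explicit(-1.0, probs, weight_fn))  # weights [0.9, 0.1] are used
```

The weighted fusion hybrid mentioned earlier corresponds to the explicit case: every classifier still participates in the decision, but its influence is recomputed per input.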

The choice of combination approach is related to the amount of training data available. A small training set can pose problems if the approach requires training each component classifier on a different training set, or if the combiner is trainable and requires its own distinct training set. Dietrich et al. (2003) suggest that using overlapping training sets for the component classifiers and the combiner is justified when only a small training set is available. Duin (2002), in contrast, suggests reserving a separate training set for the combiner and then overtraining the component classifiers on the remaining data. Although the component classifiers may then exhibit specialization, the trained combiner may be able to learn to compensate for this in the overall classifications of the ensemble. Duin also suggests the possibility of using a single training set for both the component classifiers and the combiner, but undertraining the component classifiers in order to reserve a certain amount of unused training potential for the combiner.
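Reserving separate data for the combiner, as Duin suggests, amounts to a simple partition of the available training set. The following is a sketch only; the 30% fraction and the function name are the author's assumptions, not details from Duin (2002):

```python
import random

def split_for_combiner(samples, combiner_fraction=0.3, seed=0):
    # Reserve a held-out portion of the training data for the combiner;
    # the component classifiers are trained on the remainder.
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    k = int(len(shuffled) * combiner_fraction)
    return shuffled[k:], shuffled[:k]  # (component set, combiner set)

component_set, combiner_set = split_for_combiner(range(10))
print(len(component_set), len(combiner_set))  # -> 7 3
```

Dietrich et al.'s alternative, by contrast, would simply let the two sets overlap when the data are too scarce to partition.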

As a side note, an observant reader may notice that there is a strong parallel between classifier combination, particularly in the case of architectures such as mixtures of experts (see Section 6), and blackboard systems. Blackboard systems consist of a variety of decision makers, termed “knowledge sources,” that are each experts in some aspect of a problem. These knowledge sources use input data to form hypotheses on a solution to the problem, which are written to a shared data space known as the “blackboard.” The knowledge sources then modify their own hypotheses based on the hypotheses of the other knowledge sources, using one of a great many possible techniques, with the result that they gradually converge to a solution. This is analogous to a group of human experts discussing a problem with each other until they collectively come to agree upon a solution.

Surprisingly, the author of this report did not find any mention of blackboard systems in the literature on classifier ensembles. Although no claim is made here to a complete and comprehensive survey of the literature, this at least suggests that blackboard systems are not a common topic of interest in the classifier ensemble community. This is unfortunate, as there is a relatively long history of published research on blackboard systems, and it seems obvious to the author that there is a significant amount of overlap between the two fields.

One possible reason for the neglect of blackboard systems in the pattern recognition community is that they are often associated with expert systems that do not involve machine learning. There is no convincing reason that at least some of the techniques developed for blackboard systems could not be adapted to learning-based algorithms, however. Blackboard systems are an extensive topic in and of themselves, and are therefore beyond the scope of this project, but those interested in learning more about them may wish to consult the book edited by Jagannathan et al. (1989) as a starting point.

Last modified: April 18, 2005.