bodhidharma.classifiers
Class NN_BioKNN_Ensemble

java.lang.Object
  extended by bodhidharma.classifiers.NN_BioKNN_Ensemble

public class NN_BioKNN_Ensemble
extends java.lang.Object

An interface for combining several classifiers to produce a final classification. The ensemble consists of one BioKNearestNeighbour that is fed all FeatureOneDimensionals and one FeedForwardNeuralNetwork for each FeatureMultiDimensional. It is possible to configure NN_BioKNN_Ensemble objects to only use one of these two types of classifiers.

Classifier selection can be performed on the different classifiers using genetic algorithms. Weightings can also be evolved for each of the FeedForwardNeuralNetworks and the BioKNearestNeighbour. Alternatively, half of the weighting can be automatically assigned to the BioKNearestNeighbour classifier and the remaining weighting can be distributed equally among the FeedForwardNeuralNetworks.
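
The automatic weighting option can be sketched as follows. This is a minimal illustration rather than the library's own code: the class and method names are hypothetical, and placing the BioKNearestNeighbour at index 0 is an assumption borrowed from the ordering that the train method uses for its returned errors.

```java
import java.util.Arrays;

/** Hypothetical helper; not part of bodhidharma.classifiers. */
final class DefaultEnsembleWeights {

    /**
     * Gives half of the total weight to the k-NN classifier (assumed to sit
     * at index 0) and splits the other half evenly among the networks.
     */
    static double[] defaultWeights(int numberOfNetworks) {
        if (numberOfNetworks < 1) {
            throw new IllegalArgumentException("At least one network is required.");
        }
        double[] weights = new double[numberOfNetworks + 1];
        weights[0] = 0.5;
        Arrays.fill(weights, 1, weights.length, 0.5 / numberOfNetworks);
        return weights;
    }

    public static void main(String[] args) {
        // Four networks: the k-NN gets 0.5 and each network gets 0.125.
        System.out.println(Arrays.toString(defaultWeights(4)));
    }
}
```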

This implementation makes it possible to assign more than one label to a single feature set.

Use the train method to train the classifier (the getFeatureNames method is useful for determining the features that can be fed to the classifier).

WARNING: due to the nature of the classifiers used, it is best to only train objects of this class once. In order to ensure this, the train method throws an exception if it is called and the classifier has already been trained.

Use the classify method to classify feature sets once training has been completed (the getCategories method is useful for determining what categories feature sets can be classified into and for determining the order of the categories when parsing classification results).
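
Because the scores come back in the same order as the categories, one row of results can be paired with the category names to pick labels. The sketch below is only an illustration: the class name, the category names, and the fixed score threshold are all assumptions, and the documentation does not state how final labels are chosen from scores.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; not part of bodhidharma.classifiers. */
final class LabelPicker {

    /**
     * Returns every category whose score reaches the threshold. The scores
     * are assumed to be one row of the array returned by classify, in the
     * same order as the array returned by getCategories.
     */
    static List<String> labelsAbove(double[] scores, String[] categories, double threshold) {
        if (scores.length != categories.length) {
            throw new IllegalArgumentException("Scores and categories must align.");
        }
        List<String> chosen = new ArrayList<>();
        for (int i = 0; i < scores.length; i++) {
            if (scores[i] >= threshold) {
                chosen.add(categories[i]);
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        String[] categories = {"Jazz", "Rock", "Blues"}; // illustrative names only
        double[] scores = {0.9, 0.2, 0.7};
        System.out.println(labelsAbove(scores, categories, 0.5));
    }
}
```

More than one category can pass the threshold, which matches the multi-label behaviour described above.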

Use the save method to save the classifier and its current state to disk.

Use the getClassifierName and getClassifierParameters methods to obtain information about the classifier.

Use the getClassifierIdentifier method to get a name or code that was given to an instantiation of the classifier when it was constructed. This identifier can be used by external classes to identify the instantiation.

Use the getClassifierParameters method to get information about the settings stored in the instantiation of an object of this class.

Use the isTrained method to find out if the ensemble has been trained yet.

Use the getEnsembleSelection method to see the set of classifiers which survived feature selection. The names of the corresponding classifiers can be accessed with the getClassifierNames method.

See Also:
SupervisedClassifier, BioKNearestNeighbour, FeedForwardNeuralNetwork, FeatureOneDimensionals, FeatureMultiDimensional, FeatureSet, Recording
Author:
Cory McKay

Constructor Summary
NN_BioKNN_Ensemble(java.lang.String file_path, GeneticAlgorithmJFrame ga_settings, NeuralNetworkJFrame network_settings)
          Parse the file specified by the given file path to recreate the specified trained classifier.
NN_BioKNN_Ensemble(java.lang.String identifier, java.lang.String[] possible_taxonomy_labels, FeatureSettings[] feature_settings, boolean use_selection_candidates, boolean ensemble_using_classifier_selection, boolean ensemble_using_classifier_weighting, double csw_training_fraction, boolean using_one_dimensional_features, boolean using_multi_dimensional_features, GeneticAlgorithmJFrame ga_settings, boolean bioknn_using_feature_selection, boolean bioknn_using_feature_weighting, int bioknn_iterations, double classifier_vs_feature_multiplier, double bioknn_acceptable_threshold, int bioknn_consecutive_iterations, NeuralNetworkJFrame network_settings, int network_iterations, double network_acceptable_threshold, int network_consecutive_iterations)
          Sets the basic fields of the NN_BioKNN_Ensemble.
 
Method Summary
 double[][] classify(Recording[] test_recordings, double[][][] individual_classifier_results)
          Returns the relative scores of each of the possible categories when the given recordings are classified.
 java.lang.String[] getCategories()
          Returns the categories into which this ensemble classifies test samples.
 java.lang.String getClassifierName()
          Returns the name of the type of classifier.
 java.lang.String[] getClassifierNames()
          Returns the names of the classifiers stored in this ensemble.
 java.lang.String getClassifierParameters()
          Returns a String describing the parameters of the classifier.
 boolean[] getEnsembleSelection()
          Returns the set of classifiers which survived feature selection (true).
 double[] getEnsembleWeightings()
          Returns the weights for the set of classifiers after feature weighting.
 double[] getMaxFeatureValueCutoffs()
          Returns the maximum allowable value for each of the one-dimensional features in the trained BioKNearestNeighbour.
 double[] getMinFeatureValueCutoffs()
          Returns the minimum allowable value for each of the one-dimensional features in the trained BioKNearestNeighbour.
 java.lang.String[] getNamesOfFeaturesUsedByOneDimensionalClassifier()
          Returns the names of the features used by the BioKNearestNeighbour classifier.
 int getNumberMultiDimFeatures()
          Returns the number of multi-dimensional features present.
 boolean[] getOneDimensionalFeatureSelection()
          Returns the features selected for use by the BioKNearestNeighbour classifier.
 double[] getOneDimensionalFeatureWeightings()
          Returns the feature weightings used by the BioKNearestNeighbour classifier.
 double[][] getScaledOneDimensionalFeatureValues(Recording[] recordings)
          Returns an array that represents the values of the one-dimensional features in the recordings passed in the recordings parameter after having been scaled to fall between 0 and 1 based on previous training of the BioKNearestNeighbour.
 boolean isTrained()
          Returns whether or not the ensemble has been trained.
 void save(java.io.File place_to_save)
          Saves all of the fields except ga_settings and network_settings to the given File.
 double[][] train(Recording[] training_recordings, java.lang.String[][] model_classifications, ProgressBarTaskTrainMonitor training_monitor, ClassificationResultsInterpereter results_interpereter)
          Trains the ensemble of classifiers using the given recordings and their model classifications.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

NN_BioKNN_Ensemble

public NN_BioKNN_Ensemble(java.lang.String identifier,
                          java.lang.String[] possible_taxonomy_labels,
                          FeatureSettings[] feature_settings,
                          boolean use_selection_candidates,
                          boolean ensemble_using_classifier_selection,
                          boolean ensemble_using_classifier_weighting,
                          double csw_training_fraction,
                          boolean using_one_dimensional_features,
                          boolean using_multi_dimensional_features,
                          GeneticAlgorithmJFrame ga_settings,
                          boolean bioknn_using_feature_selection,
                          boolean bioknn_using_feature_weighting,
                          int bioknn_iterations,
                          double classifier_vs_feature_multiplier,
                          double bioknn_acceptable_threshold,
                          int bioknn_consecutive_iterations,
                          NeuralNetworkJFrame network_settings,
                          int network_iterations,
                          double network_acceptable_threshold,
                          int network_consecutive_iterations)
                   throws java.lang.Exception
Sets the basic fields of the NN_BioKNN_Ensemble.

NOTE: Some parameters may be ignored, based on the contents of the using_one_dimensional_features and using_multi_dimensional_features fields.

Throws an exception if both the using_one_dimensional_features and using_multi_dimensional_features parameters are false, if no taxonomy labels are specified, or if no features of the appropriate type are present in the feature_settings parameter.

Parameters:
identifier - An identifier that can be associated with the ensemble of classifiers so that outside classes can identify it.
possible_taxonomy_labels - The system will be trained to classify samples into one or more of these labels.
feature_settings - The features contained in the FeaturesPanel. The ones marked as Selection Candidates are used if the use_selection_candidates parameter is true and the ones marked as Overide Status are used otherwise.
use_selection_candidates - See feature_settings parameter.
ensemble_using_classifier_selection - Whether or not selection will be performed on the classifiers.
ensemble_using_classifier_weighting - Whether or not classifier weighting will be performed.
csw_training_fraction - Fraction of training samples to actually be used for training when performing feature selection and weighting of one-dimensional features. Must be between 0 and 1.
using_one_dimensional_features - Whether or not training and classification is performed using one-dimensional features (and therefore BioKNearestNeighbour classifiers).
using_multi_dimensional_features - Whether or not training and classification is performed using multi-dimensional features (and therefore FeedForwardNeuralNetwork classifiers).
ga_settings - Initialization settings for all genetic algorithms.
bioknn_using_feature_selection - Whether or not feature selection will be performed on one-dimensional features.
bioknn_using_feature_weighting - Whether or not variable feature weighting will be performed on one-dimensional features.
bioknn_iterations - The absolute value of this parameter indicates the maximum number of training iterations that the BioKNearestNeighbour will perform. A negative value means that the system will automatically check after each iteration, based on the bioknn_acceptable_threshold and bioknn_consecutive_iterations parameters, whether training should continue. A positive value means that those two parameters are ignored.
classifier_vs_feature_multiplier - The value used to multiply the bioknn_iterations and population when using GAs for classifier selection or weighting rather than feature selection or weighting.
bioknn_acceptable_threshold - The error threshold below which BioKNearestNeighbour objects will cease training automatically.
bioknn_consecutive_iterations - The number of consecutive iterations for which the error must stay below bioknn_acceptable_threshold for BioKNearestNeighbour objects in order for training to stop automatically.
network_settings - Initialization settings for all neural networks.
network_iterations - The absolute value of this parameter indicates the maximum number of training iterations that the FeedForwardNeuralNetworks will perform. A negative value means that the system will automatically check after each iteration, based on the network_acceptable_threshold and network_consecutive_iterations parameters, whether training should continue. A positive value means that those two parameters are ignored.
network_acceptable_threshold - The error threshold below which FeedForwardNeuralNetwork objects will cease training automatically.
network_consecutive_iterations - The number of consecutive iterations for which the error must stay below network_acceptable_threshold for FeedForwardNeuralNetwork objects in order for training to stop automatically.
Throws:
java.lang.Exception
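
The sign convention shared by the bioknn_iterations and network_iterations parameters can be illustrated with a small simulation. This is a sketch under stated assumptions, not the library's training loop: the class and method names are hypothetical, and the per-iteration errors are supplied as a plain array instead of being produced by real training.

```java
/** Hypothetical illustration of the iteration-count convention. */
final class IterationPolicy {

    /**
     * Returns how many iterations would run for the given per-iteration
     * errors. The absolute value of iterations is the maximum. A negative
     * value enables early stopping once the error has stayed below the
     * threshold for the required number of consecutive iterations. A
     * positive value ignores the threshold and the streak length.
     */
    static int iterationsRun(int iterations, double acceptableThreshold,
                             int consecutiveIterations, double[] errorPerIteration) {
        int maximum = Math.min(Math.abs(iterations), errorPerIteration.length);
        boolean stopEarly = iterations < 0;
        int streak = 0;
        for (int i = 0; i < maximum; i++) {
            // Count consecutive iterations below the threshold; reset otherwise.
            streak = (errorPerIteration[i] < acceptableThreshold) ? streak + 1 : 0;
            if (stopEarly && streak >= consecutiveIterations) {
                return i + 1;
            }
        }
        return maximum;
    }

    public static void main(String[] args) {
        double[] errors = {0.40, 0.08, 0.12, 0.07, 0.06, 0.05};
        System.out.println(iterationsRun(-6, 0.10, 2, errors)); // early stopping enabled
        System.out.println(iterationsRun(6, 0.10, 2, errors));  // always runs to the maximum
    }
}
```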

NN_BioKNN_Ensemble

public NN_BioKNN_Ensemble(java.lang.String file_path,
                          GeneticAlgorithmJFrame ga_settings,
                          NeuralNetworkJFrame network_settings)
                   throws java.lang.Exception
Parse the file specified by the given file path to recreate the specified trained classifier. Throws an exception if problems occur during parsing.

Parameters:
file_path - The file path of a NN_BioKNN_Ensemble_file that this instantiation is to be based on.
ga_settings - Initialization settings for all genetic algorithms.
network_settings - Initialization settings for all neural networks.
Throws:
java.lang.Exception
Method Detail

getClassifierName

public java.lang.String getClassifierName()
Returns the name of the type of classifier. This information can be used by external classes to identify the type of classifier in situations such as when the configuration of a trained classifier is being read from a file.


getClassifierParameters

public java.lang.String getClassifierParameters()
Returns a String describing the parameters of the classifier. This does not include training values, however.


getCategories

public java.lang.String[] getCategories()
Returns the categories into which this ensemble classifies test samples.


getNumberMultiDimFeatures

public int getNumberMultiDimFeatures()
Returns the number of multi-dimensional features present.


isTrained

public boolean isTrained()
                  throws java.lang.Exception
Returns whether or not the ensemble has been trained.

Throws:
java.lang.Exception

getEnsembleSelection

public boolean[] getEnsembleSelection()
                               throws java.lang.Exception
Returns the set of classifiers which survived feature selection (true). The names of the corresponding classifiers can be accessed with the getClassifierNames method.

An exception is thrown if the classifiers have not yet been trained.

Throws:
java.lang.Exception

getEnsembleWeightings

public double[] getEnsembleWeightings()
                               throws java.lang.Exception
Returns the weights for the set of classifiers after feature weighting. The names of the corresponding classifiers can be accessed with the getClassifierNames method.

An exception is thrown if the classifiers have not yet been trained.

Throws:
java.lang.Exception

getClassifierNames

public java.lang.String[] getClassifierNames()
                                      throws java.lang.Exception
Returns the names of the classifiers stored in this ensemble.

An exception is thrown if the classifiers have not yet been trained.

Throws:
java.lang.Exception

train

public double[][] train(Recording[] training_recordings,
                        java.lang.String[][] model_classifications,
                        ProgressBarTaskTrainMonitor training_monitor,
                        ClassificationResultsInterpereter results_interpereter)
                 throws java.lang.Exception
Trains the ensemble of classifiers using the given recordings and their model classifications. More specifically, the contents of the bio_knn_classifier and the neural_network_classifiers fields are instantiated and filled with the appropriate trained classifier(s). The following classifiers are produced and trained:

- A BioKNearestNeighbour that is fed all of the eligible one-dimensional features. This classifier is only created if the using_one_dimensional_features field is set to true.
- One FeedForwardNeuralNetwork for each multi-dimensional feature. These classifiers are only created if the using_multi_dimensional_features field is set to true. They are created in the order that the eligible multi-dimensional features appear in the FeatureSettings that were passed to this object during instantiation. The networks are each given an identifier that consists of the name of the feature that they are associated with.

The training_recordings parameter should contain the recordings, with features already extracted, that will be used to train the classifier ensemble.

The model_classifications parameter specifies the model classifications for each of the recordings in the training_recordings parameter. The first index identifies the related recording, and the second index corresponds to the category(ies) to which this recording should belong. These categories must be a subset of the possible_taxonomy_labels field. Although the recordings do contain the category(ies) they belong to, this parameter is nonetheless needed in order to facilitate classifications at different levels of the taxonomical hierarchy.
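
A caller may want to check these constraints before starting a long training run. The following is a minimal sketch of such a check, assuming one row of labels per recording. The class name, method name, and label values are hypothetical, and the checks that train itself performs may differ.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical pre-training check; not part of bodhidharma.classifiers. */
final class ModelClassificationCheck {

    /** Verifies one row of labels per recording and that every label is allowed. */
    static void validate(int numberOfRecordings, String[][] modelClassifications,
                         String[] possibleLabels) {
        if (modelClassifications.length != numberOfRecordings) {
            throw new IllegalArgumentException("Expected one row of labels per recording.");
        }
        Set<String> allowed = new HashSet<>(Arrays.asList(possibleLabels));
        for (int r = 0; r < modelClassifications.length; r++) {
            for (String label : modelClassifications[r]) {
                if (!allowed.contains(label)) {
                    throw new IllegalArgumentException(
                        "Recording " + r + " has unknown label: " + label);
                }
            }
        }
    }

    public static void main(String[] args) {
        String[] possible = {"Jazz", "Blues", "Rock"}; // illustrative labels only
        String[][] model = { {"Jazz"}, {"Jazz", "Blues"} };
        validate(2, model, possible);
        System.out.println("Model classifications are consistent.");
    }
}
```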

The classifier selection and classifier weighting fields are also calculated (i.e. classifier selection and classifier weighting are performed, if the user has selected these options).

The returned 2-D array of doubles is a set of classification errors after training iterations. The first index indicates the classifier. Index 0 corresponds to the BioKNearestNeighbour (null if not used) and the others, except for the last two, correspond to the FeedForwardNeuralNetworks in the order that they occur in the multi_dim_feature_labels field (not present if multi-dimensional features are not used). The second-to-last index gives the error for classifier selection and the last index gives the error for classifier weighting (even if the ensemble_using_classifier_selection and ensemble_using_classifier_weighting fields are false). The second index of the returned array corresponds to the iteration of training that the error is associated with.
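
The layout of the returned array can be made explicit by labelling its rows. The helper below is a hypothetical sketch that assumes the array always holds one BioKNearestNeighbour row, one row per network, and two trailing ensemble rows. The feature names used in the example are placeholders.

```java
/** Hypothetical helper; not part of bodhidharma.classifiers. */
final class TrainingErrorRows {

    /** Builds a label for each row of the error array returned by train. */
    static String[] describeRows(double[][] errors, String[] networkFeatureNames) {
        // Assumed layout: 1 k-NN row + one row per network + 2 ensemble rows.
        int expectedRows = 1 + networkFeatureNames.length + 2;
        if (errors.length != expectedRows) {
            throw new IllegalArgumentException(
                "Expected " + expectedRows + " rows but found " + errors.length);
        }
        String[] labels = new String[errors.length];
        labels[0] = (errors[0] == null)
            ? "BioKNearestNeighbour (not used)"
            : "BioKNearestNeighbour";
        for (int i = 0; i < networkFeatureNames.length; i++) {
            labels[i + 1] = "FeedForwardNeuralNetwork: " + networkFeatureNames[i];
        }
        labels[errors.length - 2] = "classifier selection";
        labels[errors.length - 1] = "classifier weighting";
        return labels;
    }

    public static void main(String[] args) {
        double[][] errors = { null, {0.30, 0.20}, {0.40, 0.25}, {0.10}, {0.05} };
        String[] labels = describeRows(errors, new String[] {"Feature A", "Feature B"});
        for (int i = 0; i < labels.length; i++) {
            int iterations = (errors[i] == null) ? 0 : errors[i].length;
            System.out.println(i + ": " + labels[i] + " (" + iterations + " iterations)");
        }
    }
}
```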

The training_monitor parameter is used to monitor training progress. It may be null if it is not used.

The results_interpereter parameter is used to calculate some fitnesses with genetic algorithms. It may be null if the user wants to use alternative fitness measures not based on actual classifications.

A variety of informative exceptions are thrown if incompatible information is being used.

WARNING: due to the nature of the classifiers used, it is best to only train objects of this class once. In order to ensure this, the train method throws an exception if it is called and the classifier has already been trained.

Throws:
java.lang.Exception

classify

public double[][] classify(Recording[] test_recordings,
                           double[][][] individual_classifier_results)
                    throws java.lang.Exception
Returns the relative scores of each of the possible categories when the given recordings are classified. The first index of the returned array corresponds to the feature set and the second corresponds to the category. There is one entry for each of the entries in the possible_taxonomy_labels field, and they appear in the same order. These categories can be accessed with the getCategories method. Higher scores in the returned array correspond to a greater certainty that the feature set should have the corresponding label. Scores should fall in the range between 0.0 and 1.0. The given scores are after classifier selection and weighting have been applied.

The test_recordings parameter specifies the recordings to be classified.

The individual_classifier_results parameter is not actually needed for classification. If the user wishes to see the scores arrived at by each classifier, then s/he should pass the following individual_classifier_results to this parameter:

double[][][] individual_classifier_results = new double[(classifier_ensemble.getEnsembleSelection()).length][][];

where classifier_ensemble is an object of this class. This method will then fill this array to hold the uncombined results of each classifier in the ensemble. The first index specifies the classifier, the second specifies the recording and the third specifies the category. The entry for a classifier will be null if it has been eliminated during feature selection or if it is the one-dimensional classifier and is not set to be used. The order of the classifiers can be accessed with the getClassifierNames method. A value of null can be passed to this parameter if the user does not care about individual classifier results.

An exception is thrown if the test_recordings are missing needed features or if the classifiers have not yet been trained.
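
To show how uncombined per-classifier results might relate to the final scores, the sketch below takes a weighted average over the classifiers whose entries are not null. This combination rule is an assumption made for illustration, since the documentation does not specify how the ensemble merges its results. The class and method names are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical illustration; the real combination rule is not documented here. */
final class EnsembleCombination {

    /**
     * Combines results indexed as [classifier][recording][category] into
     * [recording][category] scores, skipping null (eliminated or unused)
     * classifiers and normalizing by the weight of those that remain.
     */
    static double[][] combine(double[][][] individualResults, double[] weights) {
        double[][] combined = null;
        double totalWeight = 0.0;
        for (int c = 0; c < individualResults.length; c++) {
            double[][] scores = individualResults[c];
            if (scores == null) {
                continue;
            }
            if (combined == null) {
                combined = new double[scores.length][scores[0].length];
            }
            for (int r = 0; r < scores.length; r++) {
                for (int k = 0; k < scores[r].length; k++) {
                    combined[r][k] += weights[c] * scores[r][k];
                }
            }
            totalWeight += weights[c];
        }
        if (combined == null || totalWeight == 0.0) {
            throw new IllegalStateException("No usable classifier results.");
        }
        for (double[] row : combined) {
            for (int k = 0; k < row.length; k++) {
                row[k] /= totalWeight;
            }
        }
        return combined;
    }

    public static void main(String[] args) {
        // Three classifiers, one recording, two categories; the second was eliminated.
        double[][][] individual = { { {1.0, 0.0} }, null, { {0.0, 1.0} } };
        double[] weights = {0.5, 0.3, 0.5};
        System.out.println(Arrays.deepToString(combine(individual, weights)));
    }
}
```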

Throws:
java.lang.Exception

getScaledOneDimensionalFeatureValues

public double[][] getScaledOneDimensionalFeatureValues(Recording[] recordings)
                                                throws java.lang.Exception
Returns an array that represents the values of the one-dimensional features in the recordings passed in the recordings parameter after having been scaled to fall between 0 and 1 based on previous training of the BioKNearestNeighbour.

The first index of the returned array identifies the recording and the second identifies the feature. The order of the features in the returned array is the same as the order of the features in the recordings.

The actual values of the features in the recordings parameter are not themselves changed.

An exception is thrown if the recordings parameter contains unknown features or if the BioKNearestNeighbour has not yet been trained.

Throws:
java.lang.Exception

getMinFeatureValueCutoffs

public double[] getMinFeatureValueCutoffs()
                                   throws java.lang.Exception
Returns the minimum allowable value for each of the one-dimensional features in the trained BioKNearestNeighbour. Feature values below their respective minimum are rounded up to it. Throws an exception if the BioKNearestNeighbour has not been trained yet.

Throws:
java.lang.Exception

getMaxFeatureValueCutoffs

public double[] getMaxFeatureValueCutoffs()
                                   throws java.lang.Exception
Returns the maximum allowable value for each of the one-dimensional features in the trained BioKNearestNeighbour. Feature values above their respective maximum are rounded down to it. Throws an exception if the BioKNearestNeighbour has not been trained yet.
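
The minimum and maximum cutoffs suggest a clamp followed by rescaling to the range 0 to 1. The sketch below shows one way this could work for a single feature value. The linear rescaling is an assumption, since the documentation states only that scaled values fall between 0 and 1, and the class and method names are hypothetical.

```java
/** Hypothetical illustration; not part of bodhidharma.classifiers. */
final class FeatureScaling {

    /** Clamps a value to [minCutoff, maxCutoff], then rescales it linearly to [0, 1]. */
    static double scale(double value, double minCutoff, double maxCutoff) {
        if (maxCutoff <= minCutoff) {
            throw new IllegalArgumentException("maxCutoff must exceed minCutoff.");
        }
        // Values outside the cutoffs are rounded to the nearest cutoff first.
        double clamped = Math.max(minCutoff, Math.min(maxCutoff, value));
        return (clamped - minCutoff) / (maxCutoff - minCutoff);
    }

    public static void main(String[] args) {
        System.out.println(scale(5.0, 0.0, 10.0));  // inside the cutoffs
        System.out.println(scale(-3.0, 0.0, 10.0)); // below the minimum cutoff
        System.out.println(scale(12.0, 0.0, 10.0)); // above the maximum cutoff
    }
}
```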

Throws:
java.lang.Exception

getOneDimensionalFeatureSelection

public boolean[] getOneDimensionalFeatureSelection()
Returns the features selected for use by the BioKNearestNeighbour classifier. A value of true means that the feature was selected and a value of false means that it was not. The getNamesOfFeaturesUsedByOneDimensionalClassifier method should be called to obtain the names of the features that the indices refer to.


getOneDimensionalFeatureWeightings

public double[] getOneDimensionalFeatureWeightings()
Returns the feature weightings used by the BioKNearestNeighbour classifier. The getNamesOfFeaturesUsedByOneDimensionalClassifier method should be called to obtain the names of the features that the indices refer to.


getNamesOfFeaturesUsedByOneDimensionalClassifier

public java.lang.String[] getNamesOfFeaturesUsedByOneDimensionalClassifier()
Returns the names of the features used by the BioKNearestNeighbour classifier, in the order that they are used.
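
Since the feature names, the selection flags, and the weightings are described as sharing the same indices, they can be combined into one report. The sketch below assumes the three arrays have equal length. The class name, method name, and feature names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper; not part of bodhidharma.classifiers. */
final class OneDimensionalFeatureReport {

    /** Maps each selected feature name to its weighting, preserving feature order. */
    static Map<String, Double> selectedWeights(String[] names, boolean[] selection,
                                               double[] weightings) {
        if (names.length != selection.length || names.length != weightings.length) {
            throw new IllegalArgumentException("All three arrays must have the same length.");
        }
        Map<String, Double> report = new LinkedHashMap<>();
        for (int i = 0; i < names.length; i++) {
            if (selection[i]) {
                report.put(names[i], weightings[i]);
            }
        }
        return report;
    }

    public static void main(String[] args) {
        String[] names = {"Feature A", "Feature B", "Feature C"}; // placeholder names
        boolean[] selection = {true, false, true};
        double[] weightings = {0.6, 0.3, 0.1};
        System.out.println(selectedWeights(names, selection, weightings));
    }
}
```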


save

public void save(java.io.File place_to_save)
          throws java.lang.Exception
Saves all of the fields except ga_settings and network_settings to the given File. A separate file is created for the bio_knn_classifier, and a subfolder with separate files is created for the neural_network_classifiers.

Throws:
java.lang.Exception