bodhidharma.classifiers
Class SupervisedClassifier

java.lang.Object
  extended by bodhidharma.classifiers.SupervisedClassifier
Direct Known Subclasses:
BioKNearestNeighbour, FeedForwardNeuralNetwork

public abstract class SupervisedClassifier
extends java.lang.Object

An abstract class for designing a classifier that uses supervised learning to classify arbitrary feature sets. This implementation makes it possible to assign more than one label to a single feature set.

Feature sets are fed into the classifier as arrays of doubles. Categories are specified as arrays of Strings.

All classes extending this abstract class should include a constructor that allows the parameters of the particular classifier to be set. This constructor should also set the categories, feature_names and identifier fields.

All classes extending this abstract class should also include a constructor that takes only a String holding the path of an XML file containing the information necessary to reconstruct a trained classifier.

Use the train method to train the classifier (the getFeatureNames method is useful for determining the features that can be fed to the classifier).

Use the classify method to classify feature sets once training has been completed (the getCategories method is useful for determining what categories feature sets can be classified into and for determining the order of the categories when parsing classification results).

Use the save method to save the classifier and its current state to disk.

Use the getClassifierName and getClassifierParameters methods to obtain information about the classifier.

Use the getClassifierIdentifier method to get a name or code that was given to an instantiation of the classifier when it was constructed. This identifier can be used by external classes to identify the instantiation.

Author:
Cory McKay

Field Summary
protected  java.lang.String[] categories
          The possible categories into which feature sets can be classified.
protected  java.lang.String[] feature_names
          The names of the different features which are used to perform classifications.
protected  java.lang.String identifier
          An identifier that can be associated with the classifier so that outside classes can identify it.
protected  ProgressBarTaskTrainMonitor training_monitor
          Used to monitor training progress.
 
Constructor Summary
SupervisedClassifier()
           
 
Method Summary
abstract  double[][] classify(double[][] feature_sets, java.lang.String[] feature_labels)
          Returns the relative scores of each of the possible categories when the given sets of features are classified.
 java.lang.String[] getCategories()
          Returns the contents of the categories field.
 java.lang.String getClassifierIdentifier()
          Returns a name or code that was given to this instantiation of the classifier when it was constructed.
abstract  java.lang.String getClassifierName()
          Returns the name of the type of classifier.
abstract  java.lang.String getClassifierParameters()
          Returns a String describing the parameters of the classifier.
 java.lang.String[] getFeatureNames()
          Returns the contents of the feature_names field.
protected  boolean[][] getModelResults(java.lang.String[][] given_results)
          Returns a 2-D array whose first index corresponds to the feature sets specified by the first index of the given_results parameter and whose second index corresponds to each of the categories in the categories field.
protected  double[][] getOrderedFeatureSets(double[][] feature_sets, java.lang.String[] feature_labels)
          Returns a 2-D array of doubles that consists of the contents of the feature_sets parameter after having been reordered so that the order of the features (as specified in the feature_labels parameter) is the same as in the feature_names field.
abstract  void save(java.io.File place_to_save)
          Writes the SupervisedClassifier and its current state to the location given by place_to_save.
 void setTrainingMonitor(ProgressBarTaskTrainMonitor monitor)
          Sets the training_monitor field to the given value.
abstract  double[] train(double[][] feature_sets, java.lang.String[] feature_labels, java.lang.String[][] model_categories, int iterations, double acceptable_threshold, int consecutive_iterations)
          Trains the SupervisedClassifier using the given feature sets.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

categories

protected java.lang.String[] categories
The possible categories into which feature sets can be classified.


feature_names

protected java.lang.String[] feature_names
The names of the different features which are used to perform classifications.


identifier

protected java.lang.String identifier
An identifier that can be associated with the classifier so that outside classes can identify it.


training_monitor

protected ProgressBarTaskTrainMonitor training_monitor
Used to monitor training progress. It may be null if it is not used.

Constructor Detail

SupervisedClassifier

public SupervisedClassifier()
Method Detail

getCategories

public java.lang.String[] getCategories()
Returns the contents of the categories field.


getFeatureNames

public java.lang.String[] getFeatureNames()
Returns the contents of the feature_names field.


getClassifierIdentifier

public java.lang.String getClassifierIdentifier()
Returns a name or code that was given to this instantiation of the classifier when it was constructed. This identifier can be used by external classes to identify this instantiation.


getClassifierName

public abstract java.lang.String getClassifierName()
Returns the name of the type of classifier. This information can be used by external classes to identify the type of classifier in situations such as when the configuration of a trained classifier is being read from a file.


getClassifierParameters

public abstract java.lang.String getClassifierParameters()
Returns a String describing the parameters of the classifier. This does not include training values, however.


setTrainingMonitor

public void setTrainingMonitor(ProgressBarTaskTrainMonitor monitor)
Sets the training_monitor field to the given value.


train

public abstract double[] train(double[][] feature_sets,
                               java.lang.String[] feature_labels,
                               java.lang.String[][] model_categories,
                               int iterations,
                               double acceptable_threshold,
                               int consecutive_iterations)
                        throws java.lang.Exception
Trains the SupervisedClassifier using the given feature sets. The first index of the feature_sets parameter corresponds to different feature sets. The second index corresponds to different features in the given feature set. Note that all feature sets must use the same features in the same order as given in the feature_labels parameter.

The feature_labels parameter specifies the names of each of the features in the feature_sets parameter. The features in the feature_sets parameter will automatically be matched to the features in the feature_names field based on the content of the feature_labels parameter, unless a value of null is passed as feature_labels. In that case, the feature values in the feature_sets parameter will simply be fed into the classifier in the order in which they occur.

The model_categories parameter gives the category or categories of each of the given feature sets. The first index corresponds to the feature set. The second index corresponds to the different categories of the given feature set. Only categories to which the feature set belongs should be included.

The iterations parameter specifies the number of training iterations to perform. If a negative value is passed here, then the number of iterations to perform is determined automatically based on the acceptable_threshold parameter, which specifies the absolute rate of change of the classification error below which training will stop, and the consecutive_iterations parameter, which specifies the number of consecutive iterations for which the rate of change must be below this threshold in order for training to stop. The number of iterations performed will never exceed the absolute value of the iterations parameter, regardless of the other parameters.

For example, if a value of 1000 is given for iterations, then 1000 iterations will be performed regardless of the other parameters. If a value of -1000 is given, then training will automatically stop if the absolute value of the rate of change of the classification error from one sample to the next falls below the acceptable_threshold parameter for consecutive_iterations iterations, but no more than 1000 iterations will be performed in any case.
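The stopping rule described above can be sketched as follows. This is only an illustration, not the code of any actual subclass: the class and method names are hypothetical, the error trace is supplied by the caller rather than produced by real training, and the "rate of change" is assumed here to be the absolute difference between the errors of two consecutive iterations.

```java
public class StoppingRuleSketch {
    /**
     * Returns how many iterations would be performed under the rule
     * described above, given a hypothetical per-iteration error trace.
     * The trace length also caps the result, since no real training happens.
     */
    public static int iterationsPerformed(double[] errors,
                                          int iterations,
                                          double acceptableThreshold,
                                          int consecutiveIterations) {
        // The iteration count never exceeds |iterations|.
        int max = Math.min(Math.abs(iterations), errors.length);

        // A non-negative value means a fixed number of iterations.
        if (iterations >= 0) {
            return max;
        }

        // A negative value enables automatic stopping.
        int run = 0; // consecutive iterations with a small enough change
        for (int i = 1; i < max; i++) {
            // Assumption: rate of change = difference between consecutive errors.
            double change = Math.abs(errors[i] - errors[i - 1]);
            run = (change < acceptableThreshold) ? run + 1 : 0;
            if (run >= consecutiveIterations) {
                return i + 1; // iterations 0..i have been performed
            }
        }
        return max;
    }
}
```

With a negative iterations value the loop can end early; with a positive value the threshold arguments have no effect, which mirrors the two cases in the example above.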

NOTE: the last three parameters are ignored if a type of classifier is used that does not use more than one iteration.

NOTE: the way in which the classification error is calculated varies from implementation to implementation, but 0 is always perfect performance, and rising levels indicate poorer performance.

The returned array holds the classification error after each training iteration. The index into the returned array corresponds to the training iteration that the error is associated with. A value of null is returned if the classifier does not provide this information.

An exception is thrown if the feature_labels do not contain the same names as feature_names (although a different ordering is permitted) or if any of the feature sets in feature_sets have a different number of features than feature_names. An exception is also thrown if feature_sets and model_categories have different sizes in their first dimensions. An exception is also thrown if the model_categories parameter contains a name not present in the categories field or if it lists the same category more than once for a feature set.

Throws:
java.lang.Exception

classify

public abstract double[][] classify(double[][] feature_sets,
                                    java.lang.String[] feature_labels)
                             throws java.lang.Exception
Returns the relative scores of each of the possible categories when the given sets of features are classified. The first index of the returned array corresponds to the feature set and the second corresponds to the category. The feature sets appear in the same order in which they were passed to the feature_sets parameter, and there is one entry per feature set for each of the entries in the categories field, in the same order as in that field. Higher scores correspond to a greater certainty that the feature set should have the corresponding label. Scores should fall between 0.0 and 1.0.

The feature_sets parameter specifies the feature sets to be classified. The first index corresponds to different feature sets. The second index corresponds to different features in the given feature set. Note that all feature sets must use the same features in the same order as given in the feature_labels parameter.

The feature_labels parameter specifies the names of each of the features in the feature_sets parameter. The features in the feature_sets parameter will automatically be matched to the features in the feature_names field based on the content of the feature_labels parameter, unless a value of null is passed as feature_labels. In that case, the feature values in the feature_sets parameter will simply be fed into the classifier in the order in which they occur.

An exception is thrown if the feature_labels do not contain the same names as feature_names (although a different ordering is permitted) or if any of the feature sets in feature_sets have a different number of features than feature_names.
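Because more than one label can be assigned to a single feature set, a caller typically has to decide which scores count as a label. The sketch below shows one possible way to read a single row of the returned array, assuming a simple fixed cutoff. The class name, method name, cutoff rule and example category names are all hypothetical and are not part of this API.

```java
import java.util.ArrayList;
import java.util.List;

public class ScoreInterpretationSketch {
    /**
     * Returns the names of all categories whose score is at least the cutoff.
     * The categories array is assumed to be in the order returned by
     * getCategories(), and scores is assumed to be one row of the array
     * returned by classify.
     */
    public static List<String> labelsAbove(String[] categories,
                                           double[] scores,
                                           double cutoff) {
        List<String> labels = new ArrayList<>();
        for (int i = 0; i < categories.length; i++) {
            if (scores[i] >= cutoff) {
                labels.add(categories[i]);
            }
        }
        return labels;
    }
}
```

For instance, with hypothetical categories Jazz, Rock and Blues, scores of 0.8, 0.1 and 0.6, and a cutoff of 0.5, the result would contain Jazz and Blues.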

Throws:
java.lang.Exception

save

public abstract void save(java.io.File place_to_save)
                   throws java.lang.Exception
Writes the SupervisedClassifier and its current state to the location given by place_to_save. If the particular SupervisedClassifier needs to write more than one file, then a directory should be passed to this method's parameter, where the appropriate files will be written. Throws an exception if a problem occurs during saving.

Throws:
java.lang.Exception

getOrderedFeatureSets

protected double[][] getOrderedFeatureSets(double[][] feature_sets,
                                           java.lang.String[] feature_labels)
                                    throws java.lang.Exception
Returns a 2-D array of doubles that consists of the contents of the feature_sets parameter after having been reordered so that the order of the features (as specified in the feature_labels parameter) is the same as in the feature_names field. In other words, this method reorders the contents referred to by the second index of feature_sets in a copy of feature_sets. The order of the feature sets themselves (first index) is not changed.

An exception is thrown if the feature_labels parameter contains any labels not found in the feature_names field or if there are any labels found in the feature_names field that are not in the feature_labels parameter. The thrown exception contains a description of the problem encountered. An exception is also thrown if any of the feature sets contain a number of features different from the number of feature labels.

This method is intended for use by the train and classify methods.

The arrays passed as parameters are not altered.
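The reordering described above can be sketched as a standalone method. This is a minimal illustration under stated assumptions, not the actual implementation: the class and method names are hypothetical, feature_names is passed in as an argument instead of being read from the field, and the feature names are assumed to be distinct.

```java
import java.util.Arrays;
import java.util.List;

public class OrderedFeatureSetsSketch {
    /**
     * Returns a reordered copy of featureSets whose columns follow the order
     * of featureNames. The input arrays are not modified.
     */
    public static double[][] reorder(String[] featureNames,
                                     double[][] featureSets,
                                     String[] featureLabels) throws Exception {
        if (featureLabels.length != featureNames.length) {
            throw new Exception("Expected " + featureNames.length
                    + " feature labels but found " + featureLabels.length);
        }

        // For each expected feature, find its column in the incoming data.
        List<String> labels = Arrays.asList(featureLabels);
        int[] source = new int[featureNames.length];
        for (int i = 0; i < featureNames.length; i++) {
            source[i] = labels.indexOf(featureNames[i]);
            if (source[i] < 0) {
                throw new Exception("Missing feature label: " + featureNames[i]);
            }
        }

        // Copy each feature set, column by column, into the expected order.
        double[][] ordered = new double[featureSets.length][featureNames.length];
        for (int set = 0; set < featureSets.length; set++) {
            if (featureSets[set].length != featureLabels.length) {
                throw new Exception("Feature set " + set + " has "
                        + featureSets[set].length + " features");
            }
            for (int i = 0; i < featureNames.length; i++) {
                ordered[set][i] = featureSets[set][source[i]];
            }
        }
        return ordered;
    }
}
```

Since the two name arrays have equal length and every expected name must be found, an unknown label in featureLabels necessarily leaves some expected name missing, so both error cases described above are covered by the same check.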

Throws:
java.lang.Exception

getModelResults

protected boolean[][] getModelResults(java.lang.String[][] given_results)
                               throws java.lang.Exception
Returns a 2-D array whose first index corresponds to the feature sets specified by the first index of the given_results parameter and whose second index corresponds to each of the categories in the categories field. An entry is set to true if the given feature set belongs to the given category and to false if it does not.

This method is intended for use by the train method to process its model_categories parameter, which is passed to the given_results parameter of this method. The given_results parameter consists of feature sets (first index) and the names of the categories (second index) that each feature set belongs to.

An exception is thrown if the given_results parameter contains a name not present in the categories field or if it contains the same category more than once.
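The conversion described above might look roughly like the following sketch. The class and method names are hypothetical, the categories array is passed in as an argument instead of being read from the categories field, and the category names used in any example call are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

public class ModelResultsSketch {
    /**
     * Converts per-feature-set category names into a boolean matrix with one
     * row per feature set and one column per entry of categories.
     */
    public static boolean[][] toModelResults(String[] categories,
                                             String[][] givenResults) throws Exception {
        List<String> known = Arrays.asList(categories);
        // Java initializes every entry to false.
        boolean[][] results = new boolean[givenResults.length][categories.length];

        for (int set = 0; set < givenResults.length; set++) {
            for (String name : givenResults[set]) {
                int index = known.indexOf(name);
                if (index < 0) {
                    throw new Exception("Unknown category: " + name);
                }
                if (results[set][index]) {
                    throw new Exception("Category listed more than once: " + name);
                }
                results[set][index] = true;
            }
        }
        return results;
    }
}
```

With hypothetical categories Jazz, Rock and Blues, a feature set labelled with Blues and Jazz would map to the row true, false, true, regardless of the order in which its labels were given.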

Throws:
java.lang.Exception