java.lang.Object
bodhidharma.classifiers.SupervisedClassifier
bodhidharma.classifiers.bio_k_nearest_neighbour.BioKNearestNeighbour
public class BioKNearestNeighbour
An interface for using the k-nearest neighbour algorithm with feature selection and/or weighting performed with genetic algorithms.
This classifier is trained by having the genetic algorithm choose the feature selection and/or weighting. The k-nearest neighbour algorithm only needs one iteration of training, of course. The "iterations" discussed in this class' description actually refer to generations of the genetic algorithm.
This classifier also calculates scaling factors for all of the input feature values. This is done to ensure that all features fall into roughly the same range, so that they have roughly the same "weighting" before the genetic algorithm goes to work. The scaling factors are calculated during the first iteration of training, stored, and then applied to any unknown feature values that are input later. They are not altered afterwards, even if further training is performed.
WARNING: the scaling factors are only calculated the first time that this object is trained. Additional training will not cause recalculation of scaling factors. To recalculate them, a new BioKNearestNeighbour must be instantiated.
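The fit-once behaviour described above can be illustrated with a minimal sketch. It assumes simple min-max scaling, which may differ from what BioKNearestNeighbour actually does internally, and the class name FitOnceScaler and its methods are hypothetical and are not part of the bodhidharma package.

```java
/** Hypothetical sketch: min-max scaling whose factors are fitted only once. */
final class FitOnceScaler {
    private double[] min; // per-feature minimum seen in the first training set
    private double[] max; // per-feature maximum seen in the first training set

    /** Computes scaling factors on the first call only; later calls are ignored. */
    void fit(double[][] featureSets) {
        if (min != null) {
            return; // factors were already fixed by an earlier training run
        }
        int n = featureSets[0].length;
        min = new double[n];
        max = new double[n];
        java.util.Arrays.fill(min, Double.POSITIVE_INFINITY);
        java.util.Arrays.fill(max, Double.NEGATIVE_INFINITY);
        for (double[] set : featureSets) {
            for (int f = 0; f < n; f++) {
                min[f] = Math.min(min[f], set[f]);
                max[f] = Math.max(max[f], set[f]);
            }
        }
    }

    /** Returns a scaled copy; the input array is left unchanged. */
    double[][] scale(double[][] featureSets) {
        if (min == null) {
            throw new IllegalStateException("Scaling factors have not been calculated yet.");
        }
        double[][] scaled = new double[featureSets.length][];
        for (int s = 0; s < featureSets.length; s++) {
            scaled[s] = new double[featureSets[s].length];
            for (int f = 0; f < featureSets[s].length; f++) {
                double range = max[f] - min[f];
                // A constant feature has zero range; map it to 0 to avoid dividing by zero.
                scaled[s][f] = range == 0.0 ? 0.0 : (featureSets[s][f] - min[f]) / range;
            }
        }
        return scaled;
    }
}
```

In this sketch a second call to fit leaves the factors untouched, so the only way to obtain new factors is to create a new FitOnceScaler, which mirrors the warning above. Values outside the range seen during the first fit would scale to below 0 or above 1 here.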
This system uses exemplar-based learning to classify arbitrary feature sets after training. This implementation makes it possible to assign more than one label to a single feature set.
Feature sets are fed into the classifier as arrays of doubles. Categories are specified as arrays of Strings.
Use the train method to train the classifier (the getFeatureNames method is useful for determining the features that can be fed to the classifier). This also calculates the feature value scaling factors.
One constructor is provided for creating a new classifier and one is provided for parsing XML code and using it to reconstruct a trained classifier.
Use the classify method to classify feature sets once training has been completed (the getCategories method is useful for determining what categories feature sets can be classified into and for determining the order of the categories when parsing classification results).
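As an illustration of parsing classification results, the following sketch reads one row of scores against the category order. It assumes that each score lines up with the category at the same index and that a larger score means a stronger match; neither assumption is confirmed by this page. The class ScoreReader and its methods are hypothetical helpers, not part of the bodhidharma package.

```java
/** Hypothetical helper for reading one row of per-category scores. */
final class ScoreReader {
    /**
     * Returns the name of the highest-scoring category for one feature set.
     * Assumes scores[i] belongs to categories[i] and that larger is better.
     */
    static String bestCategory(double[] scores, String[] categories) {
        if (scores.length != categories.length) {
            throw new IllegalArgumentException("Expected one score per category.");
        }
        int best = 0;
        for (int c = 1; c < scores.length; c++) {
            if (scores[c] > scores[best]) {
                best = c;
            }
        }
        return categories[best];
    }

    /** Returns every category whose score reaches the threshold (a multi-label reading). */
    static java.util.List<String> categoriesAtOrAbove(double[] scores, String[] categories,
                                                      double threshold) {
        java.util.List<String> chosen = new java.util.ArrayList<>();
        for (int c = 0; c < scores.length; c++) {
            if (scores[c] >= threshold) {
                chosen.add(categories[c]);
            }
        }
        return chosen;
    }
}
```

In practice the categories argument would come from getCategories and each scores row from the array returned by classify.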
Use the save method to save the classifier and its current state to disk.
Use the getClassifierName and getClassifierParameters methods to obtain information about the classifier.
Use the getClassifierIdentifier method to get a name or code that was given to an instantiation of the classifier when it was constructed. This identifier can be used by external classes to identify the instantiation.
Use the getScaledFeatureValues method to find out what the values of a set of feature values would be after scaling.
See Also: KNN, Breeder, FeatureSelectionEvaluator, FeeatureWeightingEvaluator
Field Summary |
---|
Fields inherited from class bodhidharma.classifiers.SupervisedClassifier |
---|
categories, feature_names, identifier, training_monitor |
Constructor Summary | |
---|---|
BioKNearestNeighbour(java.lang.String file_path)
Parse the file specified by the given file path to recreate the specified trained classifier. |
|
BioKNearestNeighbour(java.lang.String[] feature_names,
java.lang.String[] categories,
java.lang.String identifier,
boolean using_feature_selection,
boolean using_feature_weighting,
GeneticAlgorithmJFrame gen_alg_settings,
double fsw_training_fraction,
ClassificationResultsInterpereter results_interpereter)
Generate a BioKNearestNeighbour with the given parameters. |
Method Summary | |
---|---|
double[][] |
classify(double[][] feature_sets,
java.lang.String[] feature_labels)
Returns the relative scores of each of the possible categories when the given sets of features are classified. |
java.lang.String |
getClassifierName()
Returns the name of the type of classifier. |
java.lang.String |
getClassifierParameters()
Returns a String describing the parameters of the classifier. |
boolean[] |
getFeatureSelection()
Returns a copy of the array indicating whether or not a given feature is to be used for classification. |
double[] |
getFeatureWeights()
Returns a copy of the array indicating the feature weights that are to be used for classification. |
double[] |
getMaxFeatureValueCutoffs()
Returns the maximum allowable value for each of the features. |
double[] |
getMinFeatureValueCutoffs()
Returns the minimum allowable value for each of the features. |
double[][] |
getScaledFeatureValues(double[][] feature_sets,
java.lang.String[] feature_labels)
Returns an array that represents the feature_sets parameter with values scaled to fall between 0 and 1 based on previous training of the BioKNearestNeighbour. |
void |
save(java.io.File place_to_save)
Saves all of the fields to the given file. |
double[] |
train(double[][] feature_sets,
java.lang.String[] feature_labels,
java.lang.String[][] model_categories,
int iterations,
double acceptable_threshold,
int consecutive_iterations)
Trains the BioKNearestNeighbour using the given feature sets. |
Methods inherited from class bodhidharma.classifiers.SupervisedClassifier |
---|
getCategories, getClassifierIdentifier, getFeatureNames, getModelResults, getOrderedFeatureSets, setTrainingMonitor |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public BioKNearestNeighbour(java.lang.String[] feature_names, java.lang.String[] categories, java.lang.String identifier, boolean using_feature_selection, boolean using_feature_weighting, GeneticAlgorithmJFrame gen_alg_settings, double fsw_training_fraction, ClassificationResultsInterpereter results_interpereter)
Parameters:
feature_names - The names of the features that classifications will be based on.
categories - The names of categories that correspond to the possible classification results.
identifier - An identifier that can be associated with the classifier so that outside classes can identify it.
using_feature_selection - Whether or not feature selection will be performed.
using_feature_weighting - Whether or not variable feature weighting will be performed.
gen_alg_settings - Initialization settings for the genetic algorithm.
fsw_training_fraction - Fraction of training samples to actually be used for training when feature selection and weighting are calculated. Must be between 0 and 1.
training_monitor - Used to monitor training progress. It may be null if it is not used.
results_interpereter - Used to calculate fitnesses with genetic algorithms. It may be null if the user wants to use alternative fitness measures not based on actual classifications.
public BioKNearestNeighbour(java.lang.String file_path) throws java.lang.Exception
Parameters:
file_path - The file path of a BioKNearestNeighbour_file that this instantiation is to be based on.
Throws: java.lang.Exception
Method Detail |
---|
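public java.lang.String getClassifierName()
Returns the name of the type of classifier.
getClassifierName in class SupervisedClassifier
public java.lang.String getClassifierParameters()
Returns a String describing the parameters of the classifier.
getClassifierParameters in class SupervisedClassifier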
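public boolean[] getFeatureSelection()
Returns a copy of the array indicating whether or not a given feature is to be used for classification. This array is set by the train method. True means that the feature is to be used and false that it isn't. A value of null is returned if feature selection has not been performed, in which case all features are to be used.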
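public double[] getFeatureWeights()
Returns a copy of the array indicating the feature weights that are to be used for classification. This array is set by the train method. Values are between 0 and 1, with 1 meaning that the feature is very important relative to others. A value of null is returned if feature weighting has not been performed, in which case all weights default to 1.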
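To show how a selection array and a weight array with these null conventions might enter a nearest-neighbour comparison, here is a minimal sketch. It assumes a Euclidean distance in which each squared difference is multiplied by its weight; how the KNN class really combines them is not stated on this page. WeightedDistance is a hypothetical name.

```java
/** Hypothetical sketch of a distance that honours feature selection and weights. */
final class WeightedDistance {
    /**
     * Euclidean distance that skips deselected features and scales the rest by weight.
     * A null selection means "use every feature"; null weights mean "all weights are 1".
     */
    static double distance(double[] a, double[] b, boolean[] selection, double[] weights) {
        double sum = 0.0;
        for (int f = 0; f < a.length; f++) {
            if (selection != null && !selection[f]) {
                continue; // feature was switched off by feature selection
            }
            double w = (weights == null) ? 1.0 : weights[f];
            double diff = a[f] - b[f];
            sum += w * diff * diff;
        }
        return Math.sqrt(sum);
    }
}
```

With this formulation a deselected feature behaves exactly like a feature with weight 0, and passing null for both arrays gives the plain Euclidean distance.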
public void save(java.io.File place_to_save) throws java.lang.Exception
Saves all of the fields to the given file.
save in class SupervisedClassifier
Throws: java.lang.Exception
public double[] train(double[][] feature_sets, java.lang.String[] feature_labels, java.lang.String[][] model_categories, int iterations, double acceptable_threshold, int consecutive_iterations) throws java.lang.Exception
Trains the BioKNearestNeighbour using the given feature sets.
The training done by this method consists of storing the training points in the KNN classifier and finding good feature selection and feature weighting settings (if these two options were set in the constructor).
NOTE: If the feature selection option is selected, feature weighting is done using the feature selection calculated during the feature selection portion of training.
The first index of the feature_sets parameter corresponds to different feature sets. The second index corresponds to different features in the given feature set. All feature sets must use the same features in the same order as given in the feature_labels parameter.
The feature_labels parameter specifies the names of each of the features in the feature_sets parameter. The features in the feature_sets parameter will automatically be matched to the features in the feature_names field based on the content of the feature_labels parameter, unless a value of null is passed for feature_labels. In that case, the feature values in the feature_sets parameter will simply be fed into the classifier in the order that they occur.
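The label matching described above can be pictured with the following sketch, which rearranges one feature set into the order of the classifier's feature names. It is an illustration under stated assumptions, not the class's actual code; FeatureReorderer and its parameter names are hypothetical.

```java
/** Hypothetical sketch of matching labelled feature values to a fixed feature order. */
final class FeatureReorderer {
    /**
     * Returns a copy of featureSet rearranged into the order given by featureNames.
     * featureLabels[i] names the value at featureSet[i]; null labels mean "already in order".
     */
    static double[] reorder(double[] featureSet, String[] featureLabels, String[] featureNames) {
        if (featureSet.length != featureNames.length) {
            throw new IllegalArgumentException("Wrong number of features in feature set.");
        }
        if (featureLabels == null) {
            return featureSet.clone(); // values are used in the order they occur
        }
        if (featureLabels.length != featureNames.length) {
            throw new IllegalArgumentException("Wrong number of feature labels.");
        }
        java.util.Map<String, Integer> positionOf = new java.util.HashMap<>();
        for (int i = 0; i < featureLabels.length; i++) {
            positionOf.put(featureLabels[i], i);
        }
        double[] ordered = new double[featureNames.length];
        for (int n = 0; n < featureNames.length; n++) {
            Integer source = positionOf.get(featureNames[n]);
            if (source == null) {
                throw new IllegalArgumentException("No label matches feature " + featureNames[n]);
            }
            ordered[n] = featureSet[source];
        }
        return ordered;
    }
}
```

Because the label and name arrays must have the same length, a duplicated label necessarily leaves some feature name unmatched, so the sketch rejects it through the same check.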
The model_categories parameter gives the categories of each of the given feature sets. The first index corresponds to the feature set. The second index corresponds to different model categories for the given feature set. Only categories to which the feature set belongs should be included.
The iterations parameter specifies the number of training iterations performed. If a negative value is passed here, then the number of iterations to perform is calculated automatically based on the acceptable_threshold parameter, which specifies the absolute rate of change of the training error below which training will stop, and the consecutive_iterations parameter, which specifies the number of consecutive iterations for which the rate of change must be below this threshold in order for training to stop. The number of iterations performed will never exceed the absolute value of the iterations parameter, regardless of the other parameters.
For example, if a value of 1000 is given for iterations, then 1000 iterations will be performed regardless of the other parameters. If a value of -1000 is given, then training will automatically stop if the absolute value of the rate of change of the training error from one sample to the next falls below the acceptable_threshold parameter for consecutive_iterations iterations, but no more than 1000 iterations will be performed in any case.
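The stopping rule can be sketched as a small function that reports how many iterations would be run. This is one reading of the description, taking the rate of change to be the difference in error between consecutive iterations; the class StoppingRule and the errorAt callback are hypothetical and only stand in for the real training loop.

```java
/** Hypothetical sketch of the iteration-count rule described for the train method. */
final class StoppingRule {
    /**
     * Returns how many iterations are run.
     * errorAt supplies the training error observed after iteration i (0-based).
     */
    static int iterationsToRun(int iterations, double acceptableThreshold,
                               int consecutiveIterations,
                               java.util.function.IntToDoubleFunction errorAt) {
        int cap = Math.abs(iterations);
        if (iterations >= 0) {
            return cap; // a non-negative value fixes the count outright
        }
        int calmStreak = 0; // consecutive iterations with a small change in error
        double previous = Double.NaN;
        for (int i = 0; i < cap; i++) {
            double error = errorAt.applyAsDouble(i);
            if (i > 0) {
                if (Math.abs(error - previous) < acceptableThreshold) {
                    calmStreak++;
                } else {
                    calmStreak = 0;
                }
                if (calmStreak >= consecutiveIterations) {
                    return i + 1; // stop early; i + 1 iterations were completed
                }
            }
            previous = error;
        }
        return cap; // the threshold was never met, so the cap applies
    }
}
```

Under this reading, a call with iterations set to -1000 stops as soon as the error has changed by less than the threshold for the required number of consecutive iterations, and otherwise runs exactly 1000 iterations.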
The returned array holds the average error after each training iteration. The index of the returned array corresponds to the iteration of training that the error is associated with. This is the data for feature weighting if this option was selected, the data for feature selection if this option was selected but feature weighting was not, and null otherwise (since basic KNN does not need training iterations).
This method also calculates scaling factors for all of the input feature values. This is done to ensure that all features fall into roughly the same range, so that they have roughly the same "weighting" before the genetic algorithm goes to work. The scaling factors are calculated during the first iteration of training, stored, and then applied to any unknown feature values that are input later. They are not altered afterwards, even if further training is performed.
WARNING: the scaling factors are only calculated the first time that this object is trained. Additional training will not cause recalculation of scaling factors. To recalculate them, a new BioKNearestNeighbour must be instantiated. Also, each retraining will reset the feature selection and feature weighting settings, if these were found during earlier training.
An exception is thrown if the feature_labels do not contain the same names as feature_names (although a different ordering is permitted) or if any of the feature sets in feature_sets have a different number of features than feature_names. An exception is also thrown if feature_sets and model_categories have different sizes in their first dimensions. An exception is also thrown if the model_categories parameter contains a name not present in the categories field or if it contains the same category more than once for a given feature set. An exception is also thrown if there are problems during evolution.
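The category checks listed above can be sketched as a conversion from per-sample category names to a membership matrix whose columns follow the order of the categories field. The class CategoryCheck is hypothetical, and the real class may perform these checks differently or in a different order.

```java
/** Hypothetical sketch of validating multi-label category assignments. */
final class CategoryCheck {
    /**
     * Converts per-sample category names into a boolean membership matrix
     * whose columns follow the order of the categories array.
     */
    static boolean[][] toMembership(String[][] modelCategories, String[] categories,
                                    int numFeatureSets) {
        if (modelCategories.length != numFeatureSets) {
            throw new IllegalArgumentException("Feature sets and model categories differ in number.");
        }
        java.util.List<String> known = java.util.Arrays.asList(categories);
        boolean[][] member = new boolean[modelCategories.length][categories.length];
        for (int s = 0; s < modelCategories.length; s++) {
            for (String name : modelCategories[s]) {
                int column = known.indexOf(name);
                if (column < 0) {
                    throw new IllegalArgumentException("Unknown category: " + name);
                }
                if (member[s][column]) {
                    throw new IllegalArgumentException("Category listed twice: " + name);
                }
                member[s][column] = true;
            }
        }
        return member;
    }
}
```

A row with more than one true entry corresponds to a feature set that belongs to several categories at once.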
train in class SupervisedClassifier
Throws: java.lang.Exception
public double[][] classify(double[][] feature_sets, java.lang.String[] feature_labels) throws java.lang.Exception
Returns the relative scores of each of the possible categories when the given sets of features are classified.
The feature_sets parameter specifies the feature sets to be classified. The first index corresponds to different feature sets. The second index corresponds to different features in the given feature set. All feature sets must use the same features in the same order as given in the feature_labels parameter.
The feature_labels parameter specifies the names of each of the features in the feature_sets parameter. The features in the feature_sets parameter will automatically be matched to the features in the feature_names field based on the content of the feature_labels parameter, unless a value of null is passed for feature_labels. In that case, the feature values in the feature_sets parameter will simply be fed into the classifier in the order that they occur.
An exception is thrown if the feature_labels do not contain the same names as feature_names (although a different ordering is permitted) or if any of the feature sets in feature_sets have a different number of features than feature_names. An exception is also thrown if the KNN classifier is untrained.
classify in class SupervisedClassifier
Throws: java.lang.Exception
public double[][] getScaledFeatureValues(double[][] feature_sets, java.lang.String[] feature_labels) throws java.lang.Exception
Returns an array that represents the feature_sets parameter with values scaled to fall between 0 and 1 based on previous training of the BioKNearestNeighbour.
The feature_labels parameter specifies the names of each of the features in the feature_sets parameter. The features in the feature_sets parameter will automatically be matched to the features in the feature_names field based on the content of the feature_labels parameter.
The actual values of the feature_sets parameter are not themselves changed.
An exception is thrown if the feature_labels do not contain the same names as feature_names (although a different ordering is permitted) or if any of the feature sets in feature_sets have a different number of features than feature_names. An exception is also thrown if the scaling factors have not yet been calculated.
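Throws: java.lang.Exception
public double[] getMinFeatureValueCutoffs() throws java.lang.Exception
Returns the minimum allowable value for each of the features.
Throws: java.lang.Exception
public double[] getMaxFeatureValueCutoffs() throws java.lang.Exception
Returns the maximum allowable value for each of the features.
Throws: java.lang.Exception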