bodhidharma.classifiers.bio_k_nearest_neighbour
Class KNN

java.lang.Object
  extended by bodhidharma.classifiers.bio_k_nearest_neighbour.KNN

public class KNN
extends java.lang.Object

Uses the k-nearest neighbour algorithm to classify feature sets. k is automatically set to the square root of the number of training samples.

External code can set the features_to_use and feature_weights fields to control feature selection and feature weighting, respectively.

The addTrainingSamples method is used to train the classifier and the classifyTestSample method is used to classify test points. The getK method reveals the k used.

The save method saves the current state of a KNN object to disk.

Training samples may belong to more than one category.

Author:
Cory McKay

Field Summary
 double[] feature_weights
          The weights used for calculating the distance for each feature.
 boolean[] features_to_use
          Whether or not a given feature is to be used when calculating the distance metric.
 
Constructor Summary
KNN(int how_many_categories)
          Basic constructor.
KNN(java.lang.String file_path)
          Parse the file specified by the given file path to recreate the specified trained classifier.
 
Method Summary
 void addTrainingSamples(double[][] training_data, int[][] model_categories)
          Stores the provided feature sets as training data.
 double[] classifyTestSample(double[] test_point)
          Returns an array giving, for each category that the system has been trained with, the number of training samples among the k points closest to test_point.
 int getK()
          Get the value of k used by the KNN classifier.
 int getNumberCategories()
          Get the number of categories into which test samples can be classified.
 void save(java.io.File place_to_save)
          Saves all of the fields to the given file.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

features_to_use

public boolean[] features_to_use
Whether or not a given feature is to be used when calculating the distance metric. Indices correspond to the second index (the feature index) of the training samples field. A value of null indicates that all features should be used. Used for feature selection.


feature_weights

public double[] feature_weights
The weights used for calculating the distance for each feature. Each weight is between 0 and 1. Indices correspond to the second index (the feature index) of the training samples field. A value of null means that all features have a weighting of 1.
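
The following is a minimal sketch of how a selection mask and a weight array of this kind could enter a distance calculation. It is not taken from the KNN source: the class name WeightedDistanceSketch, the choice of a Euclidean metric, and the way each weight scales the squared difference are all assumptions made for illustration. Only the null conventions (null mask means all features, null weights means a weighting of 1) come from the field descriptions above.

```java
class WeightedDistanceSketch {
    /**
     * Hypothetical weighted distance between two feature vectors.
     * Assumption: Euclidean metric, with each weight multiplying the
     * squared difference of its feature.
     */
    static double distance(double[] a, double[] b,
                           boolean[] featuresToUse, double[] featureWeights) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Feature counts differ");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            // A null mask means every feature is used.
            if (featuresToUse != null && !featuresToUse[i]) {
                continue;
            }
            // A null weight array means every feature has a weight of 1.
            double weight = (featureWeights == null) ? 1.0 : featureWeights[i];
            double diff = a[i] - b[i];
            sum += weight * diff * diff;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        double[] p = {0.0, 0.0};
        double[] q = {3.0, 4.0};
        // No selection and no weighting: plain Euclidean distance.
        System.out.println(distance(p, q, null, null)); // prints 5.0
        // Second feature masked out: only the first feature contributes.
        System.out.println(distance(p, q, new boolean[] {true, false}, null)); // prints 3.0
    }
}
```

In this sketch a masked-out feature and a feature with weight 0 have the same effect on the result; whether the real class treats them identically is not stated in this documentation.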

Constructor Detail

KNN

public KNN(int how_many_categories)
Basic constructor. Initializes the system so that all features will be used in classification with an equal weighting of 1.

The parameter identifies how many categories the system will be categorizing test points into.


KNN

public KNN(java.lang.String file_path)
    throws java.lang.Exception
Parse the file specified by the given file path to recreate the specified trained classifier. Throws an exception if problems occur during parsing.

Parameters:
file_path - The file path of a KNN_file that this instantiation is to be based on.
Throws:
java.lang.Exception
Method Detail

addTrainingSamples

public void addTrainingSamples(double[][] training_data,
                               int[][] model_categories)
                        throws java.lang.Exception
Stores the provided feature sets as training data. The first index of the training_data parameter identifies the sample and the second index identifies the feature. The first index of the model_categories parameter identifies the training sample and the second index identifies the number of the category. The entries of model_categories correspond to the codes assigned to the category(ies) that each sample belongs to. Each code must be 0 or above and less than the how_many_categories number passed to the constructor. An exception is thrown if these conditions are not fulfilled. If this object has previously been trained, it is the calling object's responsibility to ensure that the categories used in the past and current trainings are the same.

Training samples may belong to more than one category.

It is assumed that all of the samples in the training_data parameter have the same features in the same order. The calling object must ensure that this is true. An exception is thrown if any of the samples in training_data have a different number of features, either from each other or from any existing training samples.
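
The input conditions described above can be sketched as a standalone check. This is an illustration only, not the class's actual code: the class name TrainingInputCheckSketch, the use of IllegalArgumentException (the documented method declares only java.lang.Exception), and the requirement that both arrays have the same number of rows are assumptions. The check against previously stored training samples is omitted.

```java
class TrainingInputCheckSketch {
    /**
     * Hypothetical validation of training input.
     * trainingData[sample][feature], modelCategories[sample][n] = category code.
     */
    static void validate(double[][] trainingData, int[][] modelCategories,
                         int howManyCategories) {
        // Assumption: one row of category codes per training sample.
        if (trainingData.length != modelCategories.length) {
            throw new IllegalArgumentException("Sample counts differ");
        }
        if (trainingData.length == 0) {
            return;
        }
        // Every sample must have the same number of features.
        int featureCount = trainingData[0].length;
        for (double[] sample : trainingData) {
            if (sample.length != featureCount) {
                throw new IllegalArgumentException("Inconsistent feature count");
            }
        }
        // Every code must satisfy 0 <= code < howManyCategories.
        for (int[] codes : modelCategories) {
            for (int code : codes) {
                if (code < 0 || code >= howManyCategories) {
                    throw new IllegalArgumentException("Bad category code: " + code);
                }
            }
        }
    }

    public static void main(String[] args) {
        double[][] data = {{1.0, 2.0}, {3.0, 4.0}};
        // The second sample belongs to two categories at once.
        int[][] categories = {{0}, {0, 1}};
        validate(data, categories, 2);
        System.out.println("input accepted");
    }
}
```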

If this object has previously been trained with feature sets, these new feature sets will be added to the existing ones.

k is automatically set to the square root of the cumulative number of training samples that have been used to train the classifier so far.
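
As a rough illustration of this rule, the sketch below derives k from the cumulative sample count. The documentation does not say how a non-integer square root is converted to an int, so rounding to the nearest integer with a floor of 1 is an assumption here, as are the names KSelectionSketch and chooseK.

```java
class KSelectionSketch {
    /**
     * Hypothetical choice of k from the total number of training samples.
     * Assumption: the square root is rounded to the nearest integer and
     * never allowed to drop below 1.
     */
    static int chooseK(int totalTrainingSamples) {
        if (totalTrainingSamples <= 0) {
            throw new IllegalArgumentException("No training samples");
        }
        return Math.max(1, (int) Math.round(Math.sqrt(totalTrainingSamples)));
    }

    public static void main(String[] args) {
        // Two training batches of 60 and 40 samples accumulate to 100.
        int cumulative = 60 + 40;
        System.out.println(chooseK(cumulative)); // prints 10
    }
}
```

Because k depends on the cumulative count, each further call that adds samples can change the k used for later classifications.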

WARNING: Care should be taken not to include samples that have previously been provided to the classifier, as this will cause the sample to be double counted if it falls within k.

Throws:
java.lang.Exception

classifyTestSample

public double[] classifyTestSample(double[] test_point)
                            throws java.lang.Exception
Returns an array giving, for each category that the system has been trained with, the number of training samples among the k points closest to the test_point. The index of the returned array identifies the category and the corresponding entry gives the number of points included in k for that category. Note that dividing by k will not necessarily result in normalized results, since training samples can belong to more than one category.

The test_point parameter specifies the features of a sample that is to be classified. The features stored in the test_point and their order must correspond to that used during training with the addTrainingSamples method.

Uses the last feature selection and/or feature weightings that have been specified (the default is to use all features with equal weightings of 1).

An exception is thrown if the test_point has a different number of features than the data that was used to train the classifier.
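
To make the shape of the returned array concrete, here is a small self-contained sketch of per-category counting over the k nearest training samples. It is not the class's implementation: the names NeighbourCountSketch and countNeighbours are hypothetical, the distance is plain unweighted squared Euclidean (an assumption), and k is passed in directly rather than derived from the sample count.

```java
import java.util.Arrays;
import java.util.Comparator;

class NeighbourCountSketch {
    /** Hypothetical per-category counts among the k nearest training samples. */
    static double[] countNeighbours(double[][] training, int[][] categories,
                                    int numCategories, int k, double[] testPoint) {
        final double[] dist = new double[training.length];
        Integer[] order = new Integer[training.length];
        for (int i = 0; i < training.length; i++) {
            if (training[i].length != testPoint.length) {
                throw new IllegalArgumentException("Feature counts differ");
            }
            double sum = 0.0;
            for (int f = 0; f < testPoint.length; f++) {
                double diff = training[i][f] - testPoint[f];
                sum += diff * diff;
            }
            dist[i] = sum; // squared distance preserves the ordering
            order[i] = i;
        }
        // Sort sample indices from nearest to farthest.
        Arrays.sort(order, Comparator.comparingDouble((Integer i) -> dist[i]));
        double[] counts = new double[numCategories];
        int limit = Math.min(k, order.length);
        for (int n = 0; n < limit; n++) {
            // A sample with several categories adds one to each of them.
            for (int code : categories[order[n]]) {
                counts[code] += 1.0;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        double[][] training = {{0.0, 0.0}, {1.0, 0.0}, {5.0, 5.0}, {6.0, 5.0}};
        int[][] categories = {{0}, {0, 1}, {1}, {1}};
        double[] counts = countNeighbours(training, categories, 2, 2,
                                          new double[] {0.2, 0.0});
        System.out.println(Arrays.toString(counts)); // prints [2.0, 1.0]
    }
}
```

In the usage above the two nearest samples contribute three counts in total, because one of them belongs to both categories. This is why dividing the entries by k does not necessarily yield values that sum to 1.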

Throws:
java.lang.Exception

getK

public int getK()
Get the value of k used by the KNN classifier.


getNumberCategories

public int getNumberCategories()
Get the number of categories into which test samples can be classified.


save

public void save(java.io.File place_to_save)
          throws java.lang.Exception
Saves all of the fields to the given file. Throws an exception if a problem occurs.

Throws:
java.lang.Exception