ace
Class InstanceClassifier

java.lang.Object
  extended by ace.InstanceClassifier

public class InstanceClassifier
extends java.lang.Object

Classifies a set of Weka Instances using the trained Weka Classifier held by a TrainedModel object.


Constructor Summary
InstanceClassifier()
           
 
Method Summary
static SegmentedClassification[] classify(TrainedModel trained, DataBoard data_board, weka.core.Instances instances, java.lang.String results_file, boolean save_intermediate_arffs)
          Classifies a set of instances using a trained Weka Classifier.
static weka.core.Instances classifyInstances(TrainedModel trained, weka.core.Instances instances, boolean save_intermediate_arffs)
          Classifies a set of Weka Instances.
static java.lang.String formatConfusionMatrix(double[][] matrix, java.lang.String[] classes)
          Creates an easily readable version of a confusion matrix to be included in results output.
static double[][] getConfusionMatrix(weka.core.Instances model, weka.core.Instances classified, java.lang.String[] classes)
          Gets the confusion matrix for a set of classified Instances.
static double getCorrectCount(weka.core.Instances models, weka.core.Instances results)
          Compares the given classifications with the given model classifications and returns the number of correct classifications.
static double[] getSuccessRate(SegmentedClassification[] models, SegmentedClassification[] results, java.lang.StringBuffer out)
          Gets the number of correct classifications for overall instances and subsections and appends the results of each classification to a given StringBuffer object.
static java.lang.String getSuccessString(SegmentedClassification[] models, SegmentedClassification[] results, java.lang.StringBuffer out)
          Gets a String describing the success rate of a classification.
protected static java.lang.String num2ShortID(int num, char[] IDChars, int IDWidth)
          Method for generating indices for the confusion matrix.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

InstanceClassifier

public InstanceClassifier()

Method Detail

classify

public static SegmentedClassification[] classify(TrainedModel trained,
                                                 DataBoard data_board,
                                                 weka.core.Instances instances,
                                                 java.lang.String results_file,
                                                 boolean save_intermediate_arffs)
                                          throws java.lang.Exception
Classifies a set of instances using a trained Weka Classifier.

Parameters:
trained - Object containing references to the Weka objects needed for Classification, including a trained Weka Classifier.
data_board - Contains the instances to classify and the method to perform classification.
instances - The Weka Instances to classify.
save_intermediate_arffs - Whether or not to save testing data to an arff file after parsing and again after feature selection, if any. Useful for testing.
results_file - The name of the file in which the classification results will be stored. Should have the extension ".xml" if using ACE XML files or ".arff" if using Weka ARFF files. May be null if no file is to be saved.
Returns:
The classifications of each instance.
Throws:
java.lang.Exception - If an error is encountered.

classifyInstances

public static weka.core.Instances classifyInstances(TrainedModel trained,
                                                    weka.core.Instances instances,
                                                    boolean save_intermediate_arffs)
                                             throws java.lang.Exception
Classifies a set of Weka Instances. Returns a classified copy of the given Instances. This method is used in the context of cross validation when only Weka Instances objects are used to store the instances and never ACE datatypes like SegmentedClassification and DataSet. Note that dimensionality reduction (if any) has already been applied to the instances prior to being passed to this method.

Parameters:
trained - Object containing references to the Weka objects needed for Classification, including a trained Weka Classifier.
instances - The Weka Instances to classify.
save_intermediate_arffs - Whether or not to save testing data to an arff file after parsing and again after feature selection, if any. Useful for testing.
Returns:
A classified copy of the given Instances.
Throws:
java.lang.Exception - If an error occurs.

getConfusionMatrix

public static double[][] getConfusionMatrix(weka.core.Instances model,
                                            weka.core.Instances classified,
                                            java.lang.String[] classes)
Gets the confusion matrix for a set of classified Instances. Compares the model classifications to the classifications made by a trained Weka Classifier.

Parameters:
model - The original Instances that were used for testing.
classified - The classified Instances to be evaluated.
classes - The possible classes into which an Instance may be classified.
Returns:
Table of the correct and incorrect classification counts for the classified Instances.
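
To illustrate the kind of table described above, the following sketch builds a confusion matrix from plain string arrays rather than Weka Instances. The class ConfusionMatrixSketch and its build method are hypothetical and not part of ACE. The orientation (rows index the model class, columns the predicted class) is an assumption, since the orientation used by getConfusionMatrix is not documented here.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-alone sketch; not part of ACE. */
final class ConfusionMatrixSketch {

    /**
     * Counts how often each model (true) class was assigned each predicted class.
     * Assumption: rows index the model class, columns index the predicted class.
     */
    static double[][] build(String[] model, String[] predicted, String[] classes) {
        if (model.length != predicted.length) {
            throw new IllegalArgumentException("model and predicted must have equal length");
        }
        // Map each class name to its row/column position.
        Map<String, Integer> index = new HashMap<>();
        for (int i = 0; i < classes.length; i++) {
            index.put(classes[i], i);
        }
        double[][] matrix = new double[classes.length][classes.length];
        for (int i = 0; i < model.length; i++) {
            Integer row = index.get(model[i]);
            Integer col = index.get(predicted[i]);
            if (row == null || col == null) {
                throw new IllegalArgumentException("unknown class at instance " + i);
            }
            matrix[row][col] += 1.0;
        }
        return matrix;
    }

    public static void main(String[] args) {
        String[] classes = {"jazz", "rock"};
        String[] model = {"jazz", "jazz", "rock", "rock"};
        String[] predicted = {"jazz", "rock", "rock", "rock"};
        System.out.println(Arrays.deepToString(build(model, predicted, classes)));
    }
}
```

Under this orientation, entries on the diagonal count correct classifications and off-diagonal entries count errors.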

formatConfusionMatrix

public static java.lang.String formatConfusionMatrix(double[][] matrix,
                                                     java.lang.String[] classes)
Creates an easily readable version of a confusion matrix to be included in results output.

Parameters:
matrix - Confusion matrix for a classification. Table representing correct and incorrect classifications.
classes - The possible classes into which an instance may be classified.
Returns:
Easily readable table representing the correct and incorrect classifications.
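
As a rough illustration of what a readable rendering might look like, the sketch below prints a matrix as a fixed-width table with class names as row and column headers. The class ConfusionMatrixFormatSketch, the column width, and the layout are all assumptions made for the example; the actual output of formatConfusionMatrix may be laid out differently.

```java
import java.util.Locale;

/** Hypothetical stand-alone sketch; not part of ACE. */
final class ConfusionMatrixFormatSketch {

    /** Renders the matrix as a fixed-width text table (assumed layout). */
    static String format(double[][] matrix, String[] classes) {
        // Choose a column width wide enough for the longest class name.
        int width = 8;
        for (String name : classes) {
            width = Math.max(width, name.length() + 1);
        }
        StringBuilder sb = new StringBuilder();
        // Header row: an empty corner cell followed by the class names.
        sb.append(String.format("%-" + width + "s", ""));
        for (String name : classes) {
            sb.append(String.format("%" + width + "s", name));
        }
        sb.append('\n');
        // One row per class: the class name, then its counts.
        for (int r = 0; r < matrix.length; r++) {
            sb.append(String.format("%-" + width + "s", classes[r]));
            for (int c = 0; c < matrix[r].length; c++) {
                sb.append(String.format(Locale.ROOT, "%" + width + ".1f", matrix[r][c]));
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        double[][] matrix = {{1, 1}, {0, 2}};
        System.out.print(format(matrix, new String[]{"jazz", "rock"}));
    }
}
```

One decimal place is kept in the cells because the matrix holds doubles.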

num2ShortID

protected static java.lang.String num2ShortID(int num,
                                              char[] IDChars,
                                              int IDWidth)
Method for generating indices for the confusion matrix.

Parameters:
num - integer to format
IDChars - the characters to use
IDWidth - the width of the entry
Returns:
the formatted integer as a string
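
The signature matches a helper of the same name in Weka's Evaluation class, which labels confusion matrix columns a, b, ..., z, aa, ab, and so on. The sketch below follows that scheme on the assumption that the ACE method behaves similarly; the class ShortIdSketch is hypothetical and the real implementation may differ.

```java
/** Hypothetical stand-alone sketch; not part of ACE. */
final class ShortIdSketch {

    /**
     * Encodes a non-negative num as a right-aligned label drawn from idChars,
     * padded on the left with spaces to idWidth characters.
     * Assumption: spreadsheet-style numbering (a..z, then aa, ab, ...).
     */
    static String num2ShortID(int num, char[] idChars, int idWidth) {
        char[] id = new char[idWidth];
        int i;
        // Fill digits from the right until the number is used up.
        for (i = idWidth - 1; i >= 0; i--) {
            id[i] = idChars[num % idChars.length];
            num = num / idChars.length - 1;
            if (num < 0) {
                break;
            }
        }
        // Pad any remaining positions on the left with spaces.
        for (i--; i >= 0; i--) {
            id[i] = ' ';
        }
        return new String(id);
    }

    public static void main(String[] args) {
        char[] letters = "abcdefghijklmnopqrstuvwxyz".toCharArray();
        for (int n : new int[]{0, 25, 26, 27}) {
            System.out.println(n + " -> [" + num2ShortID(n, letters, 2) + "]");
        }
    }
}
```

In this sketch, a number too large for idWidth is truncated to its lowest-order characters rather than rejected.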

getCorrectCount

public static double getCorrectCount(weka.core.Instances models,
                                     weka.core.Instances results)
Compares the given classifications with the given model classifications and returns the number of correct classifications.

If an instance belongs to multiple classes in its model classifications, and only a fraction of these are found, then the calculation of the overall success rate will treat this as fractionally successful.

Parameters:
models - The model classifications.
results - The classifications to compare to the models.
Returns:
The number of correct classifications.
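
The fractional credit described above can be sketched with plain sets of class labels. The class FractionalScoreSketch is hypothetical, and the rule used here (credit equals the number of model classes found divided by the number of model classes, with no penalty for extra predicted classes) is an assumption about how "fractionally successful" is computed, not the confirmed ACE formula.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-alone sketch; not part of ACE. */
final class FractionalScoreSketch {

    /** Assumed rule: fraction of the model classes that appear among the predicted classes. */
    static double score(Set<String> model, Set<String> predicted) {
        if (model.isEmpty()) {
            return 0.0; // no model classification, so no credit in this sketch
        }
        int found = 0;
        for (String label : model) {
            if (predicted.contains(label)) {
                found++;
            }
        }
        return (double) found / model.size();
    }

    /** Sums the per-instance credit, which is why the count is a double. */
    static double correctCount(List<Set<String>> models, List<Set<String>> results) {
        if (models.size() != results.size()) {
            throw new IllegalArgumentException("models and results must have equal size");
        }
        double total = 0.0;
        for (int i = 0; i < models.size(); i++) {
            total += score(models.get(i), results.get(i));
        }
        return total;
    }

    private static Set<String> setOf(String... labels) {
        return new HashSet<>(Arrays.asList(labels));
    }

    public static void main(String[] args) {
        List<Set<String>> models = Arrays.asList(setOf("jazz", "blues"), setOf("rock"));
        List<Set<String>> results = Arrays.asList(setOf("jazz"), setOf("rock"));
        System.out.println(correctCount(models, results));
    }
}
```

An instance whose model classes are jazz and blues but which is classified only as jazz contributes half a correct classification under this rule.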

getSuccessString

public static java.lang.String getSuccessString(SegmentedClassification[] models,
                                                SegmentedClassification[] results,
                                                java.lang.StringBuffer out)
Gets a String describing the success rate of a classification. Compares the given classifications with the given model classifications and returns a string describing the success rates for overall instances and/or sections of instances, whichever is appropriate for the given data.

Parameters:
models - The model classifications from the original instances.
results - The classifications predicted by the trained Classifier.
out - StringBuffer that will be passed to getSuccessRate and to which the results of individual classifications will be printed.
Returns:
String describing the success rate of a classification.

getSuccessRate

public static double[] getSuccessRate(SegmentedClassification[] models,
                                      SegmentedClassification[] results,
                                      java.lang.StringBuffer out)
Gets the number of correct classifications for overall instances and subsections and appends the results of each classification to a given StringBuffer object. Misclassified instances will be preceded by an asterisk (*). Partially misclassified instances will be preceded by a caret (^).

If an instance belongs to multiple classes in its model classifications, and only a fraction of these are found, then the calculation of the overall success rate will treat this as fractionally successful.

The reported value for error rate includes wrong classifications as well as additional classifications beyond the correct ones (since a given instance may have an arbitrary number of correct classes).

Parameters:
models - The model classifications.
results - The classifications to compare to the models.
out - The StringBuffer to which results of the classification of each individual instance will be printed. Misclassifications will be preceded by an asterisk (*) and partially misclassified instances will be preceded by a caret (^).
Returns:
An array of three doubles. The first cell is the number of correct classifications of overall instances. The second cell is the number of correct classifications of subsections. The third cell contains the total number of instances to be used when calculating the success rate (instances without model classifications are not included). The number of correct classifications is calculated as a score that takes into account multiple classifications for single instances and overlapping sections.
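
A caller might turn the returned array into an overall percentage as sketched below. The class SuccessRateSketch is hypothetical, and dividing the first cell by the third is an assumption based on the description of the third cell. No subsection rate is computed because the array does not include a subsection total.

```java
/** Hypothetical stand-alone sketch; not part of ACE. */
final class SuccessRateSketch {

    /**
     * Assumed interpretation: counts[0] is the overall correct score and
     * counts[2] is the number of instances that have model classifications.
     */
    static double overallPercent(double[] counts) {
        if (counts == null || counts.length != 3) {
            throw new IllegalArgumentException("expected an array of three doubles");
        }
        double total = counts[2];
        // Avoid dividing by zero when no instance had a model classification.
        return total == 0.0 ? 0.0 : 100.0 * counts[0] / total;
    }

    public static void main(String[] args) {
        double[] counts = {7.5, 3.0, 10.0}; // illustrative values, not real results
        System.out.println(overallPercent(counts) + "% of overall instances correct");
    }
}
```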