ace
Class Coordinator

java.lang.Object
  extended by ace.Coordinator

public class Coordinator
extends java.lang.Object

Coordinates ACE's main functionality.

This class allows for easy access to all of ACE's main functionality from any source. The graphical user interface, the command line interface, and external APIs are all directed through this class to access ACE's training, classification, cross validation, and experimentation functionality.


Field Summary
 java.lang.String arff_path
          The path of an arff file from which to derive instances.
 DataBoard data_board
          The instances to work with.
 weka.core.Instances instances
          A set of Weka Instances instantiated by a call to loadInstances.
 boolean save_intermediate_arffs
          Whether or not to save training data to an arff file after parsing, after thinning, and again after feature selection, if any.
 
Constructor Summary
Coordinator(DataBoard data_board, java.lang.String arff_path, boolean save_intermediate_arffs)
          Constructs an instance of a Coordinator object.
 
Method Summary
 SegmentedClassification[] classify(java.lang.String results_file, TrainedModel trained)
          Classifies a set of instances with the given trained classifier.
 java.lang.String crossValidate(double max_class_membership_spread, double max_class_membership_count, boolean order_randomly, java.lang.String file_name, java.lang.String classifier_type, java.lang.String feature_selector, int folds, int max_attribute, java.io.OutputStream out, boolean verbose)
          Cross validates a set of Weka Instances.
 java.lang.String experiment(double max_class_membership_spread, double max_class_membership_count, boolean order_randomly, java.lang.String results_base_file_name, int folds, java.io.OutputStream out, boolean verbose, int max_attribute)
          Experiments on a set of Weka Instances.
static weka.classifiers.Classifier[] getAllUntrainedClassifiers(java.util.LinkedList<java.lang.String> classifier_descriptions)
          Return an array of untrained but parameterized classifiers that can be used for a variety of purposes.
static weka.classifiers.Classifier getOneUntrainedClassifier(java.lang.String classifier_type, java.lang.String[] description)
          Prepares a single Weka Classifier.
 void loadInstances(java.lang.String relation)
          Loads the Instances from either ACE XML files or a Weka ARFF file.
 java.lang.String performDimensionalityReduction(TrainedModel trained, java.lang.String feature_selector, int max_attribute, java.io.OutputStream out, boolean verbose)
          Performs dimensionality reduction on a set of Weka Instances.
 void prepareTrainingInstances(double max_class_membership_spread, double max_class_membership_count, boolean order_randomly)
          Prepares a set of Weka Instances from either an arff file or ACE XML files.
 TrainedModel train(double max_class_membership_spread, double max_class_membership_count, boolean order_randomly, java.lang.String feature_selector, java.lang.String classifier_type, java.io.OutputStream out, int max_attribute, boolean verbose)
          Trains a Weka Classifier based on a set of sample Instances.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

data_board

public DataBoard data_board
The instances to work with.


arff_path

public java.lang.String arff_path
The path of an arff file from which to derive instances. This will be null if using ACE XML files. Note that it is assumed that the class attribute is the last attribute.


instances

public weka.core.Instances instances
A set of Weka Instances instantiated by a call to loadInstances. All processing methods will load instances into this field prior to applying any filters, restrictions, or dimensionality reduction and before passing them to ACE's processing classes.


save_intermediate_arffs

public boolean save_intermediate_arffs
Whether or not to save training data to an arff file after parsing, after thinning, and again after feature selection, if any. Useful for testing.

Constructor Detail

Coordinator

public Coordinator(DataBoard data_board,
                   java.lang.String arff_path,
                   boolean save_intermediate_arffs)
Constructs an instance of a Coordinator object.

Parameters:
data_board - The instances to work with.
arff_path - The path of an arff file to derive instances from. This will be null if using ACE XML files. Note that it is assumed that the class attribute is the last attribute.
save_intermediate_arffs - Whether or not to save training data to an arff file after parsing, after thinning, and again after feature selection, if any. Useful for testing.
Method Detail

train

public TrainedModel train(double max_class_membership_spread,
                          double max_class_membership_count,
                          boolean order_randomly,
                          java.lang.String feature_selector,
                          java.lang.String classifier_type,
                          java.io.OutputStream out,
                          int max_attribute,
                          boolean verbose)
                   throws java.lang.Exception
Trains a Weka Classifier based on a set of sample Instances.

NOTE: for classifier_type, the codes for the types of Weka classifiers are as follows:

  • Unweighted k-nn (k = 1): IBk
  • Naive Bayesian (Gaussian): NaiveBayes
  • Support Vector Machine: SMO
  • C4.5 Decision Tree: J48
  • Backprop Neural Network: MultilayerPerceptron
  • AdaBoost seeded with C4.5 Decision Trees: AdaBoostM1
  • Bagging seeded with C4.5 Decision Trees: Bagging

    Parameters:
    max_class_membership_spread - The maximum permitted ratio between the numbers of instances belonging to different classes. For example, a value of 2 means that a class may contribute at most twice as many training instances as the class with the fewest training instances. If a class has more training instances than this limit, a randomly selected set of instances up to the maximum is kept for training and all others are eliminated. A value of 0 means that no maximum spread is enforced, and a value of 1 enforces a uniform distribution. Instances may be reordered.
    max_class_membership_count - The maximum number of instances that may belong to any one class. If a class has more training instances than this number, then a randomly selected set of instances up to the maximum are selected for use in training, and all others are eliminated. A value of 0 means that no maximum is enforced.
    order_randomly - Whether or not to randomly order the training instances.
    feature_selector - The command line code specifying what type of dimensionality reduction should be performed on the instances prior to training.
    classifier_type - The type of classifier to be trained.
    out - Status update and results of dimensionality reduction are sent here.
    max_attribute - If the number of attributes for a data set is larger than this number, an exhaustive search will not be performed.
    verbose - Whether or not to print a detailed report of the dimensionality reduction that was performed.
    Returns:
    A TrainedModel object that can perform classifications with the trained Weka Classifier.
    Throws:
    java.lang.Exception - If a problem occurs.
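
The interaction of max_class_membership_spread and max_class_membership_count can be sketched as a per-class cap. The following is a minimal illustration under stated assumptions, not ACE's implementation: MembershipCaps and keptPerClass are hypothetical names, class sizes are passed in as a plain map, and fractional caps are assumed to round down.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class MembershipCaps {
    /**
     * Returns how many instances each class would keep.
     * A value of 0 for either limit means that limit is not enforced.
     */
    static Map<String, Integer> keptPerClass(Map<String, Integer> classSizes,
                                             double maxSpread,
                                             double maxCount) {
        // The spread limit is relative to the smallest class.
        int smallest = Integer.MAX_VALUE;
        for (int size : classSizes.values()) {
            smallest = Math.min(smallest, size);
        }

        Map<String, Integer> kept = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : classSizes.entrySet()) {
            int cap = entry.getValue();
            if (maxSpread > 0) {
                // Assumption: a fractional cap is rounded down.
                cap = Math.min(cap, (int) Math.floor(maxSpread * smallest));
            }
            if (maxCount > 0) {
                cap = Math.min(cap, (int) Math.floor(maxCount));
            }
            kept.put(entry.getKey(), cap);
        }
        return kept;
    }
}
```

With class sizes of 10, 25, and 40 and a spread of 2, this sketch keeps 10, 20, and 20 instances respectively; a spread of 1 keeps 10 of each.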

    classify

    public SegmentedClassification[] classify(java.lang.String results_file,
                                              TrainedModel trained)
                                       throws java.lang.Exception
    Classifies a set of instances with the given trained classifier. Saves the results in the given file in the form of an ACE XML Classifications file (if ACE XML files are used) or Weka ARFF file (if a Weka ARFF file is used). If using instances from a Weka ARFF file, instances will be converted into ACE datatypes prior to classification.

    Parameters:
    results_file - The name of the ACE XML classifications file or Weka ARFF file in which to store the results.
    trained - The TrainedModel in which the trained Weka Classifier is contained.
    Returns:
    Array of SegmentedClassification objects containing the classification for each Instance.
    Throws:
    java.lang.Exception - If the specified results file has an incorrect file extension or if classification was unsuccessful.
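
The extension requirement on results_file can be pictured as a check made before any classification takes place. The sketch below is an assumption about how such a check might look, not ACE's code: ResultsFileCheck and requireMatchingExtension are hypothetical names, and the ".arff" and ".xml" suffixes are inferred from the two file types named above.

```java
import java.util.Locale;

final class ResultsFileCheck {
    /**
     * Rejects a results file whose extension does not match the data source.
     * A non-null arffPath is taken to mean that a Weka ARFF file is in use.
     */
    static void requireMatchingExtension(String resultsFile, String arffPath) {
        // Assumption: ARFF input implies ARFF output, otherwise ACE XML output.
        String expected = (arffPath != null) ? ".arff" : ".xml";
        if (resultsFile == null
                || !resultsFile.toLowerCase(Locale.ROOT).endsWith(expected)) {
            throw new IllegalArgumentException(
                    "Results file must end in " + expected + ": " + resultsFile);
        }
    }
}
```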

    crossValidate

    public java.lang.String crossValidate(double max_class_membership_spread,
                                          double max_class_membership_count,
                                          boolean order_randomly,
                                          java.lang.String file_name,
                                          java.lang.String classifier_type,
                                          java.lang.String feature_selector,
                                          int folds,
                                          int max_attribute,
                                          java.io.OutputStream out,
                                          boolean verbose)
                                   throws java.lang.Exception
    Cross validates a set of Weka Instances. The results of the cross validation are stored in a separate CrossValidationResults Object.

    NOTE: for classifier_type, the codes for the types of Weka classifiers are as follows:

  • Unweighted k-nn (k = 1): IBk
  • Naive Bayesian (Gaussian): NaiveBayes
  • Support Vector Machine: SMO
  • C4.5 Decision Tree: J48
  • Backprop Neural Network: MultilayerPerceptron
  • AdaBoost seeded with C4.5 Decision Trees: AdaBoostM1
  • Bagging seeded with C4.5 Decision Trees: Bagging

    Parameters:
    max_class_membership_spread - The maximum permitted ratio between the numbers of instances belonging to different classes. For example, a value of 2 means that a class may contribute at most twice as many training instances as the class with the fewest training instances. If a class has more training instances than this limit, a randomly selected set of instances up to the maximum is kept for training and all others are eliminated. A value of 0 means that no maximum spread is enforced, and a value of 1 enforces a uniform distribution. Instances may be reordered.
    max_class_membership_count - The maximum number of instances that may belong to any one class. If a class has more training instances than this number, then a randomly selected set of instances up to the maximum are selected for use in training, and all others are eliminated. A value of 0 means that no maximum is enforced.
    order_randomly - Whether or not to randomly order the training instances.
    file_name - The name of the file to which to save the results of this cross validation. Results are not saved if this is null.
    classifier_type - The code specifying the type of Weka Classifier to train.
    feature_selector - The command line code specifying the type of dimensionality reduction to be performed.
    folds - The number of cross-validation folds.
    max_attribute - If the number of attributes for a data set is larger than this number, an exhaustive search will not be performed.
    out - Status updates are sent here.
    verbose - Whether or not to add detailed information about the individual classifications to the results String.
    Returns:
    Summary of the cross validation results.
    Throws:
    java.lang.Exception - If unable to cross validate the given instances.
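
To make the folds parameter concrete, the sketch below partitions instance indices into test folds of near-equal size. It is a plain round-robin split written for illustration only; FoldSketch and testFolds are hypothetical names, and the split actually used by ACE and Weka (for example, whether it is stratified by class) is not specified here.

```java
import java.util.ArrayList;
import java.util.List;

final class FoldSketch {
    /** Splits the indices 0..n-1 into `folds` test sets of near-equal size. */
    static List<List<Integer>> testFolds(int n, int folds) {
        if (folds < 2 || folds > n) {
            throw new IllegalArgumentException("folds must be between 2 and n");
        }
        List<List<Integer>> result = new ArrayList<>();
        for (int f = 0; f < folds; f++) {
            result.add(new ArrayList<>());
        }
        // Round-robin assignment: index i is tested in fold i % folds
        // and used for training in every other fold.
        for (int i = 0; i < n; i++) {
            result.get(i % folds).add(i);
        }
        return result;
    }
}
```

Each index appears in exactly one test fold, so every instance is classified once over the course of the cross validation.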

    experiment

    public java.lang.String experiment(double max_class_membership_spread,
                                       double max_class_membership_count,
                                       boolean order_randomly,
                                       java.lang.String results_base_file_name,
                                       int folds,
                                       java.io.OutputStream out,
                                       boolean verbose,
                                       int max_attribute)
                                throws java.lang.Exception
    Experiments on a set of Weka Instances.

    Cross validates the Instances with a variety of different classifiers and dimensionality reduction in order to find the best classification approach.

    Parameters:
    max_class_membership_spread - The maximum permitted ratio between the numbers of instances belonging to different classes. For example, a value of 2 means that a class may contribute at most twice as many training instances as the class with the fewest training instances. If a class has more training instances than this limit, a randomly selected set of instances up to the maximum is kept for training and all others are eliminated. A value of 0 means that no maximum spread is enforced, and a value of 1 enforces a uniform distribution. Instances may be reordered.
    max_class_membership_count - The maximum number of instances that may belong to any one class. If a class has more training instances than this number, then a randomly selected set of instances up to the maximum are selected for use in training, and all others are eliminated. A value of 0 means that no maximum is enforced.
    order_randomly - Whether or not to randomly order the training instances.
    results_base_file_name - The results of the experimentation will be stored in multiple files with this as the base file name.
    folds - The number of cross-validation folds to perform.
    out - The OutputStream to which status reports are printed.
    verbose - Whether or not to include detailed information about the dimensionality reduction of the best found classification approach and the individual classifications of the validation set.
    max_attribute - If the number of attributes for a data set is larger than this number, an exhaustive search will not be performed.
    Returns:
    Summary of results for best found classifier.
    Throws:
    java.lang.Exception - If an error occurs during experimentation.
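
The search that experimentation performs can be pictured as a loop over every pairing of classifier and dimensionality reduction, keeping the pairing with the best cross-validated score. The sketch below assumes this shape and is not ACE's code: ExperimentSketch and bestApproach are hypothetical names, and the scoring function is supplied by the caller as a stand-in for a full cross validation.

```java
import java.util.List;
import java.util.function.ToDoubleBiFunction;

final class ExperimentSketch {
    /**
     * Returns {classifierType, featureSelector} for the highest-scoring pairing,
     * or null if either list is empty. Ties keep the first pairing found.
     */
    static String[] bestApproach(List<String> classifierTypes,
                                 List<String> featureSelectors,
                                 ToDoubleBiFunction<String, String> successRate) {
        String[] best = null;
        double bestRate = Double.NEGATIVE_INFINITY;
        for (String classifier : classifierTypes) {
            for (String selector : featureSelectors) {
                // Stand-in for cross validating this pairing.
                double rate = successRate.applyAsDouble(classifier, selector);
                if (rate > bestRate) {
                    bestRate = rate;
                    best = new String[] {classifier, selector};
                }
            }
        }
        return best;
    }
}
```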

    loadInstances

    public void loadInstances(java.lang.String relation)
                       throws java.lang.Exception
    Loads the Instances from either ACE XML files or a Weka ARFF file. No restrictions or alterations are applied.

    Parameters:
    relation - String describing the purpose of these Instances. Will likely be "Training" or "Testing".
    Throws:
    java.lang.Exception - If an error occurs.

    prepareTrainingInstances

    public void prepareTrainingInstances(double max_class_membership_spread,
                                         double max_class_membership_count,
                                         boolean order_randomly)
                                  throws java.lang.Exception
    Prepares a set of Weka Instances from either an arff file or ACE XML files. These Instances are prepared specifically for training. This method allows for instances to be altered, re-ordered, saved, or restricted prior to training.

    Note that instances are re-ordered within each class if either the max_class_membership_spread or max_class_membership_count parameter is enforced (that is, set to a non-zero value).

    Parameters:
    max_class_membership_spread - The maximum permitted ratio between the numbers of instances belonging to different classes. For example, a value of 2 means that a class may contribute at most twice as many training instances as the class with the fewest training instances. If a class has more training instances than this limit, a randomly selected set of instances up to the maximum is kept for training and all others are eliminated. A value of 0 means that no maximum spread is enforced, and a value of 1 enforces a uniform distribution. Instances may be reordered.
    max_class_membership_count - The maximum number of instances that may belong to any one class. If a class has more training instances than this number, then a randomly selected set of instances up to the maximum are selected for use in training, and all others are eliminated. A value of 0 means that no maximum is enforced.
    order_randomly - Whether or not to randomly order the training instances.
    Throws:
    java.lang.Exception - If an error is encountered.
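
The random elimination described for the two membership limits can be sketched for a single class as a shuffle followed by truncation. This is an illustration under assumptions, not ACE's implementation: ThinningSketch and thin are hypothetical names, the per-class cap is assumed to have been computed already, and a class at or under the cap is left in its original order here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

final class ThinningSketch {
    /** Keeps at most `cap` randomly chosen members of one class. */
    static <T> List<T> thin(List<T> classMembers, int cap, Random rng) {
        if (cap < 0) {
            throw new IllegalArgumentException("cap must not be negative");
        }
        List<T> copy = new ArrayList<>(classMembers);
        if (copy.size() <= cap) {
            return copy; // nothing to eliminate; order is unchanged in this sketch
        }
        // Shuffling and truncating selects `cap` members uniformly at random,
        // which also re-orders the surviving members.
        Collections.shuffle(copy, rng);
        return new ArrayList<>(copy.subList(0, cap));
    }
}
```

Passing a seeded Random makes the selection repeatable, which can help when comparing runs.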

    performDimensionalityReduction

    public java.lang.String performDimensionalityReduction(TrainedModel trained,
                                                           java.lang.String feature_selector,
                                                           int max_attribute,
                                                           java.io.OutputStream out,
                                                           boolean verbose)
                                                    throws java.lang.Exception
    Performs dimensionality reduction on a set of Weka Instances. If the user wishes to use dimensionality reduction, this method will be called prior to testing and cross validation.

    Parameters:
    trained - The TrainedModel in which the Weka AttributeSelection object for this dimensionality reduction will be stored.
    feature_selector - Code specifying the type of dimensionality reduction to be performed.
    max_attribute - The maximum number of attributes permitted for exhaustive search.
    out - Output stream to which status reports and results will be printed.
    verbose -
    Returns:
    A String describing the dimensionality reduction that was performed.
    Throws:
    java.lang.Exception - If an error occurs.
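
The reason for the max_attribute limit is that an exhaustive search must consider every non-empty subset of the attributes, and that number doubles with each attribute added. The sketch below illustrates the guard and the growth; ExhaustiveSearchGuard and its methods are hypothetical names, and whether ACE counts the class attribute toward the limit is not stated here.

```java
import java.math.BigInteger;

final class ExhaustiveSearchGuard {
    /** True if the attribute count does not exceed the permitted maximum. */
    static boolean exhaustiveSearchAllowed(int numAttributes, int maxAttribute) {
        return numAttributes <= maxAttribute;
    }

    /** Number of non-empty attribute subsets: 2^n - 1. */
    static BigInteger candidateSubsets(int numAttributes) {
        if (numAttributes < 0) {
            throw new IllegalArgumentException("numAttributes must not be negative");
        }
        return BigInteger.ONE.shiftLeft(numAttributes).subtract(BigInteger.ONE);
    }
}
```

For 10 attributes there are 1023 candidate subsets; for 20 there are 1048575.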

    getAllUntrainedClassifiers

    public static weka.classifiers.Classifier[] getAllUntrainedClassifiers(java.util.LinkedList<java.lang.String> classifier_descriptions)
    Return an array of untrained but parameterized classifiers that can be used for a variety of purposes. The given LinkedList will be erased and filled with descriptions of each of the returned classifiers. This method is called during experimentation and all Classifiers are tested.

    Parameters:
    classifier_descriptions - This list will be filled with descriptions of the returned classifiers. Warning: any pre-existing contents will be erased.
    Returns:
    Classifiers that may be trained and evaluated.
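
The warning about classifier_descriptions follows from its use as an output parameter: the caller's list is cleared and then refilled in step with the returned array. The sketch below shows that contract using plain strings in place of Weka classifiers; DescriptionListSketch, fillDescriptions, and the description text are hypothetical, and the set of classifiers ACE actually returns may differ from the codes used here.

```java
import java.util.LinkedList;

final class DescriptionListSketch {
    /** Clears the caller's list, then adds one description per returned entry. */
    static String[] fillDescriptions(LinkedList<String> descriptions) {
        // Stand-ins for the untrained classifiers, named by their type codes.
        String[] codes = {"IBk", "NaiveBayes", "SMO", "J48",
                          "MultilayerPerceptron", "AdaBoostM1", "Bagging"};
        descriptions.clear(); // any pre-existing contents are lost
        for (String code : codes) {
            descriptions.add("Untrained " + code);
        }
        return codes;
    }
}
```

After the call, the description at position i in the list corresponds to the element at index i of the returned array.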

    getOneUntrainedClassifier

    public static weka.classifiers.Classifier getOneUntrainedClassifier(java.lang.String classifier_type,
                                                                        java.lang.String[] description)
                                                                 throws java.lang.Exception
    Prepares a single Weka Classifier. The specified type of Classifier is created but not trained.

    Parameters:
    classifier_type - The type of classifier to be prepared.
    description - Will be of size 1 and will store a description of the Classifier being instantiated.
    Returns:
    An untrained Weka classifier of the specified type.
    Throws:
    java.lang.Exception - If invalid Classifier type was specified.
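
As an illustration of the classifier_type codes and the one-element description array, the sketch below resolves a code to the fully qualified Weka class it conventionally names and writes a description into slot 0. It is a sketch under assumptions rather than ACE's code: OneClassifierSketch and wekaClassNameFor are hypothetical names, the package names follow Weka's standard layout, and an unchecked exception is used for brevity where the documented method declares java.lang.Exception.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class OneClassifierSketch {
    private static final Map<String, String> WEKA_CLASS_BY_CODE = new LinkedHashMap<>();
    static {
        // Assumption: each code names the Weka class in its usual package.
        WEKA_CLASS_BY_CODE.put("IBk", "weka.classifiers.lazy.IBk");
        WEKA_CLASS_BY_CODE.put("NaiveBayes", "weka.classifiers.bayes.NaiveBayes");
        WEKA_CLASS_BY_CODE.put("SMO", "weka.classifiers.functions.SMO");
        WEKA_CLASS_BY_CODE.put("J48", "weka.classifiers.trees.J48");
        WEKA_CLASS_BY_CODE.put("MultilayerPerceptron",
                "weka.classifiers.functions.MultilayerPerceptron");
        WEKA_CLASS_BY_CODE.put("AdaBoostM1", "weka.classifiers.meta.AdaBoostM1");
        WEKA_CLASS_BY_CODE.put("Bagging", "weka.classifiers.meta.Bagging");
    }

    /** Returns the class name for a code and stores a description in description[0]. */
    static String wekaClassNameFor(String classifierType, String[] description) {
        if (description == null || description.length != 1) {
            throw new IllegalArgumentException("description must have size 1");
        }
        String className = WEKA_CLASS_BY_CODE.get(classifierType);
        if (className == null) {
            throw new IllegalArgumentException(
                    "Invalid classifier type: " + classifierType);
        }
        description[0] = "Untrained " + classifierType + " (" + className + ")";
        return className;
    }
}
```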