java.lang.Object
ace.datatypes.DataBoard
public class DataBoard
Stores the data needed for training, testing and using classifiers. Stores a taxonomy, feature definitions, feature vectors of instances and model classifications of instances.
The contents of objects of this class can be loaded from ACE XML files or a Weka ARFF file using one of the constructors. Methods are also implemented for saving and loading objects of this class directly as serializable objects. The contents of an object of this class may also be separated and saved as individual XML files.
A method is also available for generating a Weka ARFF file from an object of this class. This method also generates an array of strings identifying the source of each line in the resulting ARFF file.
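The direct save and load route described above relies on Java object serialization. The following is a minimal, self-contained sketch of that kind of round trip. `BoardStandIn` and its fields are hypothetical stand-ins and not ACE classes; the actual saveDataBoard and loadDataBoard implementations are not shown on this page and may differ.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for a serializable data container (NOT the ACE DataBoard).
final class BoardStandIn implements Serializable {
    private static final long serialVersionUID = 1L;

    final String[] featureNames;
    final double[][] featureValues;

    BoardStandIn(String[] featureNames, double[][] featureValues) {
        this.featureNames = featureNames;
        this.featureValues = featureValues;
    }

    // Write the whole object graph to the given file.
    static void save(BoardStandIn toSave, File file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(toSave);
        }
    }

    // Read the object back; the cast fails if the file holds a different type.
    static BoardStandIn load(File file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
            return (BoardStandIn) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        File file = File.createTempFile("board", ".ser");
        file.deleteOnExit();
        save(new BoardStandIn(new String[] {"Spectral Centroid"}, new double[][] {{0.5}}), file);
        BoardStandIn loaded = load(file);
        System.out.println(loaded.featureNames[0] + " = " + loaded.featureValues[0][0]);
    }
}
```

Serialized files of this kind are tied to the class definition that wrote them, which is one reason the XML and ARFF export routes are also useful.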
Field Summary | |
---|---|
FeatureDefinition[] |
feature_definitions
Holds meta-data about the features that characterize instances. |
DataSet[] |
feature_vectors
Feature vectors for a set of instances. |
SegmentedClassification[] |
model_classifications
The model classifications that are used in supervised training. |
Taxonomy |
taxonomy
The taxonomy that instances are classified into. |
Constructor Summary | |
---|---|
DataBoard()
Generates an empty DataBoard. |
|
DataBoard(java.lang.String arff_file)
Generates the ACE datatypes from a Weka ARFF file. |
|
DataBoard(java.lang.String taxonomy_file,
java.lang.String feature_key_file,
java.lang.String[] feature_vector_files,
java.lang.String classifications_file)
Generates a DataBoard based on the contents of the given XML files. |
|
DataBoard(Taxonomy taxonomy,
FeatureDefinition[] feature_definitions,
DataSet[] feature_vectors,
SegmentedClassification[] model_classifications)
Generates a DataBoard with the fields specified in the parameters. |
Method Summary | |
---|---|
SegmentedClassification[] |
getClassifiedResults(weka.core.Instances instances,
boolean save_intermediate_arffs,
TrainedModel trained,
boolean use_top_level_features,
boolean use_sub_section_features)
Classify the given set of Instances using the given AttributeSelection and the given Classifier. |
FeatureDefinition[] |
getFeatureDefinitions()
Returns meta-data about the features that characterize instances. |
int[] |
getFeatureDimensionalities()
Returns the number of dimensions of each of the features stored in the feature_definitions field. |
java.lang.String[] |
getFeatureNames()
Returns the names of the features stored in the feature_definitions field. |
DataSet[] |
getFeatureVectors()
Returns feature vectors for a set of instances. |
weka.core.Instances |
getInstanceAttributes(java.lang.String data_set_name,
int initial_capacity)
Uses the feature definitions and taxonomy stored in this DataBoard to return an empty set of Weka Instances. |
java.lang.String[] |
getInstanceIdentifiers()
Gets an array of unique identifiers for the DataSet objects of this DataBoard. |
java.lang.String[] |
getInstanceMetaDataFields()
Returns the names of all meta-data fields stored in the contents of any of the instances stored in the model_classifications field. |
SegmentedClassification |
getMatchingModelClassification(DataSet data_set)
Searches the model_classifications stored in this DataBoard for one with an identifier that matches the identifier of the given DataSet. |
SegmentedClassification[] |
getModelClassifications()
Returns the model classifications that are used in supervised training. |
Taxonomy |
getTaxonomy()
Returns the taxonomy that instances are to be classified into. |
boolean |
hasSections()
|
static DataBoard |
loadDataBoard(java.io.File databoard_file)
Load the specified DataBoard serialized object file and return its contents. |
static void |
saveDataBoard(DataBoard to_save,
java.io.File databoard_file)
Save the contents of this DataBoard to a File. |
static void |
saveInstancesAsARFF(weka.core.Instances instances,
java.lang.String file_path)
Save the given Weka Instances as an arff file with the given path. |
java.lang.String[] |
saveToARFF(java.lang.String relation_name,
java.io.File databoard_file,
boolean use_top_level_features,
boolean use_sub_section_features)
Produces a Weka ARFF file based on the contents of this object. |
void |
saveXMLFiles(java.io.File taxonomy_file,
java.io.File feature_key_file,
java.io.File feature_vector_file,
java.io.File classifications_file)
Saves the stored taxonomy, feature definitions, feature vectors and/or model classifications stored in this DataBoard to individual XML files of the respectively appropriate type. |
void |
storeInstances(weka.core.Instances set_of_instances,
boolean use_top_level_features,
boolean use_sub_section_features)
Extracts the feature values and model classifications stored in this DataBoard object and stores them in the given set of Weka Instances. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public Taxonomy taxonomy
May be null if clustering algorithms are to be used or if the taxonomy is to be derived from the model_classifications field.
public FeatureDefinition[] feature_definitions
May be null if the feature_vectors have sufficient self-contained information, although this is not recommended.
public DataSet[] feature_vectors
In general, these should be taken in conjunction with feature_definitions in order to minimize storage space and processing overhead.
public SegmentedClassification[] model_classifications
Class names should correspond with those in the taxonomy field. Instances should correspond to those in the feature_vectors field.
May be null if clustering algorithms are to be used or if this DataBoard is being used to classify novel patterns with already trained classifiers.
Constructor Detail |
---|
public DataBoard()
public DataBoard(Taxonomy taxonomy, FeatureDefinition[] feature_definitions, DataSet[] feature_vectors, SegmentedClassification[] model_classifications) throws java.lang.Exception
taxonomy
- The taxonomy to classify instances into.
feature_definitions
- Descriptions of the features used to characterize instances.
feature_vectors
- The feature vectors characterizing instances.
model_classifications
- Model classifications for use in supervised training.
java.lang.Exception
- An informative exception is thrown if any of the data in the provided fields are incompatible with one another.
public DataBoard(java.lang.String taxonomy_file, java.lang.String feature_key_file, java.lang.String[] feature_vector_files, java.lang.String classifications_file) throws java.lang.Exception
taxonomy_file
- The path of a taxonomy_file XML file holding a taxonomy. May be null if clustering is to be used to derive a new taxonomy or if a provided set of model classifications will be used to construct a taxonomy. An entry of "" is considered equivalent to null.
feature_key_file
- The path of a feature_key_file XML file holding feature descriptions. May be null if the provided feature vectors have enough self-contained information, but this is not recommended. An entry of "" is considered equivalent to null.
feature_vector_files
- An array of file paths referring to feature_vector_files holding feature vectors for a set of instances. If a feature_key_file was provided, the feature_vector files are ordered and compacted based on it. An entry of "" is considered equivalent to null.
classifications_file
- The path of a classifications_file XML file holding the model classifications that are used in supervised training using the given feature_vector_files. May be null if clustering algorithms are to be used or if this DataBoard is being used to classify novel patterns with already trained classifiers. An entry of "" is considered equivalent to null.
java.lang.Exception
- An informative exception is thrown if any of the file paths provided are invalid or if the data contained in the files is incompatible with one another.
public DataBoard(java.lang.String arff_file) throws java.lang.Exception
arff_file
- The Weka ARFF file containing the Instances to be stored in this DataBoard.
java.lang.Exception
Method Detail |
---|
public Taxonomy getTaxonomy()
public FeatureDefinition[] getFeatureDefinitions()
public DataSet[] getFeatureVectors()
In general, these should be taken in conjunction with FeatureDefinitions in order to minimize storage space and processing overhead.
public SegmentedClassification[] getModelClassifications()
Class names should correspond with those in the Taxonomy. Instances should correspond to those in the DataSet feature vectors.
Will return null if clustering algorithms are to be used or if this DataBoard is being used to classify novel patterns with already trained classifiers.
public java.lang.String[] getFeatureNames()
public int[] getFeatureDimensionalities()
public java.lang.String[] getInstanceMetaDataFields()
public SegmentedClassification getMatchingModelClassification(DataSet data_set)
data_set
- The DataSet to attempt to find a matching model
classification for.
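The identifier matching described for getMatchingModelClassification can be pictured as a simple search over the stored classifications. The sketch below is self-contained and uses a hypothetical `Classification` record in place of SegmentedClassification; returning null when nothing matches is an assumption, since this page does not state what happens in that case.

```java
// Sketch of identifier-based matching. All types here are hypothetical stand-ins.
final class IdentifierMatchSketch {

    // Minimal stand-in for a model classification: an instance identifier plus class names.
    static final class Classification {
        final String identifier;
        final String[] classNames;

        Classification(String identifier, String[] classNames) {
            this.identifier = identifier;
            this.classNames = classNames;
        }
    }

    // Linear search for the classification whose identifier equals the data set's identifier.
    // Returns null if there are no classifications or no match (assumed behaviour).
    static Classification findMatching(Classification[] modelClassifications, String dataSetIdentifier) {
        if (modelClassifications == null || dataSetIdentifier == null) {
            return null;
        }
        for (Classification candidate : modelClassifications) {
            if (dataSetIdentifier.equals(candidate.identifier)) {
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Classification[] stored = {
            new Classification("track_01.wav", new String[] {"Jazz"}),
            new Classification("track_02.wav", new String[] {"Blues", "Rock"})
        };
        Classification match = findMatching(stored, "track_02.wav");
        System.out.println(match == null ? "no match" : String.join(",", match.classNames));
    }
}
```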
public weka.core.Instances getInstanceAttributes(java.lang.String data_set_name, int initial_capacity) throws java.lang.Exception
The returned set includes all feature names, including numbered feature names for multi-dimensional features, as well as class names. Class names are put in the last Attribute. Only leaf class names are used.
Note that Attribute information may not be changed after this method is called.
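To illustrate the attribute layout described above, the following self-contained sketch expands feature names by dimensionality and places the class attribute last. The `_0`, `_1`, ... numbering suffix and the attribute name `class` are assumptions made for illustration; this page does not specify the exact naming that getInstanceAttributes uses.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the attribute ordering: features first (numbered when multi-dimensional), class last.
final class AttributeLayoutSketch {

    static List<String> attributeNames(String[] featureNames, int[] dimensionalities) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < featureNames.length; i++) {
            if (dimensionalities[i] == 1) {
                // One-dimensional features keep their plain name.
                names.add(featureNames[i]);
            } else {
                // Multi-dimensional features get one numbered attribute per dimension
                // (the suffix format is an assumption).
                for (int d = 0; d < dimensionalities[i]; d++) {
                    names.add(featureNames[i] + "_" + d);
                }
            }
        }
        // The class attribute always comes last (the name "class" is an assumption).
        names.add("class");
        return names;
    }

    public static void main(String[] args) {
        System.out.println(attributeNames(new String[] {"Centroid", "MFCC"}, new int[] {1, 3}));
    }
}
```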
data_set_name
- The name to assign to the relation.
initial_capacity
- The initial capacity of the set.
java.lang.Exception
- An informative exception is thrown if insufficient information is available to construct the Attributes.
public void storeInstances(weka.core.Instances set_of_instances, boolean use_top_level_features, boolean use_sub_section_features) throws java.lang.Exception
Both pre-classified and unclassified data may be dealt with. Both overall data sets and data sets involving sub-sections may be dealt with.
If the model_classifications field is null, no model classes are saved. If the taxonomy field is null, then the class names are extracted from the model_classifications field if it is not null.
IMPORTANT: Since ARFF files cannot accommodate multiple classes per instance, the feature vector for an instance with multiple classes is repeated, once for each class.
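The repetition of feature vectors for multi-class instances can be sketched as follows. This is a self-contained illustration, not the ACE implementation: the comma-separated row format and the handling of an instance with no classes (a single row with feature values only) are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: one output row per class, each repeating the same feature values.
final class MultiClassRowSketch {

    static List<String> expand(double[] features, String[] classNames) {
        // Build the shared feature-value prefix once.
        StringBuilder prefix = new StringBuilder();
        for (int i = 0; i < features.length; i++) {
            if (i > 0) {
                prefix.append(',');
            }
            prefix.append(features[i]);
        }

        List<String> rows = new ArrayList<>();
        if (classNames == null || classNames.length == 0) {
            // No model classes: emit the feature values alone (assumed behaviour).
            rows.add(prefix.toString());
            return rows;
        }
        // Repeat the feature vector once for each class of the instance.
        for (String className : classNames) {
            rows.add(prefix + "," + className);
        }
        return rows;
    }

    public static void main(String[] args) {
        for (String row : expand(new double[] {0.5, 2.0}, new String[] {"Jazz", "Blues"})) {
            System.out.println(row);
        }
    }
}
```

A consequence of this layout is that the number of rows can exceed the number of instances, so row counts should not be read as instance counts.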
set_of_instances
- The Weka Instances object to store individual instances in.
use_top_level_features
- Whether or not to store overall classifications for individual instances.
use_sub_section_features
- Whether or not to store the sub-sections of instances.
java.lang.Exception
- An exception is thrown if no feature definitions or no feature vectors are available. An exception is also thrown if both of the boolean parameters are false.
public SegmentedClassification[] getClassifiedResults(weka.core.Instances instances, boolean save_intermediate_arffs, TrainedModel trained, boolean use_top_level_features, boolean use_sub_section_features) throws java.lang.Exception
No reference is made to any model classifications.
IMPORTANT: The order of the instances must not have been changed from the time that they were constructed by a call to the storeInstances method. If they have, or if the attribute_selector reorders instances, then this method will not work properly.
IMPORTANT: The use_top_level_features and use_sub_section_features parameters must be the same as when the instances were constructed with the storeInstances method.
instances
- The Weka Instances object that the individual instances to be classified are stored in. In general, should have been generated with the storeInstances method.
save_intermediate_arffs
- Whether or not to save testing data to an arff file after feature selection, if any. Useful for testing.
trained
- Serializable object containing references to the Weka objects needed for classification (Classifier, AttributeSelection, Attribute (class attribute)).
use_top_level_features
- Whether or not to store overall classifications for individual instances.
use_sub_section_features
- Whether or not to store the sub-sections of instances.
java.lang.Exception
- An exception occurs if Weka encounters a problem.
public java.lang.String[] saveToARFF(java.lang.String relation_name, java.io.File databoard_file, boolean use_top_level_features, boolean use_sub_section_features) throws java.lang.Exception
If the model_classifications field is null, no model classes are saved. If the taxonomy field is null, then the class names are extracted from the model_classifications field if it is not null.
An array of strings is returned. There is one entry for each data line saved to the ARFF file, with the entry identifying the data set and (if appropriate) the section that each ARFF data line corresponds to.
IMPORTANT: Since ARFF files cannot accommodate multiple classes per instance, the feature vector for an instance with multiple classes is repeated, once for each class.
IMPORTANT: All class names and feature names have blank spaces replaced by underscores in the ARFF file.
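The name conversion and the returned source identifiers can be pictured with the short sketch below. It is self-contained and hypothetical: the helper names and the `dataSet [section]` identifier format are assumptions, since this page only states that each entry identifies the data set and, if appropriate, the section.

```java
// Sketch of ARFF-safe naming and of per-line source identifiers. Not the ACE implementation.
final class ArffNamingSketch {

    // Replace blank spaces with underscores, as done for class and feature names.
    static String toArffName(String name) {
        return name.replace(' ', '_');
    }

    // Describe where one ARFF data line came from. The section label is optional;
    // the bracketed format is an assumption made for illustration.
    static String lineSource(String dataSetIdentifier, String sectionLabel) {
        if (sectionLabel == null) {
            return dataSetIdentifier;
        }
        return dataSetIdentifier + " [" + sectionLabel + "]";
    }

    public static void main(String[] args) {
        System.out.println(toArffName("Spectral Centroid"));
        System.out.println(lineSource("track_01.wav", null));
        System.out.println(lineSource("track_01.wav", "0.0 to 5.0"));
    }
}
```

Because the conversion to underscores is one-way, names read back from an ARFF file may not match the original names exactly when those contained spaces.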
relation_name
- The name of the relation that is being saved to the ARFF file.
databoard_file
- The ARFF file to be saved into.
use_top_level_features
- Whether or not to save overall classifications for individual instances.
use_sub_section_features
- Whether or not to save the sub-sections of instances.
java.lang.Exception
- An exception is thrown if no feature definitions or no feature vectors are provided. An exception is also thrown if both of the boolean parameters are false.
public void saveXMLFiles(java.io.File taxonomy_file, java.io.File feature_key_file, java.io.File feature_vector_file, java.io.File classifications_file) throws java.lang.Exception
taxonomy_file
- The file to save the taxonomy to. Null if the taxonomy is not to be saved.
feature_key_file
- The file to save the feature definitions to. Null if the definitions are not to be saved.
feature_vector_file
- The file to save the feature vectors to. Null if the vectors are not to be saved.
classifications_file
- The file to save the model classifications to. Null if the classifications are not to be saved.
java.lang.Exception
- An informative exception is thrown if a request is made to save a file type whose corresponding field is empty.
public static void saveDataBoard(DataBoard to_save, java.io.File databoard_file) throws java.lang.Exception
databoard_file
- The File to save to.
to_save
- The DataBoard to save.
java.lang.Exception
- Throws an exception if an error occurs during saving.
public static DataBoard loadDataBoard(java.io.File databoard_file) throws java.lang.Exception
databoard_file
- The File to load.
java.lang.Exception
- Throws an exception if an error occurs during loading.
public static void saveInstancesAsARFF(weka.core.Instances instances, java.lang.String file_path) throws java.lang.Exception
instances
- The Weka Instances to save.
file_path
- The path of the arff file to save.
java.lang.Exception
- Throws an exception if the given instances cannot be saved.
public java.lang.String[] getInstanceIdentifiers()
public boolean hasSections()