Integrating WEKA Classifiers
Create a classifier using the Weka Explorer
- Create a CSV file with your training data. Make sure the file has a header row. For nominal attributes, I recommend using string labels rather than numeric codes; otherwise Weka will interpret the column as a continuous numeric attribute by default.
- Run the Weka Explorer GUI.
- On the Preprocess tab, load your CSV file. You may want to save it as an ARFF file for future use.
- On the Classify tab, choose and train a classifier.
- Right-click on the new entry in the Result list (lower left corner). Choose Save model to export a serialized copy of the classifier.
Integrate the classifier into your Java code
We'll be following the instructions at http://weka.wikispaces.com/Programmatic+Use and http://weka.wikispaces.com/Serialization#Deserializing
Follow Step 1 in Programmatic Use.
- Declare Attribute objects for all the attributes in your data relation.
- For each nominal attribute, you'll need to create a FastVector and add to it strings representing all the possible values of the attribute. Use this FastVector when initializing the nominal attribute.
- Create a FastVector representing all the Attributes, in order.
Follow the beginning of Step 2.
- Create an Instances object to define the relation, and set the index of the output attribute using setClassIndex.
- Don't add any actual Instance objects to the Instances object; it only needs to describe the attributes.
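The two steps above can be sketched roughly as follows. This is only an illustration: the relation name "example", the attributes size, color and label, and the class name RelationHeader are made up, and the code uses the raw FastVector API from the Weka 3.6 era that this page describes (I believe later versions deprecate FastVector in favor of ArrayList).

```java
import weka.core.Attribute;
import weka.core.FastVector;
import weka.core.Instances;

/** Builds an empty Instances header for a hypothetical three-attribute relation. */
final class RelationHeader {

    static Instances build() {
        // Numeric attribute: only a name is needed.
        Attribute size = new Attribute("size");

        // Nominal attribute: list every possible value first.
        FastVector colorValues = new FastVector(3);
        colorValues.addElement("red");
        colorValues.addElement("green");
        colorValues.addElement("blue");
        Attribute color = new Attribute("color", colorValues);

        // Nominal output (class) attribute.
        FastVector labelValues = new FastVector(2);
        labelValues.addElement("yes");
        labelValues.addElement("no");
        Attribute label = new Attribute("label", labelValues);

        // All attributes, in the same order as the columns of the training data.
        FastVector attributes = new FastVector(3);
        attributes.addElement(size);
        attributes.addElement(color);
        attributes.addElement(label);

        // Capacity 0: the header describes the relation and holds no rows.
        Instances header = new Instances("example", attributes, 0);
        header.setClassIndex(header.numAttributes() - 1);
        return header;
    }
}
```

The nominal values should be listed in the same order as in the training data, since the model refers to them by index. The ARFF file saved from the Explorer is a convenient place to look that order up.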
Declare a Classifier. Instead of training the Classifier at runtime, initialize it by loading and deserializing the model file that you saved. See http://weka.wikispaces.com/Serialization#Deserializing
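A minimal sketch of the loading step, using only the standard Java serialization classes. It assumes the saved model is an uncompressed Java-serialized stream whose first object is the classifier; ModelLoader and its load method are names I made up for illustration, not part of Weka.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper that reads the first serialized object from a model file. */
final class ModelLoader {

    static <T> T load(Path modelFile, Class<T> expectedType)
            throws IOException, ClassNotFoundException {
        try (InputStream in = new BufferedInputStream(Files.newInputStream(modelFile));
             ObjectInputStream objects = new ObjectInputStream(in)) {
            // Only the first object in the stream is read; anything after it is ignored.
            Object model = objects.readObject();
            // Throws ClassCastException if the file holds something else.
            return expectedType.cast(model);
        }
    }
}
```

With Weka on the classpath, the call would look something like ModelLoader.load(Paths.get("my.model"), Classifier.class), where my.model stands in for whatever file name you chose in the Explorer. Weka also ships its own helper, weka.core.SerializationHelper.read, which returns an Object that you cast to Classifier yourself.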
Classify an instance
- Create a new Instance object.
- Using the correct order, call setValue, passing Attribute objects and numeric/string values to populate the Instance. (See the middle of Step 2 in Programmatic Use.) For a more general way to do this, see the method getInstance in Konstantin's WekaWrapper class.
- Set the Instance object's relation by passing the Instances object to setDataset.
- Calculate an array of doubles representing the probability distribution by calling the Classifier's distributionForInstance method, passing in the Instance object. (See Step 4 of Programmatic Use.)
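Put together, the classification steps might look like the sketch below. It is written against the Weka 3.6 API, where Instance is a concrete class, and it assumes a nominal class attribute. SingleInstanceClassifier, its method names and the inputs array convention are my own inventions for illustration; this is not the getInstance method from WekaWrapper.

```java
import weka.classifiers.Classifier;
import weka.core.Attribute;
import weka.core.Instance;
import weka.core.Instances;

/** Hypothetical helper that classifies one set of input values. */
final class SingleInstanceClassifier {

    /**
     * inputs holds one entry per non-class attribute, in header order:
     * a Number for numeric attributes, a String for nominal ones.
     */
    static double[] distribution(Classifier model, Instances header, Object[] inputs)
            throws Exception {
        // Weka 3.6: all values start out missing. From 3.7 on, Instance is an
        // interface and new DenseInstance(...) would be used here instead.
        Instance instance = new Instance(header.numAttributes());

        int next = 0;
        for (int i = 0; i < header.numAttributes(); i++) {
            if (i == header.classIndex()) {
                continue; // the class value is unknown and stays missing
            }
            Attribute attribute = header.attribute(i);
            Object value = inputs[next++];
            if (value instanceof String) {
                instance.setValue(attribute, (String) value);
            } else {
                instance.setValue(attribute, ((Number) value).doubleValue());
            }
        }

        // Attach the relation so the classifier knows the attribute layout.
        instance.setDataset(header);
        return model.distributionForInstance(instance);
    }

    /** Returns the class label with the highest probability (nominal class assumed). */
    static String mostLikelyLabel(Instances header, double[] distribution) {
        int best = 0;
        for (int i = 1; i < distribution.length; i++) {
            if (distribution[i] > distribution[best]) {
                best = i;
            }
        }
        return header.classAttribute().value(best);
    }
}
```

The returned array has one probability per class value, in the order the values were added to the class attribute, which is why mostLikelyLabel can map the winning index straight back to a label.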