Integrating WEKA Classifiers

Create a classifier using the Weka Explorer

  1. Create a CSV file with your training data. Make sure the file has headers. For nominal attributes, I recommend using string names -- otherwise Weka will interpret them as continuous numeric values by default.
  2. Run the Weka Explorer GUI.
  3. On the Preprocess tab, load your CSV file. You may want to save it as an ARFF file for future use.
  4. On the Classify tab, choose and train a classifier.
  5. Right-click on the new entry in the Result list (lower left corner). Choose Save model to export a serialized copy of the classifier.
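
If you generate the training file from code rather than by hand, step 1 might look like the following minimal sketch, which uses only the Java standard library. The class name TrainingCsv, the column names, and the data rows are all hypothetical. Note that the nominal columns (outlook, play) hold names rather than numeric codes.

```java
import java.util.Arrays;
import java.util.List;

public class TrainingCsv {
    // Builds CSV text: one header line, then one line per data row.
    // Assumes no value contains a comma, quote, or newline.
    public static String toCsv(List<String> header, List<List<String>> rows) {
        StringBuilder sb = new StringBuilder(String.join(",", header)).append('\n');
        for (List<String> row : rows) {
            sb.append(String.join(",", row)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical data: "outlook" and "play" are nominal, so they use names.
        String csv = toCsv(
            Arrays.asList("outlook", "temperature", "play"),
            Arrays.asList(
                Arrays.asList("sunny", "85", "no"),
                Arrays.asList("rainy", "70", "yes")));
        System.out.print(csv);
    }
}
```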

Integrate the classifier into your Java code

We'll be following the instructions at http://weka.wikispaces.com/Programmatic+Use and http://weka.wikispaces.com/Serialization#Deserializing

Follow Step 1 in Programmatic Use.
Declare Attribute objects for all the attributes in your data relation.
For each nominal attribute, you'll need to create a FastVector and add to it strings representing all the possible values of the attribute. Use this FastVector when initializing the nominal attribute.
Create a FastVector representing all the Attributes, in order.
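
Weka stores nominal values as indices, so the strings you add for a nominal attribute should match, in name and order, the values the model was trained on. If you saved the ARFF file in the Explorer, one way to get them is to read them from the matching @attribute line. The following is a minimal sketch using only the Java standard library. NominalValues and parse are hypothetical names, and the parsing assumes simple values with no embedded commas.

```java
import java.util.ArrayList;
import java.util.List;

public class NominalValues {
    // Extracts the values between '{' and '}' of an ARFF nominal declaration,
    // e.g. "@attribute outlook {sunny,overcast,rainy}", preserving their order.
    public static List<String> parse(String attributeLine) {
        int open = attributeLine.indexOf('{');
        int close = attributeLine.lastIndexOf('}');
        if (open < 0 || close < open) {
            throw new IllegalArgumentException("Not a nominal attribute: " + attributeLine);
        }
        List<String> values = new ArrayList<>();
        for (String raw : attributeLine.substring(open + 1, close).split(",")) {
            String v = raw.trim();
            // Strip single quotes, which ARFF uses around values containing spaces.
            if (v.length() >= 2 && v.startsWith("'") && v.endsWith("'")) {
                v = v.substring(1, v.length() - 1);
            }
            values.add(v);
        }
        return values;
    }
}
```

Each returned string would then be added to the nominal attribute's FastVector in the same order.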

Follow the beginning of Step 2.
Create an Instances object to define the relation, and set the index of the output attribute using setClassIndex.

Don't add any actual Instance objects to the Instances object. It is only there to describe the structure of the relation.

Declare a Classifier. Instead of training the Classifier at runtime, initialize it by loading and deserializing the model file that you saved. See http://weka.wikispaces.com/Serialization#Deserializing
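
The model file is written with standard Java serialization, so loading it comes down to reading an object from a stream. Here is a minimal sketch using only the Java standard library. ModelLoader and loadFirstObject are hypothetical names, and the sketch assumes the file is not gzip-compressed.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class ModelLoader {
    // Reads and returns the first serialized object stored in the file.
    // The classes of the stored object must be on the classpath.
    public static Object loadFirstObject(Path modelFile)
            throws IOException, ClassNotFoundException {
        try (InputStream in = new BufferedInputStream(Files.newInputStream(modelFile));
             ObjectInputStream ois = new ObjectInputStream(in)) {
            return ois.readObject();
        }
    }
}
```

With Weka on the classpath, you would cast the result to Classifier, for example (Classifier) ModelLoader.loadFirstObject(Paths.get("my.model")), where my.model stands in for the file you saved from the Explorer.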

Classify an instance

Create a new Instance object.

Using the correct order, call setValue, passing Attribute objects and numeric/string values to populate the Instance. (See the middle of Step 2 in Programmatic Use.) For a more general way to do this, see the method getInstance in Konstantin's WekaWrapper class.

Set the Instance object's relation by passing the Instances object to setDataset.

Calculate an array of doubles representing the probability distribution by calling the Classifier's distributionForInstance method, passing in the Instance object. (See Step 4 of Programmatic Use.)
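
For a nominal class attribute, the returned array has one entry per class value, in the order the values were declared. To turn it into a single prediction, pick the entry with the highest probability. Here is a minimal sketch using only the Java standard library. Prediction and its method names are hypothetical, and classValues is assumed to be the same ordered list of strings you used when declaring the class attribute.

```java
import java.util.List;

public class Prediction {
    // Returns the index of the largest probability (the first one wins on ties).
    public static int mostLikelyIndex(double[] distribution) {
        if (distribution.length == 0) {
            throw new IllegalArgumentException("empty distribution");
        }
        int best = 0;
        for (int i = 1; i < distribution.length; i++) {
            if (distribution[i] > distribution[best]) {
                best = i;
            }
        }
        return best;
    }

    // Maps the most likely index back to the corresponding class label.
    public static String mostLikelyLabel(double[] distribution, List<String> classValues) {
        if (distribution.length != classValues.size()) {
            throw new IllegalArgumentException("distribution and class values differ in length");
        }
        return classValues.get(mostLikelyIndex(distribution));
    }
}
```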