I’ve adapted my spam filtering example from Programming from A to Z into a more generic text classification library for Processing. Basically, you create a Classifier object:
Classifier filter = new Classifier();
Train it with text for category “A” and category “B”:
String[] shakespeare = loadStrings("hamlet.txt");
filter.trainA(join(shakespeare," "));
String[] chekov = loadStrings("vanya.txt");
filter.trainB(join(chekov," "));
And then evaluate “unknown” text:
String toAnalyze = "To be or not to be. That is the question.";
// Assumes analyze() returns the probability of category "A" as a float between 0 and 1
float probA = 100 * filter.analyze(toAnalyze);
println(toAnalyze + " is " + probA + " % likely to be Shakespeare");
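For anyone curious what a train/analyze pair like this might do under the hood, here is a minimal sketch in plain Java. It is not the library's source: the class name SketchClassifier, the countA() accessor, the tokenizing rule, and the math (naive Bayes with equal priors and add-one smoothing) are all assumptions chosen for illustration, and the real Classifier may compute its probabilities differently.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the library's Classifier (names and math are assumptions).
class SketchClassifier {
  // Word counts per category, i.e. a table of words and how often each was seen.
  private final Map<String, Integer> countsA = new HashMap<>();
  private final Map<String, Integer> countsB = new HashMap<>();
  private int totalA = 0; // total words seen in category A
  private int totalB = 0; // total words seen in category B

  void trainA(String text) { totalA += tally(text, countsA); }
  void trainB(String text) { totalB += tally(text, countsB); }

  // Illustrative accessor: how many times a word appeared in category A.
  int countA(String word) {
    return countsA.getOrDefault(word.toLowerCase(Locale.ROOT), 0);
  }

  // Lowercase the text, split on anything that is not a letter or apostrophe.
  private static String[] words(String text) {
    return text.toLowerCase(Locale.ROOT).split("[^a-z']+");
  }

  // Add each word to the given table; returns how many words were added.
  private static int tally(String text, Map<String, Integer> counts) {
    int added = 0;
    for (String w : words(text)) {
      if (w.isEmpty()) continue;
      counts.put(w, counts.getOrDefault(w, 0) + 1);
      added++;
    }
    return added;
  }

  // Returns P(A | text) between 0 and 1, assuming equal priors.
  double analyze(String text) {
    Set<String> vocab = new HashSet<>(countsA.keySet());
    vocab.addAll(countsB.keySet());
    int v = Math.max(vocab.size(), 1);
    double logA = 0;
    double logB = 0;
    for (String w : words(text)) {
      if (w.isEmpty()) continue;
      // Add-one smoothing keeps unseen words from zeroing out a category.
      logA += Math.log((countsA.getOrDefault(w, 0) + 1.0) / (totalA + v));
      logB += Math.log((countsB.getOrDefault(w, 0) + 1.0) / (totalB + v));
    }
    // Turn the difference of log-likelihoods back into a probability.
    return 1.0 / (1.0 + Math.exp(logB - logA));
  }
}
```

Summing logarithms instead of multiplying raw probabilities keeps long texts from underflowing to zero. Text with no recognizable words comes out at 0.5, since neither category has any evidence.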
Download library + source: classifier.zip
Download Processing example code: bayes.zip
There’s a lot more that can be done here (I have working code that doesn’t restrict the classification to a binary choice), so further updates will come eventually! I also need to give library users access to the underlying hashtable of words and their counts / relative probabilities.