7 in Seven

7 in Seven is an audacious scheme by the ITP Resident Researchers to do seven creative projects in seven days.

www.gabebc.com
www.katehartman.com
www.jennylc.com
www.jeffreyleblanc.org
www.faludi.com
www.shiffman.net
http://itp.nyu.edu/~dbo3/
www.tigoe.net

Follow us on MAKE:
Announcement
Day 1
Day 2
Day 3
Day 4
Day 5
Day 6
Day 7

shiffman
I’ve adapted my spam filtering example from Programming from A to Z into a more generic text classification library for Processing. Basically, you create a Classifier object:
Classifier filter = new Classifier();

Train it with text for category “A” and category “B”:
String[] shakespeare = loadStrings("hamlet.txt");
filter.trainA(join(shakespeare," "));

String[] chekov = loadStrings("vanya.txt");
filter.trainB(join(chekov," "));

And then evaluate “unknown” text:
String toAnalyze = "To be or not to be. That is the question.";
// Assumes analyze() returns a float between 0 and 1
float probA = 100 * filter.analyze(toAnalyze);
println(toAnalyze + " is " + probA + " % likely to be Shakespeare");
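
For a sense of what could be going on under the hood, here is a minimal sketch of a two-category word-count classifier in plain Java. It is not the library's source: the class name TinyClassifier, the tokenizer, the equal priors, and the add-one smoothing are all assumptions made for illustration, and the real Classifier may weigh words quite differently.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; NOT the Classifier class from the library.
class TinyClassifier {
    private final Map<String, Integer> countsA = new HashMap<>();
    private final Map<String, Integer> countsB = new HashMap<>();
    private int totalA = 0; // number of word tokens seen for category A
    private int totalB = 0; // number of word tokens seen for category B

    // Lowercase the text and split it on anything that is not a letter or apostrophe.
    private static String[] tokenize(String text) {
        String cleaned = text.toLowerCase().replaceAll("[^a-z']+", " ").trim();
        return cleaned.isEmpty() ? new String[0] : cleaned.split(" ");
    }

    public void trainA(String text) {
        for (String w : tokenize(text)) {
            countsA.merge(w, 1, Integer::sum);
            totalA++;
        }
    }

    public void trainB(String text) {
        for (String w : tokenize(text)) {
            countsB.merge(w, 1, Integer::sum);
            totalB++;
        }
    }

    // Estimated probability (0..1) that the text belongs to category A,
    // assuming equal priors and add-one smoothing (both are assumptions).
    public double analyze(String text) {
        Set<String> vocab = new HashSet<>(countsA.keySet());
        vocab.addAll(countsB.keySet());
        int v = vocab.size() + 1; // the extra slot stands in for unseen words

        double logA = 0;
        double logB = 0;
        for (String w : tokenize(text)) {
            logA += Math.log((countsA.getOrDefault(w, 0) + 1.0) / (totalA + v));
            logB += Math.log((countsB.getOrDefault(w, 0) + 1.0) / (totalB + v));
        }
        // Work in log space to avoid underflow on long texts, then normalize.
        return 1.0 / (1.0 + Math.exp(logB - logA));
    }

    public static void main(String[] args) {
        TinyClassifier c = new TinyClassifier();
        c.trainA("To be or not to be. That is the question.");
        c.trainB("Uncle Vanya drinks tea in the garden.");
        System.out.println(100 * c.analyze("to be or not to be") + " % likely to be A");
    }
}
```

With no training data at all, every word gets the same score in both categories, so this sketch returns 0.5 for any input.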

Download library + source: classifier.zip
Download Processing example code: bayes.zip
There’s a lot more that can be done here (I have working code that doesn’t restrict the classification to a binary choice), so further updates will come eventually! I also need to give the library user access to the underlying hashtable of words and their counts / relative probabilities.
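
To make those two ideas concrete, here is a rough sketch of a classifier with any number of named categories and a read-only view of its word counts. This is not the working code mentioned above: the class name MultiClassifier and the methods train, wordCounts, and analyze are hypothetical, and the add-one smoothing and equal priors are assumptions.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; the names and the scoring are assumptions, not the library's API.
class MultiClassifier {
    // category name -> (word -> count)
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();
    // category name -> total number of word tokens seen
    private final Map<String, Integer> totals = new HashMap<>();

    private static String[] tokenize(String text) {
        String cleaned = text.toLowerCase().replaceAll("[^a-z']+", " ").trim();
        return cleaned.isEmpty() ? new String[0] : cleaned.split(" ");
    }

    public void train(String category, String text) {
        Map<String, Integer> c = counts.computeIfAbsent(category, k -> new HashMap<>());
        for (String w : tokenize(text)) {
            c.merge(w, 1, Integer::sum);
            totals.merge(category, 1, Integer::sum);
        }
    }

    // Read-only access to the underlying word counts for one category.
    public Map<String, Integer> wordCounts(String category) {
        Map<String, Integer> c = counts.get(category);
        if (c == null) {
            return Collections.<String, Integer>emptyMap();
        }
        return Collections.unmodifiableMap(c);
    }

    // Returns category -> probability; the values sum to 1 across trained categories.
    public Map<String, Double> analyze(String text) {
        Set<String> vocab = new HashSet<>();
        for (Map<String, Integer> c : counts.values()) {
            vocab.addAll(c.keySet());
        }
        int v = vocab.size() + 1; // the extra slot stands in for unseen words
        String[] words = tokenize(text);

        Map<String, Double> logScores = new HashMap<>();
        double max = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Map<String, Integer>> e : counts.entrySet()) {
            int total = totals.getOrDefault(e.getKey(), 0);
            double sum = 0;
            for (String w : words) {
                sum += Math.log((e.getValue().getOrDefault(w, 0) + 1.0) / (total + v));
            }
            logScores.put(e.getKey(), sum);
            max = Math.max(max, sum);
        }

        // Subtract the largest log score before exponentiating to avoid underflow.
        double norm = 0;
        for (double s : logScores.values()) {
            norm += Math.exp(s - max);
        }
        Map<String, Double> result = new HashMap<>();
        for (Map.Entry<String, Double> e : logScores.entrySet()) {
            result.put(e.getKey(), Math.exp(e.getValue() - max) / norm);
        }
        return result;
    }
}
```

Returning an unmodifiable map lets a sketch inspect or visualize the counts without being able to corrupt the trained state, which is one possible answer to the access question.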
