Darren DeRidder / @73rhodes
machine learning
naive bayesian classifiers
node.js
@73rhodes • github/73rhodes • 51elliot.blogspot.com
Computer Systems Engineer
Real-time • AAA • Network Security • Mobile
Tech lead on Kindsight Mobile Security @ Alcatel
Mobile World Congress • Blackhat 2013
@ottawa_js organizer
"I Am Not A Data Scientist"
(IANADS)
and that's ok!
There are lots of tools available for us mortals.
simple, yet surprisingly effective
` P(A|B) = (P(B|A)P(A)) / (P(B)) = ...`
`= (P(B|A)P(A)) / ( P(B|A)P(A) + P(B|not A)(1 - P(A)) )`
`P(A|W_1,...,W_n) = ( prod_(i=1)^n P(A|W_i) ) / ( (prod_(i=1)^n P(A|W_i)) + (prod_(i=1)^n (1 - P(A|W_i))) )`
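A minimal sketch of Bayes' theorem in JavaScript, with `P(B)` expanded by the law of total probability. The function name `posterior`, its parameter names, and the input numbers are hypothetical and chosen only for illustration. Note that `P(B|not A)` has to be supplied separately, since it cannot be derived from `P(B|A)`.

```javascript
// Bayes' theorem: P(A|B) = P(B|A) P(A) / P(B)
// where P(B) = P(B|A) P(A) + P(B|not A) (1 - P(A)).
// All names and numbers below are illustrative assumptions.
function posterior(pBGivenA, pA, pBGivenNotA) {
  const numerator = pBGivenA * pA;
  const evidence = numerator + pBGivenNotA * (1 - pA); // P(B)
  return numerator / evidence;
}

// Hypothetical inputs: P(B|A) = 0.6, P(A) = 0.2, P(B|not A) = 0.05
const p = posterior(0.6, 0.2, 0.05);
console.log(p.toFixed(2)); // prints "0.75"
```

Observing B raises the estimate for A from the 20% prior to about 75%, because B is far more common when A is true than when it is not.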
Or, in Plain English
a box of chocolates.
You never know what you're gonna get.
(But you can make a pretty good guess!)
| Trait | Nuts | No Nuts |
| Round | 25% | 75% |
| Square | 75% | 25% |
| Dark | 10% | 90% |
| Light | 90% | 10% |
What if we pick a round, light chocolate?
A round, light chocolate...
| Trait | Nuts | No Nuts | P(Nuts) | P(NoNuts) |
| Round | .25 | .75 | .25 | .75 |
| Square | .75 | .25 | - | - |
| Dark | .10 | .90 | - | - |
| Light | .90 | .10 | .90 | .10 |
| `prod_(i=1)^n P_i` | | | .225 | .075 |
`x = 0.225 / 0.075 = 3`
A round, light chocolate is 3 times as likely to have nuts as not.
(This is a likelihood ratio.)
Classify as "Nuts" or "No Nuts", with some level of certainty.
`P(N) = 0.225 / (0.225 + 0.075) = 0.75 = 75%`
(We're 75% sure this chocolate has nuts.)
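The chocolate arithmetic can be sketched in a few lines of JavaScript. The object `pNutsGiven` holds the "Nuts" column of the table. The function name `classifyChocolate` and the shape of its return value are assumptions made for this sketch, not part of any library.

```javascript
// P(Nuts | trait), taken from the "Nuts" column of the chocolate table.
const pNutsGiven = { round: 0.25, square: 0.75, dark: 0.10, light: 0.90 };

// Multiply the per-trait probabilities for each outcome, then normalize.
function classifyChocolate(traits) {
  let nuts = 1;
  let noNuts = 1;
  for (const trait of traits) {
    nuts *= pNutsGiven[trait];       // running product of P(Nuts | trait)
    noNuts *= 1 - pNutsGiven[trait]; // running product of P(No Nuts | trait)
  }
  return {
    ratio: nuts / noNuts,          // how many times as likely "Nuts" is
    pNuts: nuts / (nuts + noNuts), // certainty that it has nuts
  };
}

const result = classifyChocolate(['round', 'light']);
console.log(result.ratio.toFixed(2), result.pNuts.toFixed(2)); // prints "3.00 0.75"
```

Swapping in `['square', 'light']` pushes the estimate well above 75%, since both traits then point toward nuts.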
Optimized binary classifier for limited vocabularies.
Leverages "missing" traits to improve accuracy by ~10%.
Used in production...
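// Assumes the author's `dclassify` npm package, which exports these three classes.
const { Classifier, DataSet, Document } = require('dclassify');

// Each Document takes an id and an array of tokens (traits).
const item1 = new Document('item1', ['awful', 'basic', 'cautious']);
const item2 = new Document('item2', ['awful', 'basic', 'cautious']);
const item3 = new Document('item3', ['awful', 'delightful', 'energetic']);
const item4 = new Document('item4', ['cautious', 'delightful']);
const item5 = new Document('item5', ['energetic']);
const item6 = new Document('item6', ['basic', 'delightful', 'energetic']);

// Group the training documents by category.
const data = new DataSet();
data.add('bad', [item1, item2, item3]);
data.add('good', [item4, item5, item6]);

// `options` was not defined on the slide; applyInverse is assumed here
// so that missing tokens also count as evidence.
const options = { applyInverse: true };
const classifier = new Classifier(options);
classifier.train(data);

// Classify a new document built from tokens in the training vocabulary.
const testDoc = new Document('testDoc', ['basic', 'cautious', 'energetic']);
const result1 = classifier.classify(testDoc);
console.log(result1);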