Spark: How to get probabilities and AUC for Bernoulli Naive Bayes?
I'm running Bernoulli Naive Bayes using this code:
    val splits = mydata.randomSplit(Array(0.75, 0.25), seed = 2L)
    val training = splits(0).cache()
    val test = splits(1)
    val model = NaiveBayes.train(training, lambda = 3.0, modelType = "bernoulli")
My question is how I can get the probability of membership in class 0 (or 1) and compute the AUC. I want a result similar to what LogisticRegressionWithSGD or SVMWithSGD gives with this code:
    val numIterations = 100
    val model = SVMWithSGD.train(training, numIterations)
    model.clearThreshold()

    // Compute raw scores on the test set.
    val labelAndPreds = test.map { point =>
      val prediction = model.predict(point.features)
      (prediction, point.label)
    }

    // Get evaluation metrics.
    val metrics = new BinaryClassificationMetrics(labelAndPreds)
    val auROC = metrics.areaUnderROC()
Unfortunately, this code doesn't work for NaiveBayes.
Concerning the probabilities for Bernoulli Naive Bayes, here is an example:
    import org.apache.spark.mllib.classification.NaiveBayes
    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.regression.LabeledPoint

    // Building dummy data
    val data = sc.parallelize(List("0,1 0 0", "1,0 1 0", "1,0 0 1", "0,1 0 1", "1,1 1 0"))

    // Transforming the dummy data into LabeledPoint
    val parsedData = data.map { line =>
      val parts = line.split(',')
      LabeledPoint(parts(0).toDouble, Vectors.dense(parts(1).split(' ').map(_.toDouble)))
    }

    // Prepare data for training
    val splits = parsedData.randomSplit(Array(0.75, 0.25), seed = 2L)
    val training = splits(0).cache()
    val test = splits(1)
    val model = NaiveBayes.train(training, lambda = 3.0, modelType = "bernoulli")

    // Labels, in the same order as the entries of each probability vector
    val labels = model.labels

    // Probabilities for all feature vectors
    val features = parsedData.map(lp => lp.features)
    model.predictProbabilities(features).take(10) foreach println

    // Probabilities for one specific vector; here I'm taking the first vector in parsedData
    val testVector = parsedData.first.features
    println(s"For vector ${testVector} => probability : ${model.predictProbabilities(testVector)}")
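For intuition about where those per-class numbers come from, here is a minimal plain-Scala sketch of the textbook Bernoulli Naive Bayes posterior with additive smoothing (the role played by `lambda` above). It does not use Spark at all; `BernoulliNBSketch`, `SketchModel`, `fit` and `posterior` are hypothetical names made up for this illustration, and Spark's internal implementation may differ in detail.

```scala
object BernoulliNBSketch {
  // Hypothetical container, not a Spark class.
  // theta(c)(j) is the smoothed estimate of P(feature j = 1 | class c).
  final case class SketchModel(
      labels: Array[Double],
      logPrior: Array[Double],
      theta: Array[Array[Double]])

  // Assumption: textbook additive smoothing with parameter lambda.
  def fit(data: Seq[(Double, Array[Double])], lambda: Double): SketchModel = {
    val labels = data.map(_._1).distinct.sorted.toArray
    val numFeatures = data.head._2.length
    val n = data.size.toDouble
    val logPrior = new Array[Double](labels.length)
    val theta = Array.ofDim[Double](labels.length, numFeatures)
    for ((label, c) <- labels.zipWithIndex) {
      val rows = data.filter(_._1 == label).map(_._2)
      val nc = rows.size.toDouble
      // Smoothed class prior.
      logPrior(c) = math.log((nc + lambda) / (n + labels.length * lambda))
      for (j <- 0 until numFeatures) {
        // Number of rows of this class in which feature j is 1.
        val ones = rows.map(_(j)).sum
        theta(c)(j) = (ones + lambda) / (nc + 2.0 * lambda)
      }
    }
    SketchModel(labels, logPrior, theta)
  }

  // Posterior P(class | x) for a 0/1 feature vector x, ordered like m.labels.
  def posterior(m: SketchModel, x: Array[Double]): Array[Double] = {
    val logJoint = m.labels.indices.map { c =>
      m.logPrior(c) + x.indices.map { j =>
        if (x(j) == 1.0) math.log(m.theta(c)(j))
        else math.log(1.0 - m.theta(c)(j))
      }.sum
    }.toArray
    // Subtract the maximum before exponentiating for numerical stability.
    val maxLog = logJoint.max
    val unnormalized = logJoint.map(v => math.exp(v - maxLog))
    val total = unnormalized.sum
    unnormalized.map(_ / total)
  }
}
```

The returned array sums to 1 and is ordered like `labels`, which is also how the vector returned by `model.predictProbabilities` lines up with `model.labels`.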
As for the AUC:
    import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics

    // Compute predictions on the test set.
    val labelAndPreds = test.map { point =>
      val prediction = model.predict(point.features)
      (prediction, point.label)
    }

    // Get evaluation metrics.
    val metrics = new BinaryClassificationMetrics(labelAndPreds)
    val auROC = metrics.areaUnderROC()
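Note that `model.predict` on a NaiveBayes model returns a class label (0.0 or 1.0) rather than a continuous score, so the ROC curve above is built from a single operating point. A possible variant, sketched here under the assumption of a binary problem with labels 0.0 and 1.0, scores each point with the probability of the positive class instead. `NaiveBayesAuc`, `naiveBayesAuc` and `positive` are hypothetical names for this illustration, not part of Spark.

```scala
import org.apache.spark.mllib.classification.NaiveBayesModel
import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

object NaiveBayesAuc {
  // Hypothetical helper: AUC computed from P(positive class | features).
  def naiveBayesAuc(
      model: NaiveBayesModel,
      test: RDD[LabeledPoint],
      positive: Double = 1.0): Double = {
    // model.labels gives the label behind each position of the probability vector.
    val positiveIndex = model.labels.indexOf(positive)
    require(positiveIndex >= 0, s"label $positive not found in model.labels")

    // BinaryClassificationMetrics expects (score, label) pairs.
    val scoreAndLabels = test.map { point =>
      val score = model.predictProbabilities(point.features)(positiveIndex)
      (score, point.label)
    }
    new BinaryClassificationMetrics(scoreAndLabels).areaUnderROC()
  }
}
```

With the `model` and `test` values from the example, this would be called as `NaiveBayesAuc.naiveBayesAuc(model, test)`. Like the rest of this answer, it relies on `predictProbabilities`, so it needs Spark 1.5+.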
Concerning the inquiry from chat:
    // MLlib's Vector, not scala.collection.immutable.Vector
    import org.apache.spark.mllib.linalg.Vector

    val results = parsedData.map { lp =>
      val probs: Vector = model.predictProbabilities(lp.features)
      (for (i <- 0 to (probs.size - 1)) yield ((lp.label, labels(i), probs(i))))
    }.flatMap(identity)

    results.take(10).foreach(println)
    // (0.0,0.0,0.59728640251696)
    // (0.0,1.0,0.40271359748304003)
    // (1.0,0.0,0.2546873180388961)
    // (1.0,1.0,0.745312681961104)
    // (1.0,0.0,0.47086939671877026)
    // (1.0,1.0,0.5291306032812298)
    // (0.0,0.0,0.6496075621805428)
    // (0.0,1.0,0.3503924378194571)
    // (1.0,0.0,0.4158585282373076)
    // (1.0,1.0,0.5841414717626924)
And if you are only interested in the argmax class:
    // MLlib's Vector, not scala.collection.immutable.Vector
    import org.apache.spark.mllib.linalg.Vector

    val results = training.map { lp =>
      val probs: Vector = model.predictProbabilities(lp.features)
      val bestClass = probs.argmax
      (labels(bestClass), probs(bestClass))
    }

    results.take(10) foreach println
    // (0.0,0.59728640251696)
    // (1.0,0.745312681961104)
    // (1.0,0.5291306032812298)
    // (0.0,0.6496075621805428)
    // (1.0,0.5841414717626924)
Note: this works with Spark 1.5+.