Spark: How to get probabilities and AUC for Bernoulli Naive Bayes?


I'm running Bernoulli Naive Bayes using this code:

    import org.apache.spark.mllib.classification.NaiveBayes

    val splits = myData.randomSplit(Array(0.75, 0.25), seed = 2L)
    val training = splits(0).cache()
    val test = splits(1)
    val model = NaiveBayes.train(training, lambda = 3.0, modelType = "bernoulli")

My question is: how can I get the probability of membership in class 0 (or 1), and how can I compute the AUC? I want a result similar to what I get with LogisticRegressionWithSGD or SVMWithSGD using this code:

    import org.apache.spark.mllib.classification.SVMWithSGD
    import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics

    val numIterations = 100
    val model = SVMWithSGD.train(training, numIterations)
    model.clearThreshold()

    // Compute raw scores on the test set.
    val labelAndPreds = test.map { point =>
      val prediction = model.predict(point.features)
      (prediction, point.label)
    }

    // Evaluation metrics.
    val metrics = new BinaryClassificationMetrics(labelAndPreds)
    val auROC = metrics.areaUnderROC()

Unfortunately, this code doesn't work for NaiveBayes.
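Part of the reason the SVM pattern does not carry over: after clearThreshold(), SVMModel.predict returns the raw margin instead of a 0/1 label, so BinaryClassificationMetrics receives a continuous score it can sweep thresholds over. NaiveBayesModel has no clearThreshold(), and its predict always returns a class label. The plain-Scala sketch below uses a hypothetical LinearModel (not Spark's class) only to illustrate that difference; unlike Spark's clearThreshold(), which mutates the model, it returns a modified copy.

```scala
// Hypothetical linear classifier (NOT Spark's SVMModel) that mimics the idea behind
// clearThreshold(): with a threshold, predict returns a hard 0/1 label; with the
// threshold cleared, it returns the raw margin w.x + b, which can be ranked for AUC.
object ThresholdSketch {
  final case class LinearModel(
      weights: Array[Double],
      intercept: Double,
      threshold: Option[Double] = Some(0.0)) {

    // Raw score: dot product of weights and features, plus the intercept.
    def margin(x: Array[Double]): Double =
      weights.zip(x).map { case (w, v) => w * v }.sum + intercept

    def predict(x: Array[Double]): Double = threshold match {
      case Some(t) => if (margin(x) > t) 1.0 else 0.0 // hard class label
      case None    => margin(x)                       // raw score
    }

    // Spark mutates the model in place; this sketch returns a modified copy instead.
    def clearThreshold(): LinearModel = copy(threshold = None)
  }

  def main(args: Array[String]): Unit = {
    val model = LinearModel(Array(2.0, -1.0), intercept = -0.5)
    val x = Array(1.0, 0.25)
    println(s"threshold set:     ${model.predict(x)}")                  // 1.0
    println(s"threshold cleared: ${model.clearThreshold().predict(x)}") // 1.25
  }
}
```

With the threshold cleared the same input yields 1.25 instead of 1.0, and it is exactly this kind of graded score that makes an ROC curve informative.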

Concerning the probabilities for Bernoulli Naive Bayes, here is an example:

    import org.apache.spark.mllib.classification.NaiveBayes
    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.regression.LabeledPoint

    // Building dummy data (sc is the SparkContext, e.g. from spark-shell)
    val data = sc.parallelize(List("0,1 0 0", "1,0 1 0", "1,0 0 1", "0,1 0 1", "1,1 1 0"))

    // Transforming the dummy data into LabeledPoints
    val parsedData = data.map { line =>
      val parts = line.split(',')
      LabeledPoint(parts(0).toDouble, Vectors.dense(parts(1).split(' ').map(_.toDouble)))
    }

    // Prepare data for training
    val splits = parsedData.randomSplit(Array(0.75, 0.25), seed = 2L)
    val training = splits(0).cache()
    val test = splits(1)
    val model = NaiveBayes.train(training, lambda = 3.0, modelType = "bernoulli")

    // Labels
    val labels = model.labels

    // Probabilities for all feature vectors
    val features = parsedData.map(lp => lp.features)
    model.predictProbabilities(features).take(10) foreach println

    // For one specific vector; I'm taking the first vector in parsedData
    val testVector = parsedData.first.features
    println(s"For vector ${testVector} => probability : ${model.predictProbabilities(testVector)}")
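For intuition about what predictProbabilities returns for modelType = "bernoulli", here is a plain-Scala sketch (no Spark) of the Bernoulli Naive Bayes posterior. It assumes additive smoothing of the form (n_c + lambda) / (N + K * lambda) for the class priors and (count_cj + lambda) / (n_c + 2 * lambda) for P(x_j = 1 | c), and normalizes in log space. These formulas are an assumption here rather than a quote of Spark's source, but when fit on all five dummy points with lambda = 3.0 they give about 0.5973 for class 0 on the vector [1, 0, 0], the same value as the first probability in the output shown further down.

```scala
// Plain-Scala sketch (no Spark) of Bernoulli Naive Bayes with additive smoothing.
// Assumed smoothing: priors (n_c + lambda) / (N + K * lambda),
// feature probabilities (count_cj + lambda) / (n_c + 2 * lambda).
object BernoulliNbSketch {

  // Parameters kept in log space: logPi(c) = log P(c), logTheta(c)(j) = log P(x_j = 1 | c).
  final case class Model(labels: Array[Double], logPi: Array[Double], logTheta: Array[Array[Double]])

  def train(data: Seq[(Double, Array[Double])], lambda: Double): Model = {
    val labels = data.map(_._1).distinct.sortWith(_ < _).toArray
    val numFeatures = data.head._2.length
    val n = data.size.toDouble
    val k = labels.length.toDouble
    val logPi = labels.map { c =>
      val nc = data.count(_._1 == c).toDouble
      math.log((nc + lambda) / (n + k * lambda))
    }
    val logTheta = labels.map { c =>
      val rows = data.filter(_._1 == c).map(_._2)
      val nc = rows.size.toDouble
      Array.tabulate(numFeatures) { j =>
        val ones = rows.map(_(j)).sum // rows of class c with feature j set (features are 0/1)
        math.log((ones + lambda) / (nc + 2.0 * lambda))
      }
    }
    Model(labels, logPi, logTheta)
  }

  // Posterior P(c | x) for a 0/1 feature vector x, normalized with the log-sum-exp trick.
  def predictProbabilities(m: Model, x: Array[Double]): Array[Double] = {
    val logJoint = m.labels.indices.map { c =>
      m.logPi(c) + x.indices.map { j =>
        val logP1 = m.logTheta(c)(j)
        if (x(j) == 1.0) logP1 else math.log1p(-math.exp(logP1)) // log(1 - theta)
      }.sum
    }
    val maxLog = logJoint.max
    val unnormalized = logJoint.map(l => math.exp(l - maxLog))
    val z = unnormalized.sum
    unnormalized.map(_ / z).toArray
  }

  def main(args: Array[String]): Unit = {
    // The five dummy points from the example above, as (label, features).
    val data = Seq(
      (0.0, Array(1.0, 0.0, 0.0)),
      (1.0, Array(0.0, 1.0, 0.0)),
      (1.0, Array(0.0, 0.0, 1.0)),
      (0.0, Array(1.0, 0.0, 1.0)),
      (1.0, Array(1.0, 1.0, 0.0))
    )
    val model = train(data, lambda = 3.0)
    data.foreach { case (label, x) =>
      val probs = predictProbabilities(model, x)
      println(s"label=$label probs=${probs.mkString("[", ", ", "]")}")
    }
  }
}
```

Index i of the returned array corresponds to labels(i), which is the same pairing the Spark examples rely on when they zip model.labels with the probability vector.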

As for the AUC:

    import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics

    // Compute predictions on the test set (model.predict returns the class label, 0.0 or 1.0).
    val labelAndPreds = test.map { point =>
      val prediction = model.predict(point.features)
      (prediction, point.label)
    }

    // Evaluation metrics.
    val metrics = new BinaryClassificationMetrics(labelAndPreds)
    val auROC = metrics.areaUnderROC()
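One caveat with this AUC code: NaiveBayesModel.predict returns the predicted class, so BinaryClassificationMetrics only sees hard 0/1 scores and the ROC curve collapses to a single operating point. Passing the predicted probability of class 1 as the score, for instance model.predictProbabilities(point.features)(model.labels.indexOf(1.0)) in place of model.predict(point.features), gives it a ranking to work with; that substitution is a suggestion, not part of the original answer. The plain-Scala sketch below (no Spark, hypothetical scores) computes AUC as the Mann-Whitney statistic, which for a (score, label) list is the same quantity as the trapezoidal area under the ROC curve, and shows how much information the hard labels throw away.

```scala
// Plain-Scala sketch (no Spark): ROC AUC computed as the Mann-Whitney statistic,
// i.e. the fraction of (positive, negative) pairs in which the positive example
// gets the higher score; ties count as half a win.
object AucSketch {
  // Input pairs are (score, label) with labels 1.0 (positive) and 0.0 (negative),
  // the same order BinaryClassificationMetrics expects.
  def auc(scoreAndLabels: Seq[(Double, Double)]): Double = {
    val pos = scoreAndLabels.collect { case (s, l) if l == 1.0 => s }
    val neg = scoreAndLabels.collect { case (s, l) if l == 0.0 => s }
    require(pos.nonEmpty && neg.nonEmpty, "need at least one positive and one negative")
    val wins = pos.map { p =>
      neg.map { n => if (p > n) 1.0 else if (p == n) 0.5 else 0.0 }.sum
    }.sum
    wins / (pos.size.toDouble * neg.size)
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical class-1 probabilities for four test points, paired with true labels.
    val probScores = Seq((0.9, 1.0), (0.45, 1.0), (0.4, 0.0), (0.1, 0.0))
    // The same points reduced to hard predictions at a 0.5 cutoff,
    // roughly what model.predict does when there are two classes.
    val hardScores = probScores.map { case (s, l) => (if (s > 0.5) 1.0 else 0.0, l) }
    println(s"AUC from probabilities: ${auc(probScores)}") // 1.0
    println(s"AUC from hard labels: ${auc(hardScores)}")   // 0.75
  }
}
```

The probabilities rank every positive above every negative (AUC 1.0), while the thresholded labels tie the second positive with both negatives and drop the AUC to 0.75.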

Concerning your question in the chat:

    // One (true label, class label, probability) triple per data point and class.
    val results = parsedData.flatMap { lp =>
      val probs = model.predictProbabilities(lp.features)
      for (i <- 0 until probs.size) yield (lp.label, labels(i), probs(i))
    }

    results.take(10).foreach(println)

    // (0.0,0.0,0.59728640251696)
    // (0.0,1.0,0.40271359748304003)
    // (1.0,0.0,0.2546873180388961)
    // (1.0,1.0,0.745312681961104)
    // (1.0,0.0,0.47086939671877026)
    // (1.0,1.0,0.5291306032812298)
    // (0.0,0.0,0.6496075621805428)
    // (0.0,1.0,0.3503924378194571)
    // (1.0,0.0,0.4158585282373076)
    // (1.0,1.0,0.5841414717626924)

And if you are only interested in the argmax classes:

    val results = training.map { lp =>
      val probs = model.predictProbabilities(lp.features)
      val bestClass = probs.argmax
      (labels(bestClass), probs(bestClass))
    }
    results.take(10) foreach println

    // (0.0,0.59728640251696)
    // (1.0,0.745312681961104)
    // (1.0,0.5291306032812298)
    // (0.0,0.6496075621805428)
    // (1.0,0.5841414717626924)

Note: this requires Spark 1.5+, where predictProbabilities was introduced.

