scala - Efficient grouping by key and StatCounter


I am aggregating values by parameter below using Apache Spark and Scala. It keeps appending values to a List. Is there a more efficient way to get a List by key and a StatCounter?

import org.apache.spark.util.StatCounter

val predictorRawKey = predictorRaw.map { x =>
    val param = x._1
    val value: Double = x._2.toDouble
    (param, value)
  }
  .mapValues(num => List(num))            // wrap each value in a single-element List
  .reduceByKey((l1, l2) => l1 ::: l2)     // concatenate the Lists per key
  .map { x => (x._1, StatCounter(x._2.iterator)) }

For starters, you shouldn't use reduceByKey to group values. It is more efficient to omit the map-side aggregation and use groupByKey directly, as in the sketch below.
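As a point of comparison, here is a minimal sketch of that groupByKey variant, assuming pairs is a (param, Double) RDD like the one built in the question:

import org.apache.spark.util.StatCounter

// Group the raw Doubles per key, then compute the statistics in one pass.
// No intermediate Lists are built or concatenated on the map side.
val statsByKey = pairs
  .groupByKey()
  .mapValues(values => StatCounter(values))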

Fortunately, StatCounter can work in a streaming fashion, so there is no need to group the values at all:

import org.apache.spark.util.StatCounter

val pairs = predictorRaw.map(x => (x._1, x._2.toDouble))

val predictorRawKey = pairs.aggregateByKey(StatCounter(Nil))(
  (acc: StatCounter, x: Double) => acc.merge(x),              // fold a value into the partition-local StatCounter
  (acc1: StatCounter, acc2: StatCounter) => acc1.merge(acc2)  // combine StatCounters across partitions
)
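For reference, a quick way to inspect the per-key statistics; this is just a sketch, and collect() assumes the result is small enough to fit on the driver:

predictorRawKey.collect().foreach { case (param, stats) =>
  println(s"$param: count=${stats.count}, mean=${stats.mean}, stdev=${stats.stdev}")
}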
