Spark MLlib RowMatrix from SparseVector
I am trying to create a RowMatrix from an RDD of SparseVectors and am getting the following error:
    <console>:37: error: type mismatch;
     found   : dataRows.type (with underlying type org.apache.spark.rdd.RDD[org.apache.spark.mllib.linalg.SparseVector])
     required: org.apache.spark.rdd.RDD[org.apache.spark.mllib.linalg.Vector]
    Note: org.apache.spark.mllib.linalg.SparseVector <: org.apache.spark.mllib.linalg.Vector (and dataRows.type <: org.apache.spark.rdd.RDD[org.apache.spark.mllib.linalg.SparseVector]), but class RDD is invariant in type T.
    You may wish to define T as +T instead. (SLS 4.5)
           val svd = new RowMatrix(dataRows.persist()).computeSVD(20, computeU = true)
My code is:
    import org.apache.spark.mllib.linalg.distributed.RowMatrix
    import org.apache.spark.mllib.linalg._
    import org.apache.spark.{SparkConf, SparkContext}

    val data_file_dir = "/user/cloudera/data/"
    val data_file_name = "dataoct.txt"

    val dataRows = sc.textFile(data_file_dir.concat(data_file_name))
      .map(line => Vectors.dense(line.split(" ").map(_.toDouble)).toSparse)

    val svd = new RowMatrix(dataRows.persist()).computeSVD(20, computeU = true)
My input data file is approximately 150 rows by 50,000 columns of space-separated integers.
I am running:

    Spark: version 1.5.0-cdh5.5.1
    Java: 1.7.0_67
Just provide an explicit type annotation, either on the RDD:
    val dataRows: org.apache.spark.rdd.RDD[Vector] = ???
or on the result of the anonymous function:
    ... .map(line => Vectors.dense(line.split(" ").map(_.toDouble)).toSparse: Vector)
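For reference, a minimal sketch of the second option applied to the code from the question, assuming the same spark-shell session (so sc and the two path variables defined above are already in scope). The only change is the ": Vector" ascription on the mapped value, which makes the inferred element type Vector rather than SparseVector:

    import org.apache.spark.mllib.linalg.{Vector, Vectors}
    import org.apache.spark.mllib.linalg.distributed.RowMatrix
    import org.apache.spark.rdd.RDD

    // Ascribing the lambda result as Vector makes the inferred type RDD[Vector],
    // which is what the RowMatrix constructor expects. RDD is invariant in T,
    // so an RDD[SparseVector] is rejected even though SparseVector <: Vector.
    val dataRows: RDD[Vector] = sc.textFile(data_file_dir.concat(data_file_name))
      .map(line => Vectors.dense(line.split(" ").map(_.toDouble)).toSparse: Vector)

    val svd = new RowMatrix(dataRows.persist()).computeSVD(20, computeU = true)

The explicit RDD[Vector] annotation on dataRows is redundant once the ascription is in place, but it documents the intended element type.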