r - sorting columns of Rcpp NumericMatrix for median calculations -


i've been testing rcpp , rcpparmadillo calculating summary stats on big matrices. lot faster (5 or 10 times faster) base r colmeans or the armadillo on ~4million rows, 45 columns.

colmeansrcpp <- cxxfunction(signature(x_="integer"),                              plugin='rcpp',                             body='                             rcpp::integermatrix x = x_;                             int ncol = x.ncol(); int nrow = x.nrow();                                                   rcpp::numericvector out(ncol);                             for(int col = 0; col < ncol; col++){                               out[col]=rcpp::sum(x(_, col));                             }                                                          return wrap(out/nrow);                           ') 

i want calculate median , maybe other quantiles plotting - , because requires sort more needy of c++ outsourcing. armadillo seems bit slow wanted in place sort on code similar above cant syntax right... here i'm trying..

# ok i'm aware floor(nrow/2) not **absolutely** correct  # i'm simplifying here     colmedianrcpp <- cxxfunction(signature(x_="integer"),                            plugin='rcpp',                           body='                           rcpp::integermatrix x = clone(x_);                           int ncol = x.ncol(); int nrow = x.nrow();                                                      rcpp::numericvector out(ncol);                           for(int col = 0; col < ncol; col++){                           x(_,col)= std::sort((x_,col).begin, (x_,col).end));                           out[col]=x(floor(nrow/2), col));                           }                         return wrap(out);                         ') 

basically it's line

x(_,col)= std::sort((x_,col).begin, (x_,col).end)); 

i don't know how express "sort column in place" mixture of rcpp sugar , std c++. sorry can see i'm doing wrong hint on right syntax lovely.

ps right need clone() don't change r object?

edit add rcpparmadillo code , benchmark comparison address answer/comment below. benchmark on 50k rows quick reply recall similar many more. realise rcpp author.. many time!

the thought occurs perhaps i'm doing daft rcpparmadillo code make run slower base colmeans or rcpp version?

colmeansrcpparmadillo <- cxxfunction(signature(x_="integer"),                                       plugin="rcpparmadillo",                                       body='                                       arma::mat x = rcpp::as<arma::mat > (x_);                                       arma::rowvec md= arma::mean(x, 0);                                       return wrap(md);                                     ') 

and benchmark ...

(mb = microbenchmark( +                     colmeans(fqsmallmatrix),  +                     colmeansrcpp(fqsmallmatrix),  +                     colmeansrcpparmadillo(fqsmallmatrix), +                     times=50)) unit: milliseconds                                  expr       min       lq    median        uq        max neval               colmeans(fqsmallmatrix) 10.620919 10.63289 10.640819 10.648882  10.907145    50           colmeansrcpp(fqsmallmatrix)  2.649038  2.66832  2.676709  2.700839   2.841012    50  colmeansrcpparmadillo(fqsmallmatrix) 25.687067 26.23488 33.168589 33.792489 113.832495    50 

you not showing rcpparmadillo code -- have been quite happy performance of rcpparmadillo code needed row/col column subsetting.

you can instantiate armadillo matrices via rcpp efficiently (no copy, re-using r object memory) try that.

and you: want clone() distinct copy, , think you'd free if use default rcpparmadillo ctor (rather more efficient two-step).

edit few hours later

you had left open question why armadillo slow. in meantime, vincent solved issue here revisited, cleaner solution using code vincent's.

now how instantiates armadillo matrix without copy -- faster. , avoids mixing integer , numeric matrices. code first:

#include <rcpparmadillo.h>   using namespace rcpp;  // [[rcpp::depends(rcpparmadillo)]]  // [[rcpp::export]] numericvector colmedianrcpp(numericmatrix x) {     int nrow = x.nrow();     int ncol = x.ncol();     int position = nrow / 2; // euclidian division     numericvector out(ncol);     (int j = 0; j < ncol; j++) {          numericvector y = x(_,j); // copy column -- original not mod         std::nth_element(y.begin(), y.begin() + position, y.end());          out[j] = y[position];       }     return out; }  // [[rcpp::export]] arma::rowvec colmeansrcpparmadillo(numericmatrix x){     arma::mat x = arma::mat(x.begin(), x.nrow(), x.ncol(), false);      return arma::mean(x, 0);  }  // [[rcpp::export]] numericvector colmeansrcpp(numericmatrix x) {     int ncol = x.ncol();     int nrow = x.nrow();      rcpp::numericvector out(ncol);     (int col = 0; col < ncol; col++){         out[col]=rcpp::sum(x(_, col));      }      return wrap(out/nrow); }   /*** r set.seed(42) x <- matrix(rnorm(100*10), 100, 10) library(microbenchmark)  mb <- microbenchmark(colmeans(x), colmeansrcpp(x), colmeansrcpparmadillo(x),                      colmedianrcpp(x), times=50)   print(mb) */ 

and here result on machine, concise armadillo version fast yours, , median little slower has more work:

r> sourcecpp("/tmp/stephen.cpp")  r> set.seed(42) r> x <- matrix(rnorm(100*10), 100, 10) r> library(microbenchmark) r> mb <- microbenchmark(colmeans(x), colmeansrcpp(x), colmeansrcpparmadillo(x), +                      colmedianrcpp(x), times=50)  r> print(mb) unit: microseconds                      expr    min     lq  median     uq    max neval               colmeans(x)  9.469 10.422 11.5810 12.421 30.597    50            colmeansrcpp(x)  3.922  4.281  4.5245  5.306 18.020    50   colmeansrcpparmadillo(x)  4.196  4.549  4.9295  5.927 11.159    50           colmedianrcpp(x) 15.615 16.291 16.7290 17.971 27.026    50  r> 

Comments

Popular posts from this blog

sublimetext3 - what keyboard shortcut is to comment/uncomment for this script tag in sublime -

java - No use of nillable="0" in SOAP Webservice -

ubuntu - Laravel 5.2 quickstart guide gives Not Found Error -