R - Conditional Factor Level Selection in Aggregation of Data Table


I'm trying to aggregate a data.table to one row per id.

Suppose the first column represents the id and the last column is the factor of interest:

library(data.table)

mydt <- data.table(matrix(c(1, 2,  "level 1",
                            1, 12, "level 0",
                            1, 12, "level 0",
                            2, 12, "level 3",
                            2, 12, "level 2"),
                          nrow = 5, ncol = 3, byrow = TRUE))
mydt
   V1 V2      V3
1:  1  2 level 1
2:  1 12 level 0
3:  1 12 level 0
4:  2 12 level 3
5:  2 12 level 2

I have non-intuitive rules for how to aggregate the factor:

  • if level 1 exists in any row of an id, the aggregated row should have level 1 for that id
  • if not, and level 2 exists for the id, use it
  • if not, use level 3 if it exists
  • if not, use level 0

The actual data.table is large, and there is no numeric component to the actual factor levels; they are just strings. The script will run at least once per day, so I'm trying to avoid slow pre-processing loops.

The desired result is this:

   V1    V2      V3
1:  1  8.67 level 1
2:  2    12 level 2

However, I can't find a suitable aggregation function...

mydt[, .(V2 = mean(V2, na.rm = TRUE), V3 = if("level 1") "level 1" else if("idk me out?")), by = "V1"]

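We can convert 'V3' to a factor with its levels specified in the desired priority order. The factor's integer codes then follow that order, so which.min picks the highest-priority level present in each group.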

mydt[, V3 := factor(V3, levels = c('level 1', 'level 2', 'level 3', 'level 0'))][,
     list(V2 = mean(as.numeric(V2)),
          V3 = V3[which.min(V3)]),
     by = V1]
#   V1        V2      V3
#1:  1  8.666667 level 1
#2:  2 12.000000 level 2

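Another option is to match 'V3' against a vector of levels arranged in priority order, which gives a numeric index for each row. We take the index of the minimum value with which.min and use it to get the corresponding 'V3' value, grouped by 'V1'. For 'V2', we take the mean (in the example shown in the OP's post the 'V2' column is of 'character' class, so we have to wrap it with as.numeric).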

lvls <- paste('level', c(1:3, 0))
mydt[, list(V2 = mean(as.numeric(V2)),
            V3 = V3[which.min(match(V3, lvls))]),
     by = V1]
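
To see why the match/which.min combination selects the right level, it may help to look at the intermediate values for a single group in plain base R. This is only a minimal sketch: the vector x is a made-up stand-in for the 'V3' values of one id, and the priority order is the one given in the question.

```r
# Priority order taken from the rules in the question: earlier entries win.
lvls <- c("level 1", "level 2", "level 3", "level 0")

# Hypothetical values of the factor column for a single id.
x <- c("level 0", "level 3", "level 2")

pos  <- match(x, lvls)      # position of each value in the priority vector: 4 3 2
pick <- x[which.min(pos)]   # the value whose position is smallest
print(pick)                 # "level 2"
```

Because match returns positions in lvls rather than anything derived from the strings themselves, this should work for arbitrary level names, as long as every value that occurs is listed in lvls.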
