R - Conditional Factor Level Selection in Aggregation of a data.table
I'm trying to aggregate a data.table to 1 row per id.
Suppose the first column represents the id and the last column is the factor of interest:
    mydt <- data.table(matrix(c(1, 2, "level 1",
                                1, 12, "level 0",
                                1, 12, "level 0",
                                2, 12, "level 3",
                                2, 12, "level 2"),
                              nrow = 5, ncol = 3, byrow = TRUE))
    mydt
       V1 V2      V3
    1:  1  2 level 1
    2:  1 12 level 0
    3:  1 12 level 0
    4:  2 12 level 3
    5:  2 12 level 2

I have non-intuitive rules for how to aggregate the factor:
- if "level 1" exists in any row of an id, the aggregated row should have "level 1" for that id
- if not, and "level 2" exists for the id, use it
- if not, use "level 3" if it exists
- if not, use "level 0"
The actual data.table is large, and there is no numeric component in the actual factor levels; they are just strings. The script will run at least once per day, so I'm trying to avoid slow pre-processing loops.
The desired result is this:

       V1    V2      V3
    1:  1  8.67 level 1
    2:  2 12.00 level 2

However, I can't find a suitable aggregation function...
    mydt[, .(V2 = mean(V2, na.rm = TRUE), V3 = if("level 1") "level 1" else if("idk me out?")), by = "V1"]
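We can convert 'V3' to a factor with its levels specified in the required priority order. The minimum factor code within each group then corresponds to the highest-priority level present.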
    mydt[, V3 := factor(V3, levels = c('level 1', 'level 2', 'level 3', 'level 0'))][,
         list(V2 = mean(as.numeric(V2)), V3 = V3[which.min(V3)]), by = V1]
    #   V1        V2      V3
    #1:  1  8.666667 level 1
    #2:  2 12.000000 level 2

Another option is to match 'V3' against a vector of levels (arranged in the specific order) to get a numeric index, take the index of the minimum value, and extract the corresponding 'V3' value, grouped by 'V1'. For 'V2', we take the mean of 'V2' (in the example shown in the OP's post the 'V2' column has 'character' class, so we have to wrap it with as.numeric).
    lvls <- paste('level', c(1:3, 0))
    mydt[, list(V2 = mean(as.numeric(V2)),
                V3 = V3[which.min(match(V3, lvls))]), by = V1]
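If the priority rule is needed in more than one place, the match idea above could be wrapped in a small helper. The following is only a sketch under some assumptions: `dt_demo`, its column names `id`, `value` and `level`, and the helper `pick_level` are hypothetical names made up for illustration, and the demo table stores the numeric column as numeric rather than character.

```r
library(data.table)

# Hypothetical demo table mirroring the question's example; the column
# names id, value and level are assumptions made for this sketch.
dt_demo <- data.table(
  id    = c(1, 1, 1, 2, 2),
  value = c(2, 12, 12, 12, 12),
  level = c("level 1", "level 0", "level 0", "level 3", "level 2")
)

# Priority order from the question: level 1, then 2, then 3, then 0.
lvls <- paste("level", c(1:3, 0))

# Hypothetical helper: return the highest-priority level present in x,
# or NA if none of the values appears in `priority`.
pick_level <- function(x, priority) {
  idx <- match(as.character(x), priority)  # position of each value in the priority order
  idx <- idx[!is.na(idx)]                  # ignore values not named in the rules
  if (length(idx) == 0L) return(NA_character_)
  priority[min(idx)]
}

# One row per id; dt_demo itself is not modified because := is not used.
res <- dt_demo[, .(value = mean(value), level = pick_level(level, lvls)), by = id]
print(res)
```

Unlike the factor conversion with `:=`, this variant leaves the original column untouched, which may or may not matter depending on how the table is used afterwards.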