R - Conditional Factor Level Selection in Aggregation of Data Table
I'm trying to aggregate a data.table to 1 row per id.
Suppose the first column represents an id and the last column is a factor of interest:
library(data.table)
mydt <- data.table(matrix(c(1, 2, "level 1",
                            1, 12, "level 0",
                            1, 12, "level 0",
                            2, 12, "level 3",
                            2, 12, "level 2"),
                          nrow = 5, ncol = 3, byrow = TRUE))
mydt
   V1 V2      V3
1:  1  2 level 1
2:  1 12 level 0
3:  1 12 level 0
4:  2 12 level 3
5:  2 12 level 2
I have non-intuitive rules for how to aggregate the factor:
- if level 1 exists in any row of an id, the aggregated row should have level 1 for that id
- if not, if level 2 exists for the id, use it
- if not, use level 3 if it exists
- if not, use level 0
The actual data.table is large, and there is no numeric component to the actual factor levels; they are just strings. The script will run at least once per day, so I'm trying to avoid slow pre-processing loops.
The desired result is this:
   V1    V2      V3
1:  1  8.67 level 1
2:  2 12    level 2
However, I can't find a suitable aggregation function...
# pseudocode for what I'm after, not valid R
mydt[, .(V2 = mean(V2, na.rm = TRUE), V3 = if("level 1") "level 1" else if("idk me out?")), by = "V1"]
We can convert 'V3' to a factor with the levels specified in the required order.
mydt[, V3 := factor(V3, levels = c('level 1', 'level 2', 'level 3', 'level 0'))][,
  list(V2 = mean(as.numeric(V2)), V3 = V3[which.min(V3)]), V1]
#   V1        V2      V3
#1:  1  8.666667 level 1
#2:  2 12.000000 level 2
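To see why the factor conversion gives the right pick: a factor stores integer codes that follow the order of its levels, so the smallest code in a group is the highest-priority level present. A minimal base-R sketch, using the values for id 2 from the example (the names `x`, `codes` and `picked` are only for illustration, and `as.integer` is written out explicitly here rather than relying on implicit coercion):

```r
# Priority order: earlier levels win.
lvls <- c("level 1", "level 2", "level 3", "level 0")

# Factor values for id 2 in the example data.
x <- factor(c("level 3", "level 2"), levels = lvls)

# The integer codes follow the order of `lvls`, not alphabetical order.
codes <- as.integer(x)

# Index of the smallest code, i.e. the highest-priority level present.
picked <- x[which.min(codes)]
print(as.character(picked))
```

The result stays a factor with the same levels, so wrap it in `as.character` if plain strings are needed afterwards.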
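Another option is match. We get the numeric index by matching 'V3' against a vector of levels (arranged in the specific order), take the index of the minimum value with which.min, and extract the corresponding 'V3' value, grouped by 'V1'. For 'V2', we take the mean of 'V2' (the example shown in the OP's post had the 'V2' column as 'character' class, so we have to wrap it with as.numeric).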
lvls <- paste('level', c(1:3, 0))
mydt[, list(V2 = mean(as.numeric(V2)),
            V3 = V3[which.min(match(V3, lvls))]), V1]
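One edge case the match approach leaves open is a group where none of the values appear in the priority vector: match then returns only NA, and which.min returns a zero-length index. A rough sketch of a wrapper that handles this, assuming NA is an acceptable result for such a group (`pick_level` is a hypothetical helper name, not part of data.table):

```r
# Priority vector: "level 1" "level 2" "level 3" "level 0"
lvls <- paste("level", c(1:3, 0))

# Hypothetical helper: return the highest-priority value of x according to
# lvls, or NA when none of the values in x appear in lvls.
pick_level <- function(x, lvls) {
  idx <- match(x, lvls)               # position of each value in the priority vector
  if (all(is.na(idx))) return(NA_character_)
  as.character(x[which.min(idx)])     # which.min skips NA positions
}

print(pick_level(c("level 1", "level 0", "level 0"), lvls))
print(pick_level(c("level 3", "level 2"), lvls))
print(pick_level("other", lvls))
```

Inside the grouped call, the helper could stand in for the inline which.min(match(...)) expression on the factor column.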