R setdiff() by regex
Is it possible to customize setdiff() using regular expressions to see what is in one vector and not in another? For example:
> x <- c("1\t119\t120\t1\t119\t120\tabc\tdef\t0",
         "2\t558\t559\t2\t558\t559\tghi\tjkl\t0",
         "3\t139\t141\t3\t139\t141\tmno\tpqr\t0",
         "3\t139\t143\t3\t139\t143\tstu\tvwx\t0")
> x
[1] "1\t119\t120\t1\t119\t120\tabc\tdef\t0"
[2] "2\t558\t559\t2\t558\t559\tghi\tjkl\t0"
[3] "3\t139\t141\t3\t139\t141\tmno\tpqr\t0"
[4] "3\t139\t143\t3\t139\t143\tstu\tvwx\t0"
> y <- c("1\t119\t120\t1\t109\t120\tabc\tdef\t0",
         "2\t558\t559\t2\t548\t559\tghi\tjkl\t0",
         "3\t139\t141\t3\t129\t141\tmno\tpqr\t0",
         "3\t139\t143\t3\t129\t143\tstu\tvwx\t0",
         "4\t157\t158\t4\t147\t158\txwx\tyty\t0",
         "5\t158\t159\t5\t148\t159\tphp\twzw\t0")
> y
[1] "1\t119\t120\t1\t109\t120\tabc\tdef\t0"
[2] "2\t558\t559\t2\t548\t559\tghi\tjkl\t0"
[3] "3\t139\t141\t3\t129\t141\tmno\tpqr\t0"
[4] "3\t139\t143\t3\t129\t143\tstu\tvwx\t0"
[5] "4\t157\t158\t4\t147\t158\txwx\tyty\t0"
[6] "5\t158\t159\t5\t148\t159\tphp\twzw\t0"
I want to be able to show that:
[5] "4\t157\t158\t4\t147\t158\txwx\tyty\t0"
[6] "5\t158\t159\t5\t148\t159\tphp\twzw\t0"
are new, because 4\t157\t158 and 5\t158\t159 are unique to y. This doesn't work:
> setdiff(y, x)
[1] "1\t119\t120\t1\t109\t120\tabc\tdef\t0" "2\t558\t559\t2\t548\t559\tghi\tjkl\t0"
[3] "3\t139\t141\t3\t129\t141\tmno\tpqr\t0" "3\t139\t143\t3\t129\t143\tstu\tvwx\t0"
[5] "4\t157\t158\t4\t147\t158\txwx\tyty\t0" "5\t158\t159\t5\t148\t159\tphp\twzw\t0"
It returns every element of y because column 5 differs between x and y. I want setdiff to compare only the first 3 columns.
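Since the question asks about regular expressions specifically, here is a minimal base-R sketch (not from the original post) that uses a regex to cut each string down to its first three tab-separated fields and then compares those keys with %in%. The helper name first_fields() is hypothetical, and the sketch assumes every string has at least three tab-separated fields (shorter strings are left unchanged by sub()).

```r
# Example data from the question (tab-separated fields)
x <- c("1\t119\t120\t1\t119\t120\tabc\tdef\t0",
       "2\t558\t559\t2\t558\t559\tghi\tjkl\t0",
       "3\t139\t141\t3\t139\t141\tmno\tpqr\t0",
       "3\t139\t143\t3\t139\t143\tstu\tvwx\t0")
y <- c("1\t119\t120\t1\t109\t120\tabc\tdef\t0",
       "2\t558\t559\t2\t548\t559\tghi\tjkl\t0",
       "3\t139\t141\t3\t129\t141\tmno\tpqr\t0",
       "3\t139\t143\t3\t129\t143\tstu\tvwx\t0",
       "4\t157\t158\t4\t147\t158\txwx\tyty\t0",
       "5\t158\t159\t5\t148\t159\tphp\twzw\t0")

# Hypothetical helper: keep only the first n tab-separated fields of each string.
# For n = 3 the pattern is ^((?:[^\t]*\t){2}[^\t]*).*$ : two "field + tab" pairs,
# one more field, then the rest of the string is dropped by the replacement.
first_fields <- function(s, n = 3) {
  pattern <- sprintf("^((?:[^\t]*\t){%d}[^\t]*).*$", n - 1)
  sub(pattern, "\\1", s, perl = TRUE)
}

key_x <- first_fields(x)
key_y <- first_fields(y)

# Elements of y whose 3-field key does not occur anywhere in x
new_in_y <- y[!(key_y %in% key_x)]
new_in_y
```

Unlike setdiff(), this keeps duplicate elements of y; wrap the result in unique() if you want setdiff()'s de-duplication as well.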
A simple example of setdiff can be found here: How to tell what is in one vector and not another?
One way is to put x and y into data.frames and do an anti-join. I'll use data.table since I find it more natural.
library(data.table)
xdt <- as.data.table(do.call("rbind", strsplit(x, split = "\t")))
ydt <- as.data.table(do.call("rbind", strsplit(y, split = "\t")))
Now do the anti-join (a "setdiff" for data.frames/data.tables):
ydt[!xdt, on = paste0("V", 1:3)]
#    V1  V2  V3 V4  V5  V6  V7  V8 V9
# 1:  4 157 158  4 147 158 xwx yty  0
# 2:  5 158 159  5 148 159 php wzw  0
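If you'd rather stay with plain data.frames, a roughly equivalent anti-join can be sketched in base R with merge(): left-join y onto the distinct keys of x and keep the rows that found no partner. This is an illustrative alternative, not part of the original answer; the helper to_df() and the marker column in_x are hypothetical names, and merge() does not guarantee the original row order of y.

```r
# Example data from the question
x <- c("1\t119\t120\t1\t119\t120\tabc\tdef\t0",
       "2\t558\t559\t2\t558\t559\tghi\tjkl\t0",
       "3\t139\t141\t3\t139\t141\tmno\tpqr\t0",
       "3\t139\t143\t3\t139\t143\tstu\tvwx\t0")
y <- c("1\t119\t120\t1\t109\t120\tabc\tdef\t0",
       "2\t558\t559\t2\t548\t559\tghi\tjkl\t0",
       "3\t139\t141\t3\t129\t141\tmno\tpqr\t0",
       "3\t139\t143\t3\t129\t143\tstu\tvwx\t0",
       "4\t157\t158\t4\t147\t158\txwx\tyty\t0",
       "5\t158\t159\t5\t148\t159\tphp\twzw\t0")

# Hypothetical helper: split tab-separated strings into a character data.frame;
# as.data.frame() on an unnamed matrix names the columns V1, V2, ...
to_df <- function(s) {
  m <- do.call("rbind", strsplit(s, split = "\t"))
  as.data.frame(m, stringsAsFactors = FALSE)
}
xdf <- to_df(x)
ydf <- to_df(y)

key_cols <- paste0("V", 1:3)

# Distinct keys of x, tagged with a marker column
x_keys <- unique(xdf[key_cols])
x_keys$in_x <- TRUE

# Left join: rows of y with no matching key in x get NA in in_x
m <- merge(ydf, x_keys, by = key_cols, all.x = TRUE, sort = FALSE)

# Keep only the unmatched rows, with y's original column order
anti <- m[is.na(m$in_x), names(ydf)]
anti
```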
You can also get the row indices (thanks to @frank for suggesting this improvement/simplification):
> ydt[!xdt, which = TRUE, on = paste0("V", 1:3)]
or extract the elements directly from y:
> y[ydt[!xdt, which = TRUE, on = paste0("V", 1:3)]]
# [1] "4\t157\t158\t4\t147\t158\txwx\tyty\t0" "5\t158\t159\t5\t148\t159\tphp\twzw\t0"
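For the row-index version without data.table, a base-R sketch (an assumption, not the answer's method) builds one key per row from the first three split columns and calls which() on a %in% test. The helper row_key() is hypothetical. For this data it should yield the same indices, and the same extracted elements of y, as the data.table calls with which = TRUE.

```r
# Example data from the question
x <- c("1\t119\t120\t1\t119\t120\tabc\tdef\t0",
       "2\t558\t559\t2\t558\t559\tghi\tjkl\t0",
       "3\t139\t141\t3\t139\t141\tmno\tpqr\t0",
       "3\t139\t143\t3\t139\t143\tstu\tvwx\t0")
y <- c("1\t119\t120\t1\t109\t120\tabc\tdef\t0",
       "2\t558\t559\t2\t548\t559\tghi\tjkl\t0",
       "3\t139\t141\t3\t129\t141\tmno\tpqr\t0",
       "3\t139\t143\t3\t129\t143\tstu\tvwx\t0",
       "4\t157\t158\t4\t147\t158\txwx\tyty\t0",
       "5\t158\t159\t5\t148\t159\tphp\twzw\t0")

# Split each string on tabs: one row per string, one column per field
xm <- do.call("rbind", strsplit(x, split = "\t"))
ym <- do.call("rbind", strsplit(y, split = "\t"))

# Hypothetical helper: join the first 3 fields of each row into one key.
# The fields came from splitting on "\t", so none contains a tab, and
# reusing "\t" as the separator keeps the keys unambiguous.
row_key <- function(m) apply(m[, 1:3, drop = FALSE], 1, paste, collapse = "\t")

# Row numbers of y whose key does not appear in x
idx <- which(!(row_key(ym) %in% row_key(xm)))
idx
# [1] 5 6

# The corresponding original strings
y[idx]
```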