R setdiff() by regex
Is it possible to customize setdiff using regular expressions to see what is in one vector and not another? For example:
    x <- c("1\t119\t120\t1\t119\t120\tabc\tdef\t0",
           "2\t558\t559\t2\t558\t559\tghi\tjkl\t0",
           "3\t139\t141\t3\t139\t141\tmno\tpqr\t0",
           "3\t139\t143\t3\t139\t143\tstu\tvwx\t0")
    [1] "1\t119\t120\t1\t119\t120\tabc\tdef\t0"
    [2] "2\t558\t559\t2\t558\t559\tghi\tjkl\t0"
    [3] "3\t139\t141\t3\t139\t141\tmno\tpqr\t0"
    [4] "3\t139\t143\t3\t139\t143\tstu\tvwx\t0"

    y <- c("1\t119\t120\t1\t109\t120\tabc\tdef\t0",
           "2\t558\t559\t2\t548\t559\tghi\tjkl\t0",
           "3\t139\t141\t3\t129\t141\tmno\tpqr\t0",
           "3\t139\t143\t3\t129\t143\tstu\tvwx\t0",
           "4\t157\t158\t4\t147\t158\txwx\tyty\t0",
           "5\t158\t159\t5\t148\t159\tphp\twzw\t0")
    [1] "1\t119\t120\t1\t109\t120\tabc\tdef\t0"
    [2] "2\t558\t559\t2\t548\t559\tghi\tjkl\t0"
    [3] "3\t139\t141\t3\t129\t141\tmno\tpqr\t0"
    [4] "3\t139\t143\t3\t129\t143\tstu\tvwx\t0"
    [5] "4\t157\t158\t4\t147\t158\txwx\tyty\t0"
    [6] "5\t158\t159\t5\t148\t159\tphp\twzw\t0"

I want to be able to show that:
    [5] "4\t157\t158\t4\t147\t158\txwx\tyty\t0"
    [6] "5\t158\t159\t5\t148\t159\tphp\twzw\t0"

are new because 4\t157\t158 and 5\t158\t159 are unique to y. This doesn't work:
    > setdiff(y,x)
    [1] "1\t119\t120\t1\t109\t120\tabc\tdef\t0" "2\t558\t559\t2\t548\t559\tghi\tjkl\t0"
    [3] "3\t139\t141\t3\t129\t141\tmno\tpqr\t0" "3\t139\t143\t3\t129\t143\tstu\tvwx\t0"
    [5] "4\t157\t158\t4\t147\t158\txwx\tyty\t0" "5\t158\t159\t5\t148\t159\tphp\twzw\t0"

because column 5 is different in both x and y, so every element of y counts as new. I want to setdiff based on the first 3 columns only.
A simple example of setdiff can be found here: How to tell what is in one vector and not another?
One way is to put x and y into data.frames and do an anti-join. I'll use data.table since I find it more natural.
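    library(data.table)
    # Split each record on tabs; rbind gives a character matrix,
    # and as.data.table names its columns V1..V9.
    xdt <- as.data.table(do.call("rbind", strsplit(x, split = "\t")))
    ydt <- as.data.table(do.call("rbind", strsplit(y, split = "\t")))

Now do an anti-join (a "setdiff" for data.frames/data.tables):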
    ydt[!xdt, on = paste0("V", 1:3)]
    #    V1  V2  V3 V4  V5  V6  V7  V8 V9
    # 1:  4 157 158  4 147 158 xwx yty  0
    # 2:  5 158 159  5 148 159 php wzw  0

You can also get the row index (thanks to @Frank for the suggested improvement/simplification):
    > ydt[!xdt, which = TRUE, on = paste0("V", 1:3)]

Or extract directly from y:

    > y[ydt[!xdt, which = TRUE, on = paste0("V", 1:3)]]
    # [1] "4\t157\t158\t4\t147\t158\txwx\tyty\t0" "5\t158\t159\t5\t148\t159\tphp\twzw\t0"
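Since the question asks about regular expressions, here is a rough base R sketch of the same idea without data.table. It is not part of the approach above, just an alternative under some assumptions: `key_of` is a made-up helper name, and it assumes every record has at least four tab-separated fields (a record with fewer is returned unchanged by `sub()`). It cuts each string down to its first three fields and compares those keys with `%in%`.

```r
# Same x and y as in the question (tab-separated records).
x <- c("1\t119\t120\t1\t119\t120\tabc\tdef\t0",
       "2\t558\t559\t2\t558\t559\tghi\tjkl\t0",
       "3\t139\t141\t3\t139\t141\tmno\tpqr\t0",
       "3\t139\t143\t3\t139\t143\tstu\tvwx\t0")
y <- c("1\t119\t120\t1\t109\t120\tabc\tdef\t0",
       "2\t558\t559\t2\t548\t559\tghi\tjkl\t0",
       "3\t139\t141\t3\t129\t141\tmno\tpqr\t0",
       "3\t139\t143\t3\t129\t143\tstu\tvwx\t0",
       "4\t157\t158\t4\t147\t158\txwx\tyty\t0",
       "5\t158\t159\t5\t148\t159\tphp\twzw\t0")

# Hypothetical helper: keep only the first three tab-separated fields.
# The capture group matches "field<TAB>field<TAB>field"; the rest is dropped.
key_of <- function(v) sub("^([^\t]*\t[^\t]*\t[^\t]*)\t.*$", "\\1", v)

# Keep the elements of y whose three-field key does not occur among x's keys.
new_in_y <- y[!(key_of(y) %in% key_of(x))]
print(new_in_y)
```

With the data from the question this keeps elements 5 and 6 of y. The comparison is on the raw key text, so "01" and "1" would count as different keys; the data.table anti-join on character columns behaves the same way.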