R setdiff() by regex


Is it possible to customize setdiff using regular expressions to see what is in one vector and not another? For example:

    x <- c("1\t119\t120\t1\t119\t120\tabc\tdef\t0",
           "2\t558\t559\t2\t558\t559\tghi\tjkl\t0",
           "3\t139\t141\t3\t139\t141\tmno\tpqr\t0",
           "3\t139\t143\t3\t139\t143\tstu\tvwx\t0")

    [1] "1\t119\t120\t1\t119\t120\tabc\tdef\t0"
    [2] "2\t558\t559\t2\t558\t559\tghi\tjkl\t0"
    [3] "3\t139\t141\t3\t139\t141\tmno\tpqr\t0"
    [4] "3\t139\t143\t3\t139\t143\tstu\tvwx\t0"

    y <- c("1\t119\t120\t1\t109\t120\tabc\tdef\t0",
           "2\t558\t559\t2\t548\t559\tghi\tjkl\t0",
           "3\t139\t141\t3\t129\t141\tmno\tpqr\t0",
           "3\t139\t143\t3\t129\t143\tstu\tvwx\t0",
           "4\t157\t158\t4\t147\t158\txwx\tyty\t0",
           "5\t158\t159\t5\t148\t159\tphp\twzw\t0")

    [1] "1\t119\t120\t1\t109\t120\tabc\tdef\t0"
    [2] "2\t558\t559\t2\t548\t559\tghi\tjkl\t0"
    [3] "3\t139\t141\t3\t129\t141\tmno\tpqr\t0"
    [4] "3\t139\t143\t3\t129\t143\tstu\tvwx\t0"
    [5] "4\t157\t158\t4\t147\t158\txwx\tyty\t0"
    [6] "5\t158\t159\t5\t148\t159\tphp\twzw\t0"

I want to be able to show that:

    [5] "4\t157\t158\t4\t147\t158\txwx\tyty\t0"
    [6] "5\t158\t159\t5\t148\t159\tphp\twzw\t0"

are new because 4\t157\t158 and 5\t158\t159 are unique to y. This doesn't work:

    > setdiff(y,x)
    [1] "1\t119\t120\t1\t109\t120\tabc\tdef\t0" "2\t558\t559\t2\t548\t559\tghi\tjkl\t0"
    [3] "3\t139\t141\t3\t129\t141\tmno\tpqr\t0" "3\t139\t143\t3\t129\t143\tstu\tvwx\t0"
    [5] "4\t157\t158\t4\t147\t158\txwx\tyty\t0" "5\t158\t159\t5\t148\t159\tphp\twzw\t0"

because column 5 differs between x and y, so every element of y counts as new. I want the setdiff to be based only on the first 3 columns.

A simple example of setdiff can be found here: How to tell what is in one vector and not another?

One way is to put x and y into data.frames and do an anti-join. I'll use data.table since I find it more natural.

    library(data.table)
    xdt <- as.data.table(do.call("rbind", strsplit(x, split = "\t")))
    ydt <- as.data.table(do.call("rbind", strsplit(y, split = "\t")))

Now do an anti-join (a "setdiff" for data.frames/data.tables):

    ydt[!xdt, on = paste0("V", 1:3)]
    #    V1  V2  V3 V4  V5  V6  V7  V8 V9
    # 1:  4 157 158  4 147 158 xwx yty  0
    # 2:  5 158 159  5 148 159 php wzw  0

You can also get the row index (thanks to @Frank for the suggested improvement/simplification):

    > ydt[!xdt, which = TRUE, on = paste0("V", 1:3)]

or extract the matching elements directly from y:

    > y[ydt[!xdt, which = TRUE, on = paste0("V", 1:3)]]
    # [1] "4\t157\t158\t4\t147\t158\txwx\tyty\t0" "5\t158\t159\t5\t148\t159\tphp\twzw\t0"
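Since the question asks about regular expressions specifically, here is a rough base R sketch of the same idea without data.table. It assumes every string has at least three tab-separated fields; `key3` is a hypothetical helper name introduced only for this illustration. It cuts each string down to its first three fields with `sub()` and then keeps the elements of y whose key does not appear among the keys of x.

```r
x <- c("1\t119\t120\t1\t119\t120\tabc\tdef\t0",
       "2\t558\t559\t2\t558\t559\tghi\tjkl\t0",
       "3\t139\t141\t3\t139\t141\tmno\tpqr\t0",
       "3\t139\t143\t3\t139\t143\tstu\tvwx\t0")

y <- c("1\t119\t120\t1\t109\t120\tabc\tdef\t0",
       "2\t558\t559\t2\t548\t559\tghi\tjkl\t0",
       "3\t139\t141\t3\t129\t141\tmno\tpqr\t0",
       "3\t139\t143\t3\t129\t143\tstu\tvwx\t0",
       "4\t157\t158\t4\t147\t158\txwx\tyty\t0",
       "5\t158\t159\t5\t148\t159\tphp\twzw\t0")

# Hypothetical helper: keep only the first three tab-separated fields.
# A string with fewer than three fields does not match and is returned unchanged.
key3 <- function(s) sub("^(([^\t]*\t){2}[^\t]*).*$", "\\1", s)

# Elements of y whose three-field key is absent from x
new_in_y <- y[!key3(y) %in% key3(x)]
new_in_y
```

With the data from the question this should return elements 5 and 6 of y. Comparing keys with `%in%` rather than calling `setdiff()` on the keys keeps the full original strings and also keeps duplicates in y, which `setdiff()` would drop.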
