regex - Create new column from an existing column with pattern matching in R


I'm trying to add a new column based on pattern matching. I've read this post, but I'm not getting the desired output.

I want to create a new column (suborder) based on the greatgroup column. I have tried the following:

suborder <- rep(NA_character_, length(mydata))
suborder[grepl("udults", mydata, ignore.case = TRUE)] <- "udults"
suborder[grepl("aquults", mydata, ignore.case = TRUE)] <- "aquults"
suborder[grepl("aqualfs", mydata, ignore.case = TRUE)] <- "aqualfs"
suborder[grepl("humods", mydata, ignore.case = TRUE)] <- "humods"
suborder[grepl("udalfs", mydata, ignore.case = TRUE)] <- "udalfs"
suborder[grepl("orthods", mydata, ignore.case = TRUE)] <- "orthods"
suborder[grepl("udalfs", mydata, ignore.case = TRUE)] <- "udalfs"
suborder[grepl("psamments", mydata, ignore.case = TRUE)] <- "psamments"
suborder[grepl("udepts", mydata, ignore.case = TRUE)] <- "udepts"
suborder[grepl("fluvents", mydata, ignore.case = TRUE)] <- "fluvents"
suborder[grepl("aquods", mydata, ignore.case = TRUE)] <- "aquods"

For example, I'm looking for "udults" inside a word, such as hapludults or paleudults, and want to return "udults".

Edit: if anyone wants to take a shot at alistaire's comment, here are the search patterns to use.

 subordernames <- c("udults", "aquults", "aqualfs", "humods", "udalfs", "orthods", "psamments", "udepts", "fluvents") 

Example data below.

# output of dput(head(test)), assigned to mydata
mydata <- structure(list(1:6,
    sid = c(200502L, 200502L, 200502L, 200502L, 200502L, 200502L),
    groupdepth = c(11L, 12L, 13L, 14L, 21L, 22L),
    awc0to10 = c(0.12, 0.12, 0.12, 0.12, 0.12, 0.12),
    awc10to20 = c(0.12, 0.12, 0.12, 0.12, 0.12, 0.12),
    awc20to50 = c(0.12, 0.12, 0.12, 0.12, 0.12, 0.12),
    awc50to100 = c(0.15, 0.15, 0.15, 0.15, 0.15, 0.15),
    db3rdbar0to10 = c(1.43, 1.43, 1.43, 1.43, 1.43, 1.43),
    db3rdbar10to20 = c(1.43, 1.43, 1.43, 1.43, 1.43, 1.43),
    db3rdbar20to50 = c(1.43, 1.43, 1.43, 1.43, 1.43, 1.43),
    db3rdbar50to100 = c(1.43, 1.43, 1.43, 1.43, 1.43, 1.43),
    hydrcratngpp = c(0L, 0L, 0L, 0L, 0L, 0L),
    orgmatter0to10 = c(1.25, 1.25, 1.25, 1.25, 1.25, 1.25),
    orgmatter10to20 = c(1.25, 1.25, 1.25, 1.25, 1.25, 1.25),
    orgmatter20to50 = c(1.02, 1.02, 1.02, 1.02, 1.02, 1.02),
    orgmatter50to100 = c(0.12, 0.12, 0.12, 0.12, 0.12, 0.12),
    clay0to10 = c(8, 8, 8, 8, 8, 8),
    clay10to20 = c(8, 8, 8, 8, 8, 8),
    clay20to50 = c(9.4, 9.4, 9.4, 9.4, 9.4, 9.4),
    clay50to100 = c(40, 40, 40, 40, 40, 40),
    sand0to10 = c(85, 85, 85, 85, 85, 85),
    sand10to20 = c(85, 85, 85, 85, 85, 85),
    sand20to50 = c(83, 83, 83, 83, 83, 83),
    sand50to100 = c(45.8, 45.8, 45.8, 45.8, 45.8, 45.8),
    phwater0to20 = c(6.3, 6.3, 6.3, 6.3, 6.3, 6.3),
    ksat0to10 = c(23, 23, 23, 23, 23, 23),
    ksat10to20 = c(23, 23, 23, 23, 23, 23),
    ksat20to50 = c(19.7333, 19.7333, 19.7333, 19.7333, 19.7333, 19.7333),
    ksat50to100 = c(9, 9, 9, 9, 9, 9),
    taxclname = c("fine, mixed, semiactive, mesic oxyaquic hapludults",
        "fine, mixed, semiactive, mesic oxyaquic hapludults",
        "fine, mixed, semiactive, mesic oxyaquic hapludults",
        "fine, mixed, semiactive, mesic oxyaquic hapludults",
        "fine, mixed, semiactive, mesic oxyaquic hapludults",
        "fine, mixed, semiactive, mesic oxyaquic hapludults"),
    greatgroup = c("hapludults", "hapludults", "hapludults",
        "hapludults", "hapludults", "hapludults")),
    .Names = c("", "sid", "groupdepth", "awc0to10", "awc10to20",
        "awc20to50", "awc50to100", "db3rdbar0to10", "db3rdbar10to20",
        "db3rdbar20to50", "db3rdbar50to100", "hydrcratngpp", "orgmatter0to10",
        "orgmatter10to20", "orgmatter20to50", "orgmatter50to100", "clay0to10",
        "clay10to20", "clay20to50", "clay50to100", "sand0to10", "sand10to20",
        "sand20to50", "sand50to100", "phwater0to20", "ksat0to10", "ksat10to20",
        "ksat20to50", "ksat50to100", "taxclname", "greatgroup"),
    class = c("tbl_df", "data.frame"), row.names = c(NA, -6L))

A few options, some of which were posted in the comments above.

Note: all options assume the replacement strings are the same as the patterns. If you want something else, they're editable to include separate replacement values.

Option 1: for + grepl

Using the same code as the original, but looping to avoid repetitive code:

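# make a list of patterns
pat <- c('udults', 'aquults', 'aqualfs', 'humods', 'udalfs', 'orthods', 'psamments', 'udepts', 'fluvents', 'aquods')

# one entry per row: nrow(mydata), not length(mydata), which counts columns
suborder <- rep(NA_character_, nrow(mydata))

for (x in seq_along(pat)) {
  suborder[grepl(pat[x], mydata$greatgroup, ignore.case = TRUE)] <- pat[x]
}

# attach the result as the new column
mydata$suborder <- suborder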
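A minimal sketch of why the column has to be passed to grepl, using a made-up three-row data frame (`df` and its values are hypothetical, not the poster's data). Given a whole data frame, grepl tests one deparsed string per column, so the result has one entry per column rather than per row:

```r
# Hypothetical toy data, for illustration only
df <- data.frame(
  id = 1:3,
  greatgroup = c("Hapludults", "Paleudults", "Endoaquods"),
  stringsAsFactors = FALSE
)
pat <- c("udults", "aquods")

# Whole data frame: one result per column (length 2 here), not per row
whole_df <- grepl("udults", df, ignore.case = TRUE)

# Column only: one result per row, which is what the indexing needs
suborder <- rep(NA_character_, nrow(df))
for (p in pat) {
  suborder[grepl(p, df$greatgroup, ignore.case = TRUE)] <- p
}
df$suborder <- suborder

print(df$suborder)
# [1] "udults" "udults" "aquods"
```

If a value matched more than one pattern, the last matching entry in pat would win, since later iterations overwrite earlier ones.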

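Option 2: for + gsub

Build the new column in place by copying mydata$greatgroup and altering it with gsub. The '.*' pasted on each side of the pattern makes the regex cover every character in the string, so the whole value is replaced by the pattern.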

mydata$suborder <- mydata$greatgroup
for (x in pat) {
  mydata$suborder <- gsub(paste0('.*', x, '.*'), x, mydata$suborder, ignore.case = TRUE)
}

Note that values not matched by one of the strings in pat will keep their greatgroup value, not NA. If you want them to be NA, fix them with

mydata$suborder[!(mydata$suborder %in% pat)] <- NA
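As a rough illustration of the unmatched-value behaviour, here is the same loop on a made-up data frame (`df` and its values are hypothetical) in which one great group matches nothing in pat:

```r
# Hypothetical toy data; "Haplorthods" matches neither pattern below
pat <- c("udults", "aquods")
df <- data.frame(
  greatgroup = c("Hapludults", "Paleudults", "Haplorthods"),
  stringsAsFactors = FALSE
)

df$suborder <- df$greatgroup
for (x in pat) {
  df$suborder <- gsub(paste0(".*", x, ".*"), x, df$suborder, ignore.case = TRUE)
}

# Unmatched rows still hold the original greatgroup value at this point
before_fix <- df$suborder
print(before_fix)
# [1] "udults"      "udults"      "Haplorthods"

# Replace anything that is not one of the patterns with NA
df$suborder[!(df$suborder %in% pat)] <- NA
print(df$suborder)
# [1] "udults" "udults" NA
```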

Option 3: named list + stringr::str_replace_all

My favorite because it doesn't loop, although it requires the stringr package (which is pretty awesome, anyway).

Make a named list from pat, where the name is the regex you want to replace, and the item is the string to replace the match with:

l <- as.list(pat)
names(l) <- paste0('.*', pat, '.*')

So it looks like

> l
$`.*udults.*`
[1] "udults"

$`.*aquults.*`
[1] "aquults"

$`.*aqualfs.*`
[1] "aqualfs"
......

Then use str_replace_all to do it all at once:

library(stringr)
mydata$suborder <- str_replace_all(mydata$greatgroup, l)

Boom.

Note 1: str_replace_all doesn't have an ignore.case option, but you can wrap mydata$greatgroup in tolower (easy) or reconfigure the regex (hard).

Note 2: Like option 2, this leaves unmatched entries with their greatgroup value, so use the line at the end of that option to change them to NAs, if you like.
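If both notes matter (case-insensitive matching and NA for unmatched rows), one loop-free alternative in base R is sketched below. This is a different technique from the stringr approach above, not a variant of it: it joins the patterns into a single alternation and extracts the matched text with regexpr and regmatches. The vector `greatgroup` and its values are made up for illustration.

```r
# Hypothetical inputs; "Quartzipsamments" matches none of these patterns
pat <- c("udults", "aquults", "aquods")
greatgroup <- c("Hapludults", "Endoaquults", "Quartzipsamments", "Alaquods")

# One regex that matches any of the patterns: "udults|aquults|aquods"
alt <- paste(pat, collapse = "|")

# Position of the first match in each string, or -1 when nothing matches
m <- regexpr(alt, greatgroup, ignore.case = TRUE)

# Start with NA everywhere, then fill in only the rows that matched
suborder <- rep(NA_character_, length(greatgroup))
suborder[m != -1] <- tolower(regmatches(greatgroup, m))

print(suborder)
# [1] "udults"  "aquults" NA        "aquods"
```

The tolower call normalises the extracted text, since with ignore.case = TRUE the matched substring keeps whatever case the data had.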

