regex - Create new column from an existing column with pattern matching in R

I'm trying to add a new column based on pattern matching. I've read this post, but I'm not getting the desired output.

I want to create a new column (suborder) based on the greatgroup column. I have tried the following:

suborder <- rep(NA_character_, length(mydata))
suborder[grepl("udults", mydata, ignore.case = TRUE)] <- "udults"
suborder[grepl("aquults", mydata, ignore.case = TRUE)] <- "aquults"
suborder[grepl("aqualfs", mydata, ignore.case = TRUE)] <- "aqualfs"
suborder[grepl("humods", mydata, ignore.case = TRUE)] <- "humods"
suborder[grepl("udalfs", mydata, ignore.case = TRUE)] <- "udalfs"
suborder[grepl("orthods", mydata, ignore.case = TRUE)] <- "orthods"
suborder[grepl("udalfs", mydata, ignore.case = TRUE)] <- "udalfs"
suborder[grepl("psamments", mydata, ignore.case = TRUE)] <- "psamments"
suborder[grepl("udepts", mydata, ignore.case = TRUE)] <- "udepts"
suborder[grepl("fluvents", mydata, ignore.case = TRUE)] <- "fluvents"
suborder[grepl("aquods", mydata, ignore.case = TRUE)] <- "aquods"

For example, I'm looking for "udults" inside a word, such as hapludults or paleudults, and want to return "udults".

Edit: if anyone wants to take a shot at alistaire's comment, these are the search patterns to use.

 subordernames <- c("udults", "aquults", "aqualfs", "humods", "udalfs", "orthods", "psamments", "udepts", "fluvents")

Example data is below.

# output of dput(head(test))
mydata <- structure(list(1:6,
    sid = c(200502L, 200502L, 200502L, 200502L, 200502L, 200502L),
    groupdepth = c(11L, 12L, 13L, 14L, 21L, 22L),
    awc0to10 = c(0.12, 0.12, 0.12, 0.12, 0.12, 0.12),
    awc10to20 = c(0.12, 0.12, 0.12, 0.12, 0.12, 0.12),
    awc20to50 = c(0.12, 0.12, 0.12, 0.12, 0.12, 0.12),
    awc50to100 = c(0.15, 0.15, 0.15, 0.15, 0.15, 0.15),
    db3rdbar0to10 = c(1.43, 1.43, 1.43, 1.43, 1.43, 1.43),
    db3rdbar10to20 = c(1.43, 1.43, 1.43, 1.43, 1.43, 1.43),
    db3rdbar20to50 = c(1.43, 1.43, 1.43, 1.43, 1.43, 1.43),
    db3rdbar50to100 = c(1.43, 1.43, 1.43, 1.43, 1.43, 1.43),
    hydrcratngpp = c(0L, 0L, 0L, 0L, 0L, 0L),
    orgmatter0to10 = c(1.25, 1.25, 1.25, 1.25, 1.25, 1.25),
    orgmatter10to20 = c(1.25, 1.25, 1.25, 1.25, 1.25, 1.25),
    orgmatter20to50 = c(1.02, 1.02, 1.02, 1.02, 1.02, 1.02),
    orgmatter50to100 = c(0.12, 0.12, 0.12, 0.12, 0.12, 0.12),
    clay0to10 = c(8, 8, 8, 8, 8, 8),
    clay10to20 = c(8, 8, 8, 8, 8, 8),
    clay20to50 = c(9.4, 9.4, 9.4, 9.4, 9.4, 9.4),
    clay50to100 = c(40, 40, 40, 40, 40, 40),
    sand0to10 = c(85, 85, 85, 85, 85, 85),
    sand10to20 = c(85, 85, 85, 85, 85, 85),
    sand20to50 = c(83, 83, 83, 83, 83, 83),
    sand50to100 = c(45.8, 45.8, 45.8, 45.8, 45.8, 45.8),
    phwater0to20 = c(6.3, 6.3, 6.3, 6.3, 6.3, 6.3),
    ksat0to10 = c(23, 23, 23, 23, 23, 23),
    ksat10to20 = c(23, 23, 23, 23, 23, 23),
    ksat20to50 = c(19.7333, 19.7333, 19.7333, 19.7333, 19.7333, 19.7333),
    ksat50to100 = c(9, 9, 9, 9, 9, 9),
    taxclname = c("fine, mixed, semiactive, mesic oxyaquic hapludults",
        "fine, mixed, semiactive, mesic oxyaquic hapludults",
        "fine, mixed, semiactive, mesic oxyaquic hapludults",
        "fine, mixed, semiactive, mesic oxyaquic hapludults",
        "fine, mixed, semiactive, mesic oxyaquic hapludults",
        "fine, mixed, semiactive, mesic oxyaquic hapludults"),
    greatgroup = c("hapludults", "hapludults", "hapludults",
        "hapludults", "hapludults", "hapludults")),
    .names = c("", "sid", "groupdepth", "awc0to10", "awc10to20",
        "awc20to50", "awc50to100", "db3rdbar0to10", "db3rdbar10to20",
        "db3rdbar20to50", "db3rdbar50to100", "hydrcratngpp", "orgmatter0to10",
        "orgmatter10to20", "orgmatter20to50", "orgmatter50to100", "clay0to10",
        "clay10to20", "clay20to50", "clay50to100", "sand0to10", "sand10to20",
        "sand20to50", "sand50to100", "phwater0to20", "ksat0to10", "ksat10to20",
        "ksat20to50", "ksat50to100", "taxclname", "greatgroup"),
    class = c("tbl_df", "data.frame"), row.names = c(NA, -6L))

A few options, some of which were posted in the comments above.

Note: all options assume the replacement strings are the same as the patterns they match. If you want something else, they're editable to include separate replacement values.

Option 1: `for` + `grepl`

Using the same code as the original, but looping to avoid repetitive code:

# make a vector of patterns
pat <- c('udults', 'aquults', 'aqualfs', 'humods', 'udalfs', 'orthods', 'psamments', 'udepts', 'fluvents', 'aquods')

# one entry per row: nrow(), not length(), which counts the columns of a data frame
suborder <- rep(NA_character_, nrow(mydata))

for(x in seq_along(pat)){
  suborder[grepl(pat[x], mydata$greatgroup, ignore.case = TRUE)] <- pat[x]
}
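To see what the loop does with mixed input, here is a small self-contained run. The `toy` data frame and its `greatgroup` values are made up for illustration and are not from the question's data.

```r
# Hypothetical data: three great groups that contain a pattern, one that does not
toy <- data.frame(
  greatgroup = c("Hapludults", "Paleudults", "Endoaquults", "Haplosaprists"),
  stringsAsFactors = FALSE
)

pat <- c('udults', 'aquults', 'aqualfs', 'humods', 'udalfs', 'orthods',
         'psamments', 'udepts', 'fluvents', 'aquods')

# Start with NA for every row, then fill in each pattern where it matches
suborder <- rep(NA_character_, nrow(toy))
for (p in pat) {
  suborder[grepl(p, toy$greatgroup, ignore.case = TRUE)] <- p
}

# Attach the result as a new column
toy$suborder <- suborder
print(toy)
```

Rows that match no pattern keep the initial `NA`, which is the main difference from the `gsub` approach.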

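Option 2: `for` + `gsub`

Build the new column in place by copying mydata$greatgroup and then altering it with gsub. The '.*' pasted onto each side of the pattern makes the regex match any other characters within the same string, so the whole value is replaced.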

mydata$suborder <- mydata$greatgroup
for(x in pat){
  mydata$suborder <- gsub(paste0('.*', x, '.*'), x, mydata$suborder, ignore.case = TRUE)
}

Note that values not matched by one of the strings in pat will have the value of greatgroup, not NA. If you want them to be NA, fix them with

mydata$suborder[!(mydata$suborder %in% pat)] <- NA
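A quick way to check what the `'.*'` wrapping does: with the wildcards a match replaces the whole string, without them only the matched part is swapped. The two input strings below are illustrative, not taken from the question's data.

```r
x <- c("Hapludults", "Haplosaprists")

# With wildcards on both sides, a matching string collapses to the pattern
with_wild <- gsub(paste0('.*', 'udults', '.*'), 'udults', x, ignore.case = TRUE)

# Without them only the matched substring is replaced, so the prefix survives
without_wild <- gsub('udults', 'udults', x, ignore.case = TRUE)

# Unmatched strings pass through unchanged; set them to NA if preferred
with_wild_na <- with_wild
with_wild_na[!(with_wild_na %in% 'udults')] <- NA

print(with_wild)
print(without_wild)
print(with_wild_na)
```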

Option 3: named list + `stringr::str_replace_all`

My favorite because it doesn't loop, although it requires the stringr package (which is pretty awesome, anyway).

Make a named list from pat, where the name is the regex you want to replace, and the item is the string to replace it with:

l <- as.list(pat) names(l) <- paste0('.*', pat, '.*')

so it looks like

> l
$`.*udults.*`
[1] "udults"

$`.*aquults.*`
[1] "aquults"

$`.*aqualfs.*`
[1] "aqualfs"
......

Then use str_replace_all to do all the replacements at once:

library(stringr)
mydata$suborder <- str_replace_all(mydata$greatgroup, l)

Boom.

Note 1: str_replace_all doesn't have an ignore.case option, so you can either wrap mydata$greatgroup in tolower (easy) or reconfigure the regex (hard).

Note 2: Like option 2, this leaves unmatched entries with the value of greatgroup, so use the line at the end of that option to change them to NAs, if you like.
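Both notes can be combined in one short sketch. The stringr documentation describes passing a named character vector of `pattern = replacement` pairs for multiple replacements, so this version uses that form rather than a list. The `greatgroup` values here are made up, and `repl` is just an illustrative name.

```r
library(stringr)

pat <- c('udults', 'aquults', 'aqualfs', 'humods', 'udalfs', 'orthods',
         'psamments', 'udepts', 'fluvents', 'aquods')

# Named character vector: names are the regexes, values are the replacements
repl <- setNames(pat, paste0('.*', pat, '.*'))

# Hypothetical mixed-case input
greatgroup <- c("Hapludults", "Endoaquults", "Haplosaprists")

# tolower() stands in for the missing ignore.case argument (Note 1)
suborder <- str_replace_all(tolower(greatgroup), repl)

# Unmatched entries still hold their lowercased input; set them to NA (Note 2)
suborder[!(suborder %in% pat)] <- NA
print(suborder)
```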
