regex - Create new column from an existing column with pattern matching in R
I'm trying to add a new column based on another using pattern matching. I've read this post, but I'm not getting the desired output.
I want to create a new column (suborder) based on the greatgroup column. I have tried the following:
    suborder <- rep(NA_character_, length(mydata))
    suborder[grepl("udults", mydata, ignore.case = TRUE)] <- "udults"
    suborder[grepl("aquults", mydata, ignore.case = TRUE)] <- "aquults"
    suborder[grepl("aqualfs", mydata, ignore.case = TRUE)] <- "aqualfs"
    suborder[grepl("humods", mydata, ignore.case = TRUE)] <- "humods"
    suborder[grepl("udalfs", mydata, ignore.case = TRUE)] <- "udalfs"
    suborder[grepl("orthods", mydata, ignore.case = TRUE)] <- "orthods"
    suborder[grepl("udalfs", mydata, ignore.case = TRUE)] <- "udalfs"
    suborder[grepl("psamments", mydata, ignore.case = TRUE)] <- "psamments"
    suborder[grepl("udepts", mydata, ignore.case = TRUE)] <- "udepts"
    suborder[grepl("fluvents", mydata, ignore.case = TRUE)] <- "fluvents"
    suborder[grepl("aquods", mydata, ignore.case = TRUE)] <- "aquods"
For example, I'm looking for "udults" inside a word, such as hapludults or paleudults, and want to return "udults".
Edit: if anyone wants to take a shot at alistaire's comment, here are the search patterns to use.
subordernames <- c("udults", "aquults", "aqualfs", "humods", "udalfs", "orthods", "psamments", "udepts", "fluvents")
Example data is below.

    # output of dput(head(test))
    mydata <- structure(list(1:6,
      sid = c(200502L, 200502L, 200502L, 200502L, 200502L, 200502L),
      groupdepth = c(11L, 12L, 13L, 14L, 21L, 22L),
      awc0to10 = c(0.12, 0.12, 0.12, 0.12, 0.12, 0.12),
      awc10to20 = c(0.12, 0.12, 0.12, 0.12, 0.12, 0.12),
      awc20to50 = c(0.12, 0.12, 0.12, 0.12, 0.12, 0.12),
      awc50to100 = c(0.15, 0.15, 0.15, 0.15, 0.15, 0.15),
      db3rdbar0to10 = c(1.43, 1.43, 1.43, 1.43, 1.43, 1.43),
      db3rdbar10to20 = c(1.43, 1.43, 1.43, 1.43, 1.43, 1.43),
      db3rdbar20to50 = c(1.43, 1.43, 1.43, 1.43, 1.43, 1.43),
      db3rdbar50to100 = c(1.43, 1.43, 1.43, 1.43, 1.43, 1.43),
      hydrcratngpp = c(0L, 0L, 0L, 0L, 0L, 0L),
      orgmatter0to10 = c(1.25, 1.25, 1.25, 1.25, 1.25, 1.25),
      orgmatter10to20 = c(1.25, 1.25, 1.25, 1.25, 1.25, 1.25),
      orgmatter20to50 = c(1.02, 1.02, 1.02, 1.02, 1.02, 1.02),
      orgmatter50to100 = c(0.12, 0.12, 0.12, 0.12, 0.12, 0.12),
      clay0to10 = c(8, 8, 8, 8, 8, 8),
      clay10to20 = c(8, 8, 8, 8, 8, 8),
      clay20to50 = c(9.4, 9.4, 9.4, 9.4, 9.4, 9.4),
      clay50to100 = c(40, 40, 40, 40, 40, 40),
      sand0to10 = c(85, 85, 85, 85, 85, 85),
      sand10to20 = c(85, 85, 85, 85, 85, 85),
      sand20to50 = c(83, 83, 83, 83, 83, 83),
      sand50to100 = c(45.8, 45.8, 45.8, 45.8, 45.8, 45.8),
      phwater0to20 = c(6.3, 6.3, 6.3, 6.3, 6.3, 6.3),
      ksat0to10 = c(23, 23, 23, 23, 23, 23),
      ksat10to20 = c(23, 23, 23, 23, 23, 23),
      ksat20to50 = c(19.7333, 19.7333, 19.7333, 19.7333, 19.7333, 19.7333),
      ksat50to100 = c(9, 9, 9, 9, 9, 9),
      taxclname = c("fine, mixed, semiactive, mesic oxyaquic hapludults",
        "fine, mixed, semiactive, mesic oxyaquic hapludults",
        "fine, mixed, semiactive, mesic oxyaquic hapludults",
        "fine, mixed, semiactive, mesic oxyaquic hapludults",
        "fine, mixed, semiactive, mesic oxyaquic hapludults",
        "fine, mixed, semiactive, mesic oxyaquic hapludults"),
      greatgroup = c("hapludults", "hapludults", "hapludults",
        "hapludults", "hapludults", "hapludults")),
      .names = c("", "sid", "groupdepth", "awc0to10", "awc10to20",
        "awc20to50", "awc50to100", "db3rdbar0to10", "db3rdbar10to20",
        "db3rdbar20to50", "db3rdbar50to100", "hydrcratngpp",
        "orgmatter0to10", "orgmatter10to20", "orgmatter20to50",
        "orgmatter50to100", "clay0to10", "clay10to20", "clay20to50",
        "clay50to100", "sand0to10", "sand10to20", "sand20to50",
        "sand50to100", "phwater0to20", "ksat0to10", "ksat10to20",
        "ksat20to50", "ksat50to100", "taxclname", "greatgroup"),
      class = c("tbl_df", "data.frame"), row.names = c(NA, -6L))
A few options, some of which were posted in the comments above.
Note: these options assume the replacement strings match the patterns. If you want something else, they're editable to include separate replacement values.
Option 1: for + grepl
Using the same code as the original, but looping to avoid repetitive code:
    # make a vector of patterns
    pat <- c('udults', 'aquults', 'aqualfs', 'humods', 'udalfs', 'orthods',
             'psamments', 'udepts', 'fluvents', 'aquods')

    # one entry per row; stays NA where no pattern matches
    suborder <- rep(NA_character_, nrow(mydata))
    for (x in seq_along(pat)) {
      suborder[grepl(pat[x], mydata$greatgroup, ignore.case = TRUE)] <- pat[x]
    }
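Note that the loop searches mydata$greatgroup rather than mydata itself. A small sketch of why that matters, using a hypothetical toy data frame df (not the asker's data): when grepl is handed a whole data frame, each column is coerced to a single string, so the result has one element per column instead of one per row.

```r
# Hypothetical toy data, only for illustration
df <- data.frame(
  id = 1:3,
  greatgroup = c("hapludults", "paleudults", "endoaquods"),
  stringsAsFactors = FALSE
)

# Searching the whole data frame: every column is collapsed to one string,
# so there is one result per column
by_column <- grepl("udults", df, ignore.case = TRUE)

# Searching the column: one result per row
by_row <- grepl("udults", df$greatgroup, ignore.case = TRUE)

length(by_column)  # equals ncol(df)
length(by_row)     # equals nrow(df)
```

Only by_row lines up with the rows, so it is the one that can safely index a per-row suborder vector.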
Option 2: for + gsub
Build the new column in place by copying mydata$greatgroup, then altering it with gsub. The .* pasted onto each side of the pattern matches any other characters in the same string, so the whole value is replaced.
    mydata$suborder <- mydata$greatgroup
    for (x in pat) {
      mydata$suborder <- gsub(paste0('.*', x, '.*'), x, mydata$suborder, ignore.case = TRUE)
    }
Note that values not matched by one of the strings in pat will keep their value from greatgroup, not NA. If you want them to be NA, fix them with

    mydata$suborder[!(mydata$suborder %in% pat)] <- NA
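To make the unmatched-value behavior concrete, here is a minimal sketch on a hypothetical vector gg (a stand-in for mydata$greatgroup) with a shortened pattern list. The value "cryorthents" is an assumed example of an entry that matches nothing.

```r
# Shortened pattern list and toy values, for illustration only
pat <- c("udults", "aquods")
gg <- c("Hapludults", "paleudults", "cryorthents")

# Same gsub loop as above, applied to a plain vector
suborder <- gg
for (x in pat) {
  suborder <- gsub(paste0(".*", x, ".*"), x, suborder, ignore.case = TRUE)
}
before_fix <- suborder  # the unmatched entry is still "cryorthents"

# Turn everything that is not one of the patterns into NA
suborder[!(suborder %in% pat)] <- NA
suborder
```

The %in% check works here because gsub inserts the lowercase pattern itself, regardless of the case of the matched text.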
Option 3: named list + stringr::str_replace_all
My favorite because it doesn't loop, although it requires the stringr package (which is pretty awesome, anyway).
Make a named list from pat, where the name is the regex you want to replace, and the item is the string to replace the match with:

    l <- as.list(pat)
    names(l) <- paste0('.*', pat, '.*')

so it looks like

    > l
    $`.*udults.*`
    [1] "udults"

    $`.*aquults.*`
    [1] "aquults"

    $`.*aqualfs.*`
    [1] "aqualfs"

    ......
Then use str_replace_all to do it all at once:

    library(stringr)
    mydata$suborder <- str_replace_all(mydata$greatgroup, l)

Boom.
Note 1: str_replace_all doesn't have an ignore.case option, so you can wrap mydata$greatgroup in tolower (easy) or reconfigure the regex (hard).
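Both routes from Note 1 can be sketched in base R, so the example runs without stringr. It assumes a hypothetical vector gg and a shortened pattern list, and it uses the inline (?i) flag with perl = TRUE as one possible way of reconfiguring the regex. The ICU engine behind stringr accepts the same inline flag, but treat that as something to verify against your installed version.

```r
# Toy values with mixed case, for illustration only
pat <- c("udults", "aquods")
gg <- c("Hapludults", "ENDOAQUODS")

# Route 1 (easy): lowercase the input before matching
lowered <- tolower(gg)
route1 <- lowered
for (x in pat) {
  route1 <- gsub(paste0(".*", x, ".*"), x, route1)
}

# Route 2: put a case-insensitive flag inside the regex itself
route2 <- gg
for (x in pat) {
  route2 <- gsub(paste0("(?i).*", x, ".*"), x, route2, perl = TRUE)
}

route1
route2
```

Both routes give the same lowercase suborder names; the first changes the data being searched, the second changes only the pattern.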
Note 2: like option 2, this leaves unmatched entries with their value from greatgroup, so use the line at the end of that option to turn them into NAs, if you like.
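For comparison, here is a loop-free base R variant that needs no extra package. It is not one of the options above, only a sketch: it joins the patterns into a single alternation and extracts the matched text with regexpr and regmatches. It assumes each value contains at most one of the patterns (regexpr returns only the first match), and gg is again a hypothetical stand-in for mydata$greatgroup.

```r
pat <- c("udults", "aquults", "aqualfs", "humods", "udalfs", "orthods",
         "psamments", "udepts", "fluvents", "aquods")

# Toy values, for illustration only
gg <- c("hapludults", "Paleudults", "endoaquods", "cryorthents")

# One regex that matches any of the patterns
rx <- paste(pat, collapse = "|")

# Position of the first match in each element, -1 where there is none
m <- regexpr(rx, gg, ignore.case = TRUE)

# Start from NA and fill in only the rows that matched
suborder <- rep(NA_character_, length(gg))
suborder[m > 0] <- tolower(regmatches(gg, m))
suborder
```

Unmatched rows come out as NA directly, so no clean-up step is needed afterwards.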