R - Find unique set of strings in vector where vector elements can be multiple strings
I have a series of batch records that are labeled sequentially. Sometimes batches overlap.

    x <- c("1","1","1/2","2","3","4","5/4","5")
    > data.frame(x)
        x
    1   1
    2   1
    3 1/2
    4   2
    5   3
    6   4
    7 5/4
    8   5

I want to find the sets of batches that do not overlap and label them as periods. Batch "1/2" includes both "1" and "2", so those batches are not unique. When batch = "3", it is not contained in any previous batch, so it starts a new period. I'm having difficulty dealing with the combined batches; otherwise this would be straightforward. The result should be:

        x period
    1   1      1
    2   1      1
    3 1/2      1
    4   2      1
    5   3      2
    6   4      3
    7 5/4      3
    8   5      3

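One way to think about the combined batches is to treat each record as a set of batch labels. As a minimal sketch using only base R, `strsplit()` turns every record into a character vector, so "1/2" becomes `c("1", "2")`, and checking whether two records overlap is just `%in%`. The variable name `batches` is illustrative only.

```r
x <- c("1","1","1/2","2","3","4","5/4","5")

# one character vector of batch labels per record; fixed = TRUE treats "/" literally
batches <- strsplit(x, "/", fixed = TRUE)

batches[[3]]       # the combined record "1/2": [1] "1" "2"
lengths(batches)   # number of batches in each record: 1 1 2 1 1 1 2 1

# does record 4 ("2") share a batch with record 3 ("1/2")?  TRUE
any(batches[[4]] %in% batches[[3]])
# does record 5 ("3") share a batch with record 4 ("2")?  FALSE
any(batches[[5]] %in% batches[[4]])
```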
My experience is in more functional programming paradigms, so I know the way I did this is very un-R. I'm looking for a way to do this in R that is clean and simple. Any help is appreciated.
Here's my un-R code. It works, but it is super clunky and not extensible.

    x <- c("1","1","1/2","2","3","4","5/4","5")
    p <- 1           # period number
    temp <- NULL     # temp variable storing cases of x (batches)
    temp[1] <- x[1]
    period <- NULL
    rl <- 0          # length to repeat period
    for (i in 1:length(x)) {
      # check for "/", split, and add to temp
      if (grepl("/", x[i])) {
        z <- strsplit(x[i], "/")  # split character
        z <- unlist(z)            # convert to vector
        temp <- c(temp, z, x[i])  # add to temp vector for comparison
      }
      # check if x is in temp
      if (x[i] %in% temp) {
        temp <- append(temp, x[i])  # add to search vector
        rl <- rl + 1                # increase length
      } else {
        period <- append(period, rep(p, rl))  # add to period vector
        p <- p + 1                            # increase period count
        temp <- x[i]                          # reset, keeping the batch that starts the new period
        rl <- 1                               # reset
      }
    }
    # add last batch
    rl <- length(x) - length(period)
    period <- append(period, rep(p, rl))
    df <- data.frame(x, period)
    > df
        x period
    1   1      1
    2   1      1
    3 1/2      1
    4   2      1
    5   3      2
    6   4      3
    7 5/4      3
    8   5      3

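A more extensible version of the same loop can keep the set of batch labels seen in the current period and start a new period whenever a record shares no label with that set. This is a sketch under that assumption; the helper name `assign_periods()` is hypothetical. Labels are compared as strings, so it does not require numeric batch IDs.

```r
# hypothetical helper: assign a period number to each record
assign_periods <- function(x) {
  parts <- strsplit(x, "/", fixed = TRUE)  # each record -> its batch labels
  period <- integer(length(x))
  current <- character(0)                  # labels seen in the current period
  p <- 0L
  for (i in seq_along(parts)) {
    # no label in common with the current period: start a new one
    if (!any(parts[[i]] %in% current)) {
      p <- p + 1L
      current <- character(0)
    }
    current <- union(current, parts[[i]])  # remember this record's labels
    period[i] <- p
  }
  period
}

x <- c("1","1","1/2","2","3","4","5/4","5")
data.frame(x, period = assign_periods(x))  # period: 1 1 1 1 2 3 3 3
```

One design difference from the loop above: there, a combined record is always treated as part of the current period, while here a combined record such as "6/7" that shares nothing with the current period starts a new one. Which behaviour is right depends on how the batches are defined.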
A little bit shorter:

    x <- c("1","1","1/2","2","3","4","5/4","5")
    x <- data.frame(x = x, period = -1, stringsAsFactors = FALSE)
    period = 0
    prevbatch = -1
    for (i in 1:nrow(x)) {
      # convert to numbers so that e.g. "10" compares greater than "9"
      spl = as.numeric(unlist(strsplit(x$x[i], "/")))
      currentbatch = min(spl)
      if (currentbatch < prevbatch) {
        stop("error in sequence")
      }
      # no overlap with the previous record: start a new period
      if (currentbatch > prevbatch) period = period + 1
      x$period[i] = period
      prevbatch = max(spl)
    }
    x

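If the batch labels are always integers in ascending order, as the answer above assumes, the loop can be vectorized: a record starts a new period when its smallest batch is larger than the largest batch seen in any earlier record, and `cumsum()` over those TRUE/FALSE flags numbers the periods. A sketch under that assumption; the name `assign_periods_vec()` is hypothetical, and unlike the loop it does not perform the "error in sequence" check.

```r
# hypothetical vectorized helper; assumes numeric labels in ascending order
assign_periods_vec <- function(x) {
  nums <- lapply(strsplit(x, "/", fixed = TRUE), as.numeric)  # numeric batch ids per record
  lo <- vapply(nums, min, numeric(1))          # smallest batch in each record
  hi <- cummax(vapply(nums, max, numeric(1)))  # largest batch seen up to each record
  prev_hi <- c(-Inf, head(hi, -1))             # largest batch seen before each record
  cumsum(lo > prev_hi)                         # TRUE marks the start of a new period
}

x <- c("1","1","1/2","2","3","4","5/4","5")
data.frame(x, period = assign_periods_vec(x))  # period: 1 1 1 1 2 3 3 3
```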