R - Find unique set of strings in vector where vector elements can be multiple strings


I have a series of batch records that are labeled sequentially. Batches can overlap.

    x <- c("1","1","1/2","2","3","4","5/4","5")
    > data.frame(x)
        x
    1   1
    2   1
    3 1/2
    4   2
    5   3
    6   4
    7 5/4
    8   5

I want to find the sets of batches that do not overlap and label them as periods. Batch "1/2" includes both "1" and "2", so it is not unique. When batch = "3", it is not contained in any previous batch, so it starts a new period. I'm having difficulty dealing with the combined batches; otherwise this would be straightforward. The result should be:

        x period
    1   1      1
    2   1      1
    3 1/2      1
    4   2      1
    5   3      2
    6   4      3
    7 5/4      3
    8   5      3
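To make the overlap rule concrete, here is a minimal sketch of how a combined label relates to the single batches it covers. It assumes combined labels are always joined with "/"; the name `ids` is illustrative only.

```r
x <- c("1", "1", "1/2", "2", "3", "4", "5/4", "5")

# Split every label on "/" so each row becomes the vector of batch ids it covers.
# (`ids` is a hypothetical helper name, not part of the original code.)
ids <- strsplit(x, "/", fixed = TRUE)

# Row 3 ("1/2") covers both batch "1" and batch "2".
ids[[3]]

# Batch "2" is covered by row 3, so row 4 belongs to the same period.
two_seen <- "2" %in% ids[[3]]

# Batch "3" is not covered by any of rows 1 to 4, so row 5 starts a new period.
three_seen <- "3" %in% unlist(ids[1:4])
```

Here `two_seen` is `TRUE` and `three_seen` is `FALSE`, which is exactly the distinction between continuing a period and starting a new one.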

My experience is in more functional programming paradigms, and I know the way I did this is very un-R. I'm looking for a way to do this in R that is clean and simple. Any help is appreciated.

Here's my un-R code that works, but it is super clunky and not extensible.

    x <- c("1","1","1/2","2","3","4","5/4","5")

    p <- 1            # period number
    temp <- NULL      # temp variable for storing cases of x (batches)
    temp[1] <- x[1]
    period <- NULL
    rl <- 0           # length to repeat the period

    for (i in seq_along(x)) {

        # check for "/", split and add to temp
        if (grepl("/", x[i])) {
            z <- strsplit(x[i], "/")   # split the character string
            z <- unlist(z)             # convert to a vector
            temp <- c(temp, z, x[i])   # add to temp vector for comparison
        }

        # check if x is in temp
        if (x[i] %in% temp) {
            temp <- append(temp, x[i])            # add to search vector
            rl <- rl + 1                          # increase length
        } else {
            period <- append(period, rep(p, rl))  # add to period vector
            p <- p + 1                            # increase period count
            temp <- NULL                          # reset
            rl <- 1                               # reset
        }
    }

    # add the last batch
    rl <- length(x) - length(period)
    period <- append(period, rep(p, rl))

    df <- data.frame(x, period)

    > df
        x period
    1   1      1
    2   1      1
    3 1/2      1
    4   2      1
    5   3      2
    6   4      3
    7 5/4      3
    8   5      3

A little bit shorter:

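    x <- c("1","1","1/2","2","3","4","5/4","5")
    x <- data.frame(x = x, period = -1, stringsAsFactors = FALSE)
    period = 0
    prevbatch = -1
    for (i in 1:nrow(x)) {
        # convert to numeric so that e.g. "10" compares greater than "9"
        spl = as.numeric(unlist(strsplit(x$x[i], "/")))
        currentbatch = min(spl)
        if (currentbatch < prevbatch) { stop("error in sequence") }
        # a batch beyond the previous row's highest batch starts a new period
        if (currentbatch > prevbatch)
            period = period + 1
        x$period[i] = period
        prevbatch = max(spl)
    }
    x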
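The same comparison can also be written without an explicit loop. The following is only a sketch under the same assumptions as the loop above: labels are numeric, joined with "/", and sorted so that each row's lowest batch is never below the previous row's highest batch. The names `parts`, `lo`, `hi` and `starts_new` are illustrative, and the "error in sequence" check is left out.

```r
x <- c("1", "1", "1/2", "2", "3", "4", "5/4", "5")

# Split each label on "/" and convert the pieces to numbers.
parts <- lapply(strsplit(x, "/", fixed = TRUE), as.numeric)

# Lowest and highest batch number covered by each row.
lo <- vapply(parts, min, numeric(1))
hi <- vapply(parts, max, numeric(1))

# A row starts a new period when its lowest batch number exceeds the
# highest batch number of the previous row; the first row always starts one.
starts_new <- c(TRUE, lo[-1] > hi[-length(hi)])

# The running count of period starts is the period label.
period <- cumsum(starts_new)

df <- data.frame(x, period)
df
```

For the example data this gives periods 1, 1, 1, 1, 2, 3, 3, 3, matching the desired result in the question. Like the loop, it only compares each row with the row directly before it.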
