dataset - Data matching/data selection with multiple conditions in a long-shaped database in R


I have been struggling with this problem for a while. It is a rather complex data selection with multiple possible outputs, and I can't find the expression I want. I am measuring divorce rates in a colony of birds.

Reproducible database:

nest <- rep(seq(1:10), 2)
year <- c(rep(2014, 10), rep(2015, 10))
pair <- c("th4327_th4317", "2", "th8522_t75390", "4", "tj1704_tj1703", "th4335_th4333",
          "7", "8", "th4337_th4323", "t74703_th1797",
          "th4327_th4317", "12", "th8522_t75550", "14", "tj1704_na", "th4335_th4333",
          "17", "th8715_th8714", "th4388_th4323", "te9639_th9675")
test <- data.frame(nest, year, pair)
test$pair <- as.character(test$pair)
test$year <- as.character(test$year)

The underscore separates the IDs of the 2 members of a pair. When no ID is present, a running number is placed instead. The same nests are displayed each year. Across the 2 consecutive years I have the following possible scenarios (the numbers are nest IDs):

Same pair 2014-2015: 1, 6

Empty 2014-2015: 2, 4, 7

Empty 2014, occupied 2015: 8

Change of pairs in the same nest: 10

Change of 1 member of the pair: 3, 9

Unknown: 5

The results I am after are:

Pairs that stayed together ("same pair 2014-2015"): 2
Pairs in which 1 member changed ("change of 1 member of the pair"): 2

I figured out how to calculate the pairs that stay together:

same <- test$pair[test$year == "2014"] %in% test$pair[test$year == "2015"]
table(same)

However, I cannot obtain the information on the pairs that divorce.

I tried several commands, such as which and ifelse, but have not been successful.

I am happy to give further explanation if this is not clear. I know it is quite a messy problem.

Thanks a lot, best.

have fun

Here is an approach using merge. The strategy is as follows. First, split the pairs into p1 and p2 (I did this with tidyr::separate). Then subset the data across years and merge using p1 as the unique identifier. This means there are 2 different p2 columns, one for 2014 and one for 2015. It is then straightforward to test whether groups stay together or divorce.

If you have many years, this approach will need to be generalized. I will gladly provide such a generalization if need be.

library(tidyr)
library(dplyr) # provides filter() and select()

test <- test %>%
  filter(nchar(pair) > 3) %>% # getting rid of missing pairs
  separate(pair, c("p1", "p2"), "_") %>% # splitting pairs into p1 and p2
  select(-nest) # getting rid of nest, which is superfluous here

test <- merge(test[test$year == "2014", ], test[test$year == "2015", ], by = "p1", all = TRUE)

# same group across 2014 and 2015
na.omit(test[test$p2.x == test$p2.y, grep("p", names(test))])

# different group across 2014 and 2015
na.omit(test[test$p2.x != test$p2.y, grep("p", names(test))])
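To make the merge step concrete, here is a minimal base R sketch on a toy data set. The bird IDs (`a1`, `b9`, ...) and the object names `y2014`, `y2015` and `m` are made up for illustration and are not from the question; the sketch assumes the pairs have already been split into `p1` and `p2`. Merging on `p1` with `all = TRUE` gives one row per `p1`, with the 2014 partner in `p2.x` and the 2015 partner in `p2.y`.

```r
# Toy data, already split into p1/p2 (IDs are hypothetical)
y2014 <- data.frame(p1 = c("a1", "b1", "c1"),
                    p2 = c("a2", "b2", "c2"),
                    stringsAsFactors = FALSE)
y2015 <- data.frame(p1 = c("a1", "b1", "d1"),
                    p2 = c("a2", "b9", "d2"),
                    stringsAsFactors = FALSE)

# Full outer join on p1: p2.x is the 2014 partner, p2.y the 2015 partner
m <- merge(y2014, y2015, by = "p1", all = TRUE)

# which() drops comparisons that are NA (bird recorded in only one year)
same      <- m[which(m$p2.x == m$p2.y), ]  # a1 kept partner a2
different <- m[which(m$p2.x != m$p2.y), ]  # b1 went from b2 to b9
```

Using `which()` here plays the same role as `na.omit()` in the answer's code: rows for birds seen in only one year compare as `NA` and are left out of both results.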

Update

To generalize the code to many years, use the following code. It is a better approach than looping. Note that the original version of the code above did not work because I forgot to include the dplyr library. Be sure to download and load both dplyr and tidyr. These libraries are great for data manipulation. See the tidyr and dplyr documentation for more on each. Let me know if you have any more problems.

library(tidyr)
library(dplyr)

test <- test %>%
  filter(nchar(pair) > 3) %>% # getting rid of missing pairs
  separate(pair, c("p1", "p2"), "_") %>% # splitting pairs
  select(-nest) # getting rid of nest, which is superfluous here

test <- split(test, test$year) # split data into a list by year
# This line can be omitted. It ensures the final data set looks nice.
test <- Map(function(d, n) {names(d)[grepl("p2", names(d))] <- paste("p2", n, sep = "_"); d}, d = test, n = names(test))
test <- Reduce(function(...) merge(..., by = "p1", all = TRUE), test)
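As an illustration of how the split/Reduce pattern extends beyond two years, here is a small base R sketch with three years. The data frame `long`, its made-up IDs and the year 2016 are assumptions for the example only. It keeps just `p1` and a year-specific partner column in each piece, so the merged table has one partner column per year.

```r
# Hypothetical long-format data covering three years (IDs are made up)
long <- data.frame(
  year = c("2014", "2014", "2015", "2015", "2016", "2016"),
  p1   = c("a1", "b1", "a1", "b1", "a1", "b1"),
  p2   = c("a2", "b2", "a2", "b9", "a2", "b9"),
  stringsAsFactors = FALSE
)

# One data frame per year, keeping p1 and a year-specific partner column
by_year <- lapply(split(long, long$year), function(d) {
  out <- d[, c("p1", "p2")]
  names(out)[2] <- paste("p2", d$year[1], sep = "_")
  out
})

# Fold the list into one wide table: one row per p1, one column per year
wide <- Reduce(function(x, y) merge(x, y, by = "p1", all = TRUE), by_year)

# Flag a partner change between each pair of consecutive years
partners <- as.matrix(wide[, -1])
changed  <- partners[, -1, drop = FALSE] != partners[, -ncol(partners), drop = FALSE]
rownames(changed) <- wide$p1
```

Each column of `changed` compares one year with the year before it, so a `TRUE` marks the year in which that `p1` bird is first seen with a different partner. A bird missing in some year would show up as `NA` rather than `TRUE` or `FALSE`.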

Without packages (i.e. in base R)

If you don't want to use the dplyr and tidyr packages, you can replace the first several lines of code (up until split is called) with this base R approach:

test <- test[nchar(test$pair) > 3, !names(test) %in% "nest"]

split_pair <- do.call(rbind, strsplit(test$pair, "_"))

test$p1 <- split_pair[, 1]
test$p2 <- split_pair[, 2]
test <- test[, !names(test) %in% "pair"]
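A quick way to see what the splitting step produces is to run it on two made-up pair strings. This is only a sketch, and it assumes every string contains exactly one underscore; with a different number of pieces per string, `rbind` would recycle values and the columns would no longer line up.

```r
pairs <- c("a1_a2", "b1_b2")  # made-up IDs, one underscore each

# strsplit() returns a list of character vectors; rbind stacks them into a matrix
parts <- do.call(rbind, strsplit(pairs, "_", fixed = TRUE))

p1 <- parts[, 1]  # first member of each pair
p2 <- parts[, 2]  # second member of each pair
```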

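Final update... hopefully

have fun brings up a great point in the comment below. Since I use p1 as the unique identifier, it is not possible to identify when a p2 bird changes partners (for example nest 9 in the question, where p1 differs between years but p2 stays the same). To overcome this, do the following: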

test <- split(test, test$year) # split data into a list by year

# merge on both p1 and p2 to overcome the previous problem; the pair is now the unique identifier
test <- Reduce(function(...) merge(..., by = c("p1", "p2"), all = TRUE), test)

# stayed in the same relationship
stay <- test$year.x == "2014" & test$year.y == "2015"
na.omit(test[stay, ])

# p1 changes couples between year.x and year.y
tp1 <- test[test$p1 %in% test[duplicated(test$p1), "p1"], c("p1", "p2", "year.x", "year.y")]
is_na <- (is.na(tp1$year.x) & is.na(tp1$year.y))
stay_tp1 <- tp1$year.x == "2014" & tp1$year.y == "2015"
stay_tp1[is.na(stay_tp1)] <- FALSE
tp1 <- tp1[!(stay_tp1 | is_na), ]

# A similar approach works for p2. Note that this is best done in a function.
# If you use a function, remember that you need to pass the variables as strings,
# unless you want to use NSE (non-standard evaluation).

The final bit of code might be a bit confusing, so let me explain. To identify whether a bird changes partners, I identify duplicates, since a bird that moves from one pair to another will appear twice. In the case of many years, however, a bird can change pairs in any one of several years. To identify the correct year in which the bird changes, you need to use the above code. I suggest you construct a function to deal with this case, since there is a fair bit of typing involved.
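One possible shape for such a function is sketched below. It is not the answer's merge-based code: the name `partner_changes`, its arguments and the toy data are all hypothetical, and it swaps in a simpler technique that works on the long data (one row per pair per year, with columns `year`, `p1`, `p2`), comparing each bird's partner with the partner in its previous record. It assumes rows with unknown or missing IDs have already been removed and that years sort correctly as text. Column names are passed as strings, as suggested above.

```r
# Hypothetical helper: for each bird in `id_col`, return the records where its
# partner in `partner_col` differs from the partner in its previous record.
partner_changes <- function(d, id_col, partner_col, year_col = "year") {
  d <- d[order(d[[id_col]], d[[year_col]]), ]
  n <- nrow(d)
  if (n < 2) return(d[0, c(id_col, year_col, partner_col)])
  same_bird   <- d[[id_col]][-1] == d[[id_col]][-n]            # same bird as previous row
  new_partner <- d[[partner_col]][-1] != d[[partner_col]][-n]  # partner differs from previous row
  out <- d[c(FALSE, same_bird & new_partner), c(id_col, year_col, partner_col)]
  rownames(out) <- NULL
  out
}

# Toy data with made-up IDs: b1 changes partner in 2015, b9 changes partner in 2016
toy <- data.frame(
  year = c("2014", "2014", "2015", "2015", "2016", "2016"),
  p1   = c("a1", "b1", "a1", "b1", "a1", "c1"),
  p2   = c("a2", "b2", "a2", "b9", "a2", "b9"),
  stringsAsFactors = FALSE
)

res_p1 <- partner_changes(toy, "p1", "p2")  # changes seen from the p1 side
res_p2 <- partner_changes(toy, "p2", "p1")  # changes seen from the p2 side
```

Calling the function once per column covers both members of a pair. Note that "previous record" means the bird's previous appearance in the data, which is not necessarily the year immediately before.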

