R - Is there a simple way to subset unique values and their associated data?
I have a large data set that contains ~300,000 rows and 60 columns. If I want to subset the unique characteristics within one variable, I use the unique() function to create a data.frame listing the unique values in that variable. I then match that list against the master data frame to pull in the associated data from the master file.
This process is a little cumbersome, and I am wondering if there is a faster way to do the same thing. For example, is there a function I can use to select the unique fields along with the associated data connected to those values?
For example: I want to make a new data frame that contains each unique surveyid_block ID and its associated island code and abundance.
structure(list(surveyid_block = c("62003713_2", "62003087_2", "62003713_2", "62003713_2", "62003713_1", "62003713_2", "62003713_1", "62003713_2", "62003713_2", "62003087_1", "62003713_1", "62003713_1", "62003713_2", "62003713_2", "62003713_1", "62003087_1", "62003087_2", "62003713_2", "62003713_2", "62003713_2", "62003087_2", "62003713_2", "62003713_1", "62003713_1", "62003713_1", "62003713_1", "62003713_2", "62003713_1", "62003713_2", "62003087_1", "62003713_2", "62003087_1", "62003713_1", "62003087_2", "62003087_2", "62003713_2", "62003713_1", "62003087_1", "62003713_1", "62003713_1", "62003713_1", "62003087_2", "62003087_2", "62003713_2", "62003713_2", "62003713_2", "62003713_1", "62003087_1", "62003713_2", "62003087_2", "62003713_1", "62003713_1", "62003713_2", "62003713_1", "62003713_2", "62003087_2", "62003087_2", "62003087_1", "62003087_1", "62003713_1", "62003087_1", "62003087_1", "62003087_2", "62003087_2", "62003713_2", "62003713_1", "62003713_2", "62003713_2", "62003713_2", "62003713_1", "62003713_2", "62003087_1", "62003713_1", "62003713_1", "62003087_1", "62003087_1", "62003713_1", "62003087_2", "62003087_1", "62003087_2", "62003087_2", "62003087_1", "62003087_1", "62003087_1", "62003713_2", "62003087_2", "62003713_2", "62003087_2", "62003713_1", "62003713_1", "62003087_2", "62003087_1", "62003087_1", "62003087_1", "62003713_2", "62003713_2", "62003087_1", "62003713_1", "62003087_1", "62003087_2"), islandcode = c(1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 
1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l, 1391l ), totalabun = c(667l, 174l, 667l, 667l, 715l, 667l, 715l, 667l, 667l, 1365l, 715l, 715l, 667l, 667l, 715l, 1365l, 174l, 667l, 667l, 667l, 174l, 667l, 715l, 715l, 715l, 715l, 667l, 715l, 667l, 1365l, 667l, 1365l, 715l, 174l, 174l, 667l, 715l, 1365l, 715l, 715l, 715l, 174l, 174l, 667l, 667l, 667l, 715l, 1365l, 667l, 174l, 715l, 715l, 667l, 715l, 667l, 174l, 174l, 1365l, 1365l, 715l, 1365l, 1365l, 174l, 174l, 667l, 715l, 667l, 667l, 667l, 715l, 667l, 1365l, 715l, 715l, 1365l, 1365l, 715l, 174l, 1365l, 174l, 174l, 1365l, 1365l, 1365l, 667l, 174l, 667l, 174l, 715l, 715l, 174l, 1365l, 1365l, 1365l, 667l, 667l, 1365l, 715l, 1365l, 174l)), .names = c("surveyid_block", "islandcode", "totalabun" ), row.names = c(na, 100l), class = "data.frame")
We can split the dataset by 'surveyid_block' to create a list of data.frames. It is better to keep the datasets in a list rather than creating individual data.frame objects in the global environment.
lst <- split(df1, df1$surveyid_block)
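To make the result of split concrete, here is a minimal sketch on a small made-up data frame. The names toy, lst_toy, one_per_id and first_seen are hypothetical, and the four rows only stand in for df1. Each list element holds the rows for one surveyid_block, so taking the first row of each piece yields one row per unique ID. As a comparison, base R's duplicated() (a different technique from split) selects the first row seen for each ID without building a list at all.

```r
# Hypothetical toy data standing in for df1 (not the asker's full data set)
toy <- data.frame(
  surveyid_block = c("62003713_2", "62003087_2", "62003713_2", "62003713_1"),
  islandcode = c(1391L, 1391L, 1391L, 1391L),
  totalabun = c(667L, 174L, 667L, 715L),
  stringsAsFactors = FALSE
)

# One data.frame per unique surveyid_block
lst_toy <- split(toy, toy$surveyid_block)
names(lst_toy)

# Keep the first row of each piece and stack the pieces back together
one_per_id <- do.call(rbind, lapply(lst_toy, head, n = 1))

# Alternative without split: keep the first occurrence of each ID
first_seen <- toy[!duplicated(toy$surveyid_block), ]
```

Both one_per_id and first_seen contain one row per surveyid_block. They can differ in row order, and they agree on the other columns only when those columns are constant within each ID, as they are in the example data.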
But if we do need to create individual datasets, it can be done with list2env:
list2env(setNames(lst, paste0('dfn', seq_along(lst))), envir=.GlobalEnv)
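A small sketch of the same idea that writes into a fresh environment instead of the global one, so it can be tried without cluttering the workspace. The names toy, lst_toy and env, and the ID values "a_1" and "a_2", are hypothetical.

```r
# Hypothetical toy data with two distinct IDs
toy <- data.frame(
  surveyid_block = c("a_1", "a_2", "a_1"),
  totalabun = c(715L, 667L, 715L),
  stringsAsFactors = FALSE
)
lst_toy <- split(toy, toy$surveyid_block)

# Rename the list elements dfn1, dfn2, ... and copy them into an environment
env <- new.env()
list2env(setNames(lst_toy, paste0("dfn", seq_along(lst_toy))), envir = env)

ls(env)         # names of the objects that were created
nrow(env$dfn1)  # number of rows belonging to the first ID
```

Passing envir = .GlobalEnv instead of env would create the same objects directly in the workspace, which is what the call above does.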