R: split data to frequency
This question already has an answer here:
- separating column in R [duplicate] (2 answers)
I have a data set like this, where a variable ("item") contains comma-separated codes:
    id  item
    1   102, 103,401,
    2   108,102,301
    3   103, 108 , 405, 505, 708
For each id, I would like the frequencies of each separate item, like this:
    id  102  103  104  108  301  401  ...
    1     1    1                   1
    2     1              1    1
    3          1         1
How can I do that?
We can use mtabulate from qdapTools:
    library(qdapTools)
    cbind(dat['id'], mtabulate(strsplit(dat$item, '\\s*,\\s*')))
    #  id 102 103 108 301 401 405 505 708
    #1  1   1   1   0   0   1   0   0   0
    #2  2   1   0   1   1   0   0   0   0
    #3  3   0   1   1   0   0   1   1   1
Note: the data is taken from @thelatemail's post.
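For readers without that post at hand, here is a minimal sketch of how `dat` might be built from the example in the question, followed by a base R cross-check using `table()`. The `data.frame` definition is an assumption (the original definition is not reproduced here), and `tab` and `freq` are illustrative names only.

```r
# Assumed reconstruction of the example data from the question
dat <- data.frame(
  id = 1:3,
  item = c("102, 103,401,", "108,102,301", "103, 108 , 405, 505, 708"),
  stringsAsFactors = FALSE
)

# Split on commas, tolerating whitespace on either side of each comma
lst <- strsplit(dat$item, "\\s*,\\s*")

# Base R alternative: cross-tabulate each id against its split items
tab <- table(id = rep(dat$id, lengths(lst)), item = unlist(lst))

# Turn the contingency table into a data frame with an explicit id column
freq <- cbind(dat["id"], as.data.frame.matrix(tab))
freq
```

This avoids the package dependency, but it builds a dense table, so it is probably only suitable when the number of distinct codes is modest.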
Or another option (if we need a sparseMatrix):
    library(Matrix)
    #split the 'item' column into a `list`
    lst <- strsplit(dat$item, '\\s*,\\s*')
    #get the `unique` elements after `unlist`ing
    un1 <- sort(unique(unlist(lst)))
    #create the `sparseMatrix` by specifying the row/column
    #index along with the dim names (if needed)
    sm <- sparseMatrix(rep(dat$id, lengths(lst)),
                       match(unlist(lst), un1), x = 1,
                       dimnames = list(dat$id, un1))
    sm
    # 3 x 8 sparse Matrix of class "dgCMatrix"
    #  102 103 108 301 401 405 505 708
    #1   1   1   .   .   1   .   .   .
    #2   1   .   1   1   .   .   .   .
    #3   .   1   1   .   .   1   1   1
It can be converted to a regular matrix by wrapping it with as.matrix:
    as.matrix(sm)
    #  102 103 108 301 401 405 505 708
    #1   1   1   0   0   1   0   0   0
    #2   1   0   1   1   0   0   0   0
    #3   0   1   1   0   0   1   1   1