How can I split text in R into a frequency matrix?



I start with imported data:

dati <- read.csv(file='c:...csv', header=TRUE, sep=";")

I've chosen 2 variables:

id <- dati$post_visid_low
item <- dati$event_list

Then:

library(data.table)
id <- as.character(id)
item <- as.character(item)

datat <- data.table(id, item)

The structure of datat is:

id   item
1    102, 104, 108,401
2    405, 103, 650, 555, 450
3    305, 109

I want to obtain a frequency matrix with ordered columns:

id  102  103  104  108  109  305  401  405  450  555  650
1     1         1    1              1
2          1                             1    1    1    1
3                         1    1

How can I do this? I tried:

library(Matrix)
id <- as.character(id)
item <- as.character(item)
datat <- data.table(id, item)
lst <- strsplit(datat$item, '\\s*,\\s*')
un1 <- sort(unique(unlist(lst)))
sm <- sparseMatrix(rep(datat$id, length(lst)),
                   match(unlist(lst), un1), x = 1,
                   dimnames = list(datat$id, un1))

But I receive an error:

Error in i + (!(m.i || i1)) : non-numeric argument to binary operator

How can I fix that?

We can use the package splitstackshape for splitting, and a combination of melting and dcasting to get our data into the format specified (note that it's not practical to have numerical column names).

library(splitstackshape)
library(data.table)  # melt() and dcast() methods for data.tables

# split the data
step1 <- cSplit(dat, splitCols="item")
step1
#    id item_1 item_2 item_3 item_4 item_5
# 1:  1    102    104    108    401     NA
# 2:  2    405    103    650    555    450
# 3:  3    305    109     NA     NA     NA

# reshape and remove missings
step2 <- melt(step1, id.vars="id")[!is.na(value),]

# turn to wide
output <- dcast(step2, id~value, fun.aggregate = length)

# or in 1 line
output <- dcast(melt(cSplit(dat, splitCols="item"), id.vars="id")[!is.na(value),],
                id~value, fun.aggregate = length)

output
#    id 102 103 104 108 109 305 401 405 450 555 650
# 1:  1   1   0   1   1   0   0   1   0   0   0   0
# 2:  2   0   1   0   0   0   0   0   1   1   1   1
# 3:  3   0   0   0   0   1   1   0   0   0   0   0

Alternatively, we can use cSplit_e from the same package:

cSplit_e(dat, "item", ",", type = "character", fill = 0, drop = TRUE)
#   id item_102 item_103 item_104 item_108 item_109 item_305 item_401 item_405 item_450 item_555 item_650
# 1  1        1        0        1        1        0        0        1        0        0        0        0
# 2  2        0        1        0        0        0        0        0        1        1        1        1
# 3  3        0        0        0        0        1        1        0        0        0        0        0
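If installing an extra package is not an option, roughly the same table can be sketched in base R with strsplit() and table(). This is a swapped-in technique, not part of the splitstackshape approach above; the names lst, freq and wide are only illustrative, and the example data is repeated so the snippet runs on its own.

```r
# Example data (same values as in the question)
dat <- data.frame(id = 1:3,
                  item = c("102, 104, 108,401",
                           "405, 103, 650, 555, 450",
                           "305, 109"),
                  stringsAsFactors = FALSE)

# Split each string on commas, ignoring whitespace around them
lst <- strsplit(dat$item, "\\s*,\\s*")

# Repeat each id once per item in its row, then cross-tabulate id against item
freq <- table(id = rep(dat$id, lengths(lst)), item = unlist(lst))

# Optional: convert the contingency table to a wide data.frame
wide <- as.data.frame.matrix(freq)
```

table() sorts the item labels as character strings, which gives the expected column order here because every code has three digits.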

Data used:

dat <- data.frame(id = 1:3,
                  item = c("102, 104, 108,401",
                           "405, 103, 650, 555, 450",
                           "305, 109"))
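As for the error in the question: it most likely comes from passing the character vector datat$id as row indices to sparseMatrix(), which expects numeric indices. In addition, rep(datat$id, length(lst)) repeats the whole id vector rather than each id once per item. A minimal sketch of a corrected call, assuming the same example data and using row positions as indices, could look like this:

```r
library(Matrix)

# Example data (same values as in the question)
dat <- data.frame(id = 1:3,
                  item = c("102, 104, 108,401",
                           "405, 103, 650, 555, 450",
                           "305, 109"),
                  stringsAsFactors = FALSE)

# Split each string on commas, ignoring whitespace around them
lst <- strsplit(dat$item, "\\s*,\\s*")
un1 <- sort(unique(unlist(lst)))

# Row indices must be numeric; each row position is repeated once per
# item in that row (lengths(lst)), so i and j have the same length
sm <- sparseMatrix(i = rep(seq_along(lst), lengths(lst)),
                   j = match(unlist(lst), un1),
                   x = 1,
                   dimnames = list(as.character(dat$id), un1))
```

The ids are kept only as row names, so they can stay as character strings. as.matrix(sm) gives a dense matrix if one is needed.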
