R - Remove duplicates from a dataset based on criteria


I have a dataset of scores:

    id      sub     score
    1       mat     45
    2       mat     34
    3       mat     67
    1       mat     43
    2       mat     34
    4       mat     22
    5       sci     78
    6       mat     32
    1       mat     56
    1       sci     40

I want to output the top score for each id in each subject. For example, the new list should show:

    id      sub     score
    2       mat     34
    3       mat     67
    4       mat     22
    5       sci     78
    6       mat     32
    1       mat     56
    1       sci     40

I can find the duplicated results with:

    results[duplicated(results[, c(1,2)]),]

How do I order the results and delete the lowest-scoring ones?

There are many ways to get the expected output. One option is dplyr: group by the 'id' and 'sub' columns, take the top-scoring observation with top_n, and, if there are duplicate rows, use distinct.

    library(dplyr)
    df1 %>%
      group_by(id, sub) %>%
      top_n(1) %>%
      distinct()
    #     id   sub score
    #   (int) (chr) (int)
    #1     2   mat    34
    #2     3   mat    67
    #3     4   mat    22
    #4     5   sci    78
    #5     6   mat    32
    #6     1   mat    56
    #7     1   sci    40

Or with data.table: convert the 'data.frame' to a 'data.table' (setDT(df1)), order 'score' in descending order, and, grouped by 'id' and 'sub', subset the first row of each group combination (.SD[1L] or head(.SD, 1) can be used).

    library(data.table)
    setDT(df1)[order(-score), .SD[1L], .(id, sub)]

Another option is unique after ordering the columns, which selects the first observation for each duplicate.

    unique(setDT(df1)[order(id, sub, -score)], by = c('id', 'sub'))

Or with base R: order the columns, then use duplicated to remove the rows that are duplicates in the first two columns.

    df2 <- df1[with(df1, order(id, sub, -score)),]
    df2[!duplicated(df2[1:2]),]
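
The answer does not show how df1 was created, so the base R approach may be easier to follow with the question's data spelled out. The following is only a minimal sketch: it assumes df1 is a plain data.frame built from the ten rows in the question, and the name top is introduced here purely for illustration.

```r
# Rebuild the example data from the question (assumed construction;
# the name df1 follows the answer).
df1 <- data.frame(
  id    = c(1, 2, 3, 1, 2, 4, 5, 6, 1, 1),
  sub   = c("mat", "mat", "mat", "mat", "mat", "mat", "sci", "mat", "mat", "sci"),
  score = c(45, 34, 67, 43, 34, 22, 78, 32, 56, 40),
  stringsAsFactors = FALSE
)

# Sort so the highest score comes first within each id/sub pair.
df2 <- df1[with(df1, order(id, sub, -score)), ]

# Keep only the first (highest-scoring) row for each id/sub pair.
top <- df2[!duplicated(df2[1:2]), ]
rownames(top) <- NULL  # hypothetical tidy-up step, not part of the answer
print(top)
```

With this approach the seven remaining rows come out sorted by id and sub rather than in the order listed in the question, which should not matter if only the set of top scores is needed.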
