r - Simple and efficient way to select non-NA data range in data frames
Suppose I have the following data frame:
dat <- data.frame(a = c(1:3, NA), b = c(letters[1:3], NA), c = NA)
> dat
   a    b  c
1  1    a NA
2  2    b NA
3  3    c NA
4 NA <NA> NA
How can I select the non-NA region in an efficient way?
This is what I currently use:
ensureNonNaRange <- function(dat) {
  idx_col <- !sapply(dat, function(ii) all(is.na(ii)))
  idx_row <- !sapply(1:nrow(dat), function(ii) all(is.na(unlist(dat[ii, ]))))
  dat[idx_row, idx_col]
}
> ensureNonNaRange(dat)
  a b
1 1 a
2 2 b
3 3 c
As someone today pointed me to the useful function type.convert, which I hadn't known before, I thought there might also exist a neat "off-the-shelf" solution for this task in base R.
UPDATE
Some comparisons based on the answers/comments I got:
ensureNonNaRange2 <- function(dat) {
  dat[rowSums(!is.na(dat)) != 0, colSums(!is.na(dat)) != 0]
}
microbenchmark::microbenchmark(
  a = ensureNonNaRange(dat),
  b = ensureNonNaRange2(dat)
)
Unit: microseconds
 expr     min       lq     mean   median       uq     max neval
    a 296.178 310.1070 346.2259 329.0210 349.9875 680.035   100
    b 112.313 120.0845 134.1716 125.6555 133.7200 338.112   100
While there may not (yet) be a built-in function for this, it can be accomplished with subsetting alone.
When is.na is passed an entire data.frame, it produces a boolean mask of the same shape. So if we sum the rows and columns of !is.na(dat) (i.e. add up the TRUE values marking what is not NA), sums of 0 identify the rows and columns that contain only NAs. Thus, if we subset where our row and column sums are != 0, we are left with only the rows and columns that contain non-NA values:
> dat[rowSums(!is.na(dat)) != 0, colSums(!is.na(dat)) != 0]
  a b
1 1 a
2 2 b
3 3 c
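To see why this works, it can help to inspect the intermediate row and column sums for the example data (a quick sketch; dat is redefined here so the snippet is self-contained):

```r
# Rebuild the example data frame from the question
dat <- data.frame(a = c(1:3, NA), b = c(letters[1:3], NA), c = NA)

# Count the non-NA entries per row and per column
rowSums(!is.na(dat))  # 2 2 2 0  -> row 4 is all NA
colSums(!is.na(dat))  # a: 3, b: 3, c: 0  -> column c is all NA
```

Only row 4 and column c have a sum of 0, which is exactly what the `!= 0` subsetting drops.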
If not all values in a row or column are NA, this approach leaves that row/column in place:
> dat[2,2] <- NA
> dat[rowSums(!is.na(dat)) != 0, colSums(!is.na(dat)) != 0]
  a    b
1 1    a
2 2 <NA>
3 3    c
(If you'd rather ditch rows/columns containing any NAs, adjust the exclamation points accordingly, or use complete.cases.)
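As a sketch of that stricter variant (not from the original answer): first drop the all-NA columns, then use complete.cases to keep only the rows without any remaining NA:

```r
dat <- data.frame(a = c(1:3, NA), b = c(letters[1:3], NA), c = NA)

# Drop columns that contain only NAs
dat2 <- dat[, colSums(!is.na(dat)) != 0]

# Keep only rows with no NA at all in the remaining columns
dat2[complete.cases(dat2), ]
#   a b
# 1 1 a
# 2 2 b
# 3 3 c
```

Note that calling complete.cases directly on the original dat would return FALSE for every row, since column c puts an NA in each of them; that is why the all-NA columns are removed first.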
Further, this should be pretty fast: rowSums and colSums are highly optimized, so it should still work well on huge data structures.