python - Deleting rows from numpy array not working
I am trying to split a numpy array of data points into test and training sets. To do that, I'm randomly selecting rows from the array to use as the training set, and the remaining rows as the test set.
This is my code:
    import numpy

    matrix = numpy.loadtxt("matrix_vals.data", delimiter=',', dtype=float)
    matrix_rows, matrix_cols = matrix.shape

    # training set
    randvals = numpy.random.randint(matrix_rows, size=50)
    train = matrix[randvals,:]
    test = numpy.delete(matrix, randvals, 0)

    print(matrix.shape)
    print(train.shape)
    print(test.shape)
But the output is:
    matrix.shape: (130, 14)
    train.shape: (50, 14)
    test.shape: (89, 14)
This is wrong, since the number of rows in train and test should add up to the total number of rows in matrix, but here it's more (50 + 89 = 139, not 130). Can anyone help me figure out what's going wrong?
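Because you are generating random integers with replacement, randvals will almost certainly contain repeated indices.
Indexing with repeated indices returns the same row multiple times, so matrix[randvals, :] is guaranteed to give you an output with exactly 50 rows, regardless of whether some of them are repeated.
In contrast, np.delete(matrix, randvals, 0) only removes unique row indices, so it reduces the number of rows by the number of unique values in randvals, not by 50. In your output, 130 - 89 = 41, so randvals contained 41 unique indices.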
Try comparing:
    print(np.unique(randvals).shape[0] == matrix_rows - test.shape[0])  # True
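The same effect can be reproduced on a tiny array. This is only an illustrative sketch: the array demo and the index list idx are made up here and are not part of the original question.

```python
import numpy as np

# Hypothetical 5x2 array, used only for illustration.
demo = np.arange(10).reshape(5, 2)

# Index 1 appears twice, as can happen with np.random.randint.
idx = np.array([1, 1, 3])

picked = demo[idx]              # fancy indexing keeps the duplicated row
rest = np.delete(demo, idx, 0)  # delete removes each listed row only once

print(picked.shape)                     # (3, 2)
print(rest.shape)                       # (3, 2)
print(picked.shape[0] + rest.shape[0])  # 6, although demo has only 5 rows
```

Row 1 is counted twice in picked but removed only once from rest, which is why the two row counts add up to more than the original.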
To generate a vector of unique random indices between 0 and matrix_rows - 1, use np.random.choice with replace=False:
    uidx = np.random.choice(matrix_rows, size=50, replace=False)
Then matrix[uidx].shape[0] + np.delete(matrix, uidx, 0).shape[0] == matrix_rows holds.
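Putting it together, a minimal sketch of the whole split might look like the following. Since matrix_vals.data is not available here, a random 130x14 array stands in for the loaded data; that stand-in is an assumption, everything else follows the names used above.

```python
import numpy as np

# Synthetic stand-in for the 130x14 data loaded from "matrix_vals.data".
matrix = np.random.rand(130, 14)
matrix_rows = matrix.shape[0]

# 50 distinct row indices, drawn without replacement.
uidx = np.random.choice(matrix_rows, size=50, replace=False)

train = matrix[uidx]               # the 50 selected rows
test = np.delete(matrix, uidx, 0)  # the remaining rows

print(train.shape)  # (50, 14)
print(test.shape)   # (80, 14)
```

Because every index in uidx is distinct, train and test are disjoint and their row counts add up to matrix_rows.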