Is this one-hot encoding in TensorFlow fast? Or flawed for any reason?
There are a few Stack Overflow questions about computing one-hot embeddings with TensorFlow, and here is the accepted solution:
import tensorflow as tf

num_labels = 10
# label_batch is a 1-D integer tensor of class indices.
sparse_labels = tf.reshape(label_batch, [-1, 1])
derived_size = tf.shape(label_batch)[0]
# Pair each row index with its label: [[0, l0], [1, l1], ...].
# (Note: tf.concat(concat_dim, values) is the pre-1.0 argument order.)
indices = tf.reshape(tf.range(0, derived_size, 1), [-1, 1])
concated = tf.concat(1, [indices, sparse_labels])
# Build the output shape [batch_size, num_labels].
outshape = tf.reshape(tf.concat(0, [derived_size, [num_labels]]), [-1])
labels = tf.sparse_to_dense(concated, outshape, 1.0, 0.0)
This is essentially identical to the code in the official tutorial: https://www.tensorflow.org/versions/0.6.0/tutorials/mnist/tf/index.html
To me it seems that, since tf.nn.embedding_lookup exists, it is probably more efficient. Here's a version that uses it and supports arbitrarily-shaped inputs:
import numpy as np
import tensorflow as tf

def one_hot(inputs, num_classes):
    # Keep the (potentially large) identity table in host memory.
    with tf.device('/cpu:0'):
        # A num_classes x num_classes identity matrix: row i is the
        # one-hot vector for class i.
        table = tf.constant(np.identity(num_classes, dtype=np.float32))
        # Each integer in `inputs` selects the corresponding row.
        embeddings = tf.nn.embedding_lookup(table, inputs)
    return embeddings
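For example, with a small batch of integer labels (the values here are just illustrative):

label_batch = tf.constant([0, 3, 7, 2])
labels = one_hot(label_batch, num_classes=10)  # float32 tensor of shape [4, 10]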
Do you expect this implementation to be faster? And is it flawed for any other reason?
The one_hot() function in the question looks correct. However, the reason we do not recommend writing code this way is that it is very memory-inefficient. To understand why, let's say you have a batch size of 32 and 1,000,000 classes.
In the version suggested in the tutorial, the largest tensor is the result of tf.sparse_to_dense(), which is 32 x 1,000,000. In the one_hot() function in the question, the largest tensor is the result of np.identity(1000000), which is 4 terabytes. Of course, allocating that tensor will not succeed. Even if the number of classes were much smaller, it would still waste memory to store all of those zeroes explicitly; TensorFlow does not automatically convert your data to a sparse representation, even though it might be profitable to do so.
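To make the arithmetic concrete, here is a quick back-of-the-envelope sketch (assuming float32, i.e. 4 bytes per element):

# Size of the largest tensor in each approach, in bytes.
batch_size = 32
num_classes = 1000000
sparse_to_dense_output = batch_size * num_classes * 4  # 128,000,000 bytes (~128 MB)
identity_table = num_classes * num_classes * 4         # 4,000,000,000,000 bytes (~4 TB)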
Finally, I want to offer a plug for a new function that was recently added to the open-source repository and will be available in the next release: tf.nn.sparse_softmax_cross_entropy_with_logits(). It allows you to specify a vector of integers as the labels, and saves you from having to build a dense one-hot representation. It should be much more efficient than either solution for large numbers of classes.
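For example, a minimal sketch of how it replaces the dense one-hot pipeline; logits is assumed to be a [batch_size, num_classes] float tensor and label_batch a [batch_size] integer vector (the positional-argument form shown matches the 0.x-era API):

# Cross-entropy computed directly from integer labels; no dense
# one-hot representation is ever materialized.
cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(logits, label_batch)
loss = tf.reduce_mean(cross_entropy)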