python - Reading binary data in float32
I want to train a network using TensorFlow, based on features from a time signal. The data is split into e 3-second epochs, with f features for each epoch. Thus, the data has the form:
    Epoch | feature 1 | feature 2 | ... | feature f |
    -------------------------------------------------
    1     | ..        | ..        |     | ..        |
          | ..        | ..        |     | ..        |
    e     | ..        | ..        |     | ..        |
To load the data into TensorFlow, I am trying to follow the CIFAR example, using tf.FixedLengthRecordReader. Thus, I have taken the data and saved it to a binary file of type float32, with the label for the first epoch first, followed by the f features for the first epoch, then the same for the second epoch, etc.
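For reference, the record layout described above can be sketched with the standard library alone. This is only an illustration under assumptions: the helper name pack_records and the toy values are hypothetical, and the question does not show how the file was actually written.

```python
import struct

def pack_records(labels, features):
    """Pack each epoch as one little-endian float32 label followed by its float32 features."""
    chunks = []
    for label, row in zip(labels, features):
        # "<" = little-endian, "f" = 4-byte float32; one label plus len(row) features
        chunks.append(struct.pack("<%df" % (1 + len(row)), float(label), *row))
    return b"".join(chunks)

# Hypothetical toy data: 2 epochs with 3 features each
labels = [0, 1]
features = [[0.5, 1.5, 2.5], [3.5, 4.5, 5.5]]
payload = pack_records(labels, features)

# Every record has the same size, which is what a fixed-length reader relies on
record_bytes = 4 * (1 + len(features[0]))
num_records = len(payload) // record_bytes
```

Writing payload to a file opened in "wb" mode would give a file in which every record occupies record_bytes bytes, matching the record_bytes argument passed to the reader.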
Reading this into TensorFlow is a challenge for me, however. Here is my code:
    def read_data_file(file_queue):

        class DataRecord(object):
            pass

        result = DataRecord()

        # 1 float32 as label => 4 bytes
        label_bytes = 4

        # num_features as float32 => 4 * num_features
        features_bytes = 4 * num_features

        # Create the read operator with the summed amount of bytes
        reader = tf.FixedLengthRecordReader(record_bytes=label_bytes+features_bytes)

        # Perform the operation
        result.key, value = reader.read(file_queue)

        # Decode the result from bytes to float32
        value_bytes = tf.decode_raw(value, tf.float32, little_endian=True)

        # Cast label to int for later
        result.label = tf.cast(tf.slice(value_bytes, [0], [label_bytes]), tf.int32)

        # Cast features to float32
        result.features = tf.cast(tf.slice(value_bytes, [label_bytes], [features_bytes]), tf.float32)

        print ('>>>>>>>>>>>>>>>>>>>>>>>>>>>')
        print ('%s' % result.label)
        print ('%s' % result.features)
        print ('>>>>>>>>>>>>>>>>>>>>>>>>>>>')
The print output was:

    Tensor("Cast:0", shape=TensorShape([Dimension(4)]), dtype=int32)
    Tensor("Slice_1:0", shape=TensorShape([Dimension(40)]), dtype=float32)
This surprises me: since I have cast the values to float32, I expected the dimensions to be 1 and 10 respectively, which are the actual numbers of values, but they are 4 and 40, which correspond to the byte lengths.

How come?
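I think the issue stems from the fact that tf.decode_raw(value, tf.float32, little_endian=True) returns a vector of type tf.float32 rather than a vector of bytes. The slice size for extracting the features should therefore be specified as a count of floating-point values (i.e. num_features) rather than a count of bytes (features_bytes). The same applies to the label slice: it should have size 1, not label_bytes.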
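The difference between counting bytes and counting decoded values can be seen in a plain-Python analogy. This sketch uses struct.unpack as a stand-in for tf.decode_raw; it is not TensorFlow code, and the record contents are made up for illustration.

```python
import struct

num_features = 10
label_bytes = 4
features_bytes = 4 * num_features

# One hypothetical record: label 3.0 followed by the features 0.0 .. 9.0
record = struct.pack("<11f", 3.0, *[float(i) for i in range(num_features)])

# Stand-in for tf.decode_raw(value, tf.float32): 44 bytes become 11 floats
values = struct.unpack("<%df" % (len(record) // 4), record)

# After decoding, offsets and sizes are counted in elements, not bytes
label = values[0:1]                  # 1 element, not label_bytes elements
feats = values[1:1 + num_features]   # num_features elements, not features_bytes
```

Here the decoded sequence has 11 entries, so a slice of size label_bytes (4) or features_bytes (40) would no longer describe the label or the features.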
However, there's a slight wrinkle in that the label is an integer, while the rest of the vector contains floating-point values. TensorFlow doesn't have many facilities for casting between binary representations (except for tf.decode_raw()), so you'll have to decode the string twice into different types:
    # Decode the result from bytes to int32
    value_as_ints = tf.decode_raw(value, tf.int32, little_endian=True)
    result.label = value_as_ints[0]

    # Decode the result from bytes to float32
    value_as_floats = tf.decode_raw(value, tf.float32, little_endian=True)
    result.features = value_as_floats[1:1+num_features]
Note that this works because sizeof(tf.int32) == sizeof(tf.float32), which wouldn't be true in general. Some more string manipulation tools would be useful for slicing out the appropriate substrings of the raw value in the more general case. Hopefully this should be enough to get you going, though.
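As a rough illustration of the more general case, the sketch below slices the raw bytes before decoding, using only the standard library. It assumes a hypothetical layout with a 1-byte unsigned label followed by float32 features; it is a plain-Python analogy rather than a TensorFlow recipe. The last lines also show that the same four bytes give different values when read as float32 and as int32, so the decode type has to match how the label was written. If the label was stored as a float32, as the comments in the question suggest, decoding it as float32 and then casting would presumably be the safer route.

```python
import struct

num_features = 3
label_bytes = 1                    # assumed layout: label stored as one unsigned byte
features_bytes = 4 * num_features

# "<" selects little-endian with no padding, so this record is 1 + 12 = 13 bytes
record = struct.pack("<B3f", 7, 0.5, 1.5, 2.5)

# Slice the byte string first, then decode each part with its own type
label = struct.unpack("<B", record[:label_bytes])[0]
feats = struct.unpack("<%df" % num_features,
                      record[label_bytes:label_bytes + features_bytes])

# The same four bytes, interpreted as two different types
four = struct.pack("<f", 1.0)
as_float = struct.unpack("<f", four)[0]   # the value that was written
as_int = struct.unpack("<i", four)[0]     # bit pattern 0x3F800000 read as an integer
```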