python - Reading binary data in float32


I want to train a network using TensorFlow, based on features from a time signal. The data is split into e 3-second epochs, with f features for each epoch. Thus, the data has the form:

    epoch | feature 1 | feature 2 | ... | feature f |
    -------------------------------------------------
    1     | ..        | ..        |     | ..        |
          | ..        | ..        |     | ..        |
    e     | ..        | ..        |     | ..        |

To load the data into TensorFlow, I am trying to follow the CIFAR example, using tf.FixedLengthRecordReader. Thus, I have taken the data and saved it to a binary file of type float32, with the label of the first epoch first, followed by the f features of the first epoch, then the same for the second epoch, etc.
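The file layout described above could be produced with something like the following. This is only a minimal sketch using Python's standard `struct` module: the value `NUM_FEATURES = 10`, the helper name `write_records`, the file name `epochs.bin`, and the little-endian byte order are all assumptions made for illustration, not details taken from the question.

```python
import os
import struct
import tempfile

NUM_FEATURES = 10  # assumption: 10 features per epoch


def write_records(path, labels, features):
    """Write each epoch as one float32 label followed by its float32 features."""
    # "<" = little-endian, "f" = 4-byte float32
    record_format = "<%df" % (1 + NUM_FEATURES)
    with open(path, "wb") as f:
        for label, row in zip(labels, features):
            f.write(struct.pack(record_format, float(label), *row))
    # Size of one fixed-length record in bytes
    return struct.calcsize(record_format)


# Hypothetical toy data: 3 epochs with made-up feature values
labels = [0, 1, 2]
features = [[0.5 * i + j for j in range(NUM_FEATURES)] for i in range(3)]

path = os.path.join(tempfile.mkdtemp(), "epochs.bin")
record_bytes = write_records(path, labels, features)
file_size = os.path.getsize(path)
print(record_bytes, file_size)
```

With this layout every record has the same length, 4 * (1 + NUM_FEATURES) bytes, which is the value a fixed-length record reader needs as its record size.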

Reading this into TensorFlow is a challenge for me, however. Here is my code:

    def read_data_file(file_queue):

        class DataRecord(object):
            pass

        result = DataRecord()

        # 1 float32 label => 4 bytes
        label_bytes = 4

        # num_features float32 values => 4 * num_features bytes
        features_bytes = 4 * num_features

        # Create the read operator with the summed number of bytes
        reader = tf.FixedLengthRecordReader(record_bytes=label_bytes + features_bytes)

        # Perform the read operation
        result.key, value = reader.read(file_queue)

        # Decode the resulting bytes into float32
        value_bytes = tf.decode_raw(value, tf.float32, little_endian=True)

        # Cast the label to int for later use
        result.label = tf.cast(tf.slice(value_bytes, [0], [label_bytes]), tf.int32)

        # Cast the features to float32
        result.features = tf.cast(tf.slice(value_bytes, [label_bytes],
            [features_bytes]), tf.float32)

        print('>>>>>>>>>>>>>>>>>>>>>>>>>>>')
        print('%s' % result.label)
        print('%s' % result.features)
        print('>>>>>>>>>>>>>>>>>>>>>>>>>>>')

The print output was:

    Tensor("Cast:0", shape=TensorShape([Dimension(4)]), dtype=int32)
    Tensor("Slice_1:0", shape=TensorShape([Dimension(40)]), dtype=float32)

This surprises me: since I have cast the values to float32, I expected the dimensions to be 1 and 10 respectively, which are the actual numbers of values, rather than 4 and 40, which correspond to the byte lengths.

How come?

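I think the issue stems from the fact that tf.decode_raw(value, tf.float32, little_endian=True) returns a vector of type tf.float32 rather than a vector of bytes. The slice size for extracting the features should therefore be specified as a count of floating-point values (i.e. num_features) rather than a count of bytes (features_bytes). The same applies to the slice offsets: after decoding, the label is element 0 and the features start at element 1, not at byte 4.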
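The difference between byte counts and element counts can be illustrated without TensorFlow. The following is a rough analogy using the standard `struct` module, in which `struct.unpack` plays the role of tf.decode_raw; `NUM_FEATURES = 10` and the sample values are assumptions chosen to match the sizes 4 and 40 reported in the question.

```python
import struct

NUM_FEATURES = 10  # assumption, chosen to match the 40 bytes in the question

# Build one hypothetical record: a label followed by NUM_FEATURES features,
# all stored as little-endian float32 values.
sample_features = [float(i) for i in range(NUM_FEATURES)]
record = struct.pack("<%df" % (1 + NUM_FEATURES), 3.0, *sample_features)

# Decode the raw bytes into float32 values (the analogue of tf.decode_raw).
decoded = list(struct.unpack("<%df" % (len(record) // 4), record))

# The record is 44 bytes long, but the decoded vector has only 11 elements,
# so offsets and sizes must now be counted in elements, not bytes.
label = decoded[0:1]                   # 1 element, not 4
feats = decoded[1:1 + NUM_FEATURES]    # 10 elements, not 40

print(len(record), len(decoded), len(label), len(feats))
```

Once the bytes have been decoded, the byte-based constants (4 and 40) no longer describe positions in the vector, which is why slicing with them gives the unexpected shapes.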

However, there's a slight wrinkle in that the label is an integer, while the rest of the vector contains floating-point values. TensorFlow doesn't have many facilities for casting between binary representations (except for tf.decode_raw()), so you'll have to decode the string twice into different types:

    # Decode the resulting bytes as int32
    value_as_ints = tf.decode_raw(value, tf.int32, little_endian=True)
    result.label = value_as_ints[0]

    # Decode the resulting bytes as float32
    value_as_floats = tf.decode_raw(value, tf.float32, little_endian=True)
    result.features = value_as_floats[1:1+num_features]

Note that this works because sizeof(tf.int32) == sizeof(tf.float32), which wouldn't be true in general. Some more string manipulation tools would be useful for slicing out the appropriate substrings of the raw value in the more general case. Hopefully this should be enough to get you going, though.
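One caveat worth checking: decoding the label as int32 only gives the right value if the label was written to the file as a 4-byte integer. The question describes the whole file as float32, and if the label was in fact stored as a float32 value, reinterpreting its bytes as int32 yields the raw bit pattern rather than the number. In that case a numeric cast of the decoded float (for example tf.cast(value_as_floats[0], tf.int32)) would presumably be what is wanted. The sketch below shows the difference using the standard `struct` module; the label value 1 is an arbitrary example.

```python
import struct

# Case 1: the label was written as a float32 (as the question describes).
raw_float_label = struct.pack("<f", 1.0)

# Reinterpreting the same 4 bytes as int32 gives the IEEE 754 bit pattern.
bits_as_int = struct.unpack("<i", raw_float_label)[0]

# Decoding as float32 and then casting numerically recovers the label.
label_from_cast = int(struct.unpack("<f", raw_float_label)[0])

# Case 2: the label was written as an int32 (as the answer assumes).
raw_int_label = struct.pack("<i", 1)
label_from_int = struct.unpack("<i", raw_int_label)[0]

print(bits_as_int, label_from_cast, label_from_int)
# prints: 1065353216 1 1
```

Which of the two cases applies depends on how the file was actually written, so it is worth inspecting the first few bytes of the file before choosing between them.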

