Why is a `for` over a Python list faster than over a Numpy array? -
so without telling long story working on code reading in data binary file , looping on every single point using loop. completed code , running ridiculously slow. looping on around 60,000 points around 128 data channels , taking minute or more process. way slower ever expected python run. made whole thing more efficient using numpy in trying figure out why original process ran slow doing type checking , found looping on numpy arrays instead of python lists. ok no major deal make inputs our test setup same converted numpy arrays lists before looping. bang same slow code took minute run took 10 seconds. floored. think did change numpy array python list changed , slow mud again. couldn't believe went more definitive proof
$ python -m timeit -s "import numpy" "for k in numpy.arange(5000): k+1" 100 loops, best of 3: 5.46 msec per loop $ python -m timeit "for k in range(5000): k+1" 1000 loops, best of 3: 256 usec per loop
what going on? know numpy arrays , and python list different why slower iterate on every point in array?
i observed behavior in both python 2.6 , 2.7 running numpy 10.1 believe.
we can little sleuthing figure out:
>>> import numpy np >>> = np.arange(32) >>> array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]) >>> a.data <read-write buffer 0x107d01e40, size 256, offset 0 @ 0x107d199b0> >>> id(a.data) 4433424176 >>> id(a[0]) 4424950096 >>> id(a[1]) 4424950096 >>> item in a: ... print id(item) ... 4424950096 4424950120 4424950096 4424950120 4424950096 4424950120 4424950096 4424950120 4424950096 4424950120 4424950096 4424950120 4424950096 4424950120 4424950096 4424950120 4424950096 4424950120 4424950096 4424950120 4424950096 4424950120 4424950096 4424950120 4424950096 4424950120 4424950096 4424950120 4424950096 4424950120 4424950096 4424950120
so going on here? first, took @ memory location of array's memory buffer. it's @ 4433424176
. in isn't too illuminating. however, numpy stores it's data contiguous c array, first element in numpy array should correspond memory address of array itself, doesn't:
>>> id(a[0]) 4424950096
and it's thing doesn't because break invariant in python 2 objects never have same id
during lifetimes.
so, how numpy accomplish this? well, answer numpy has wrap returned object python type (e.g. numpy.float64
or numpy.int64
in case) takes time if you're iterating item-by-item1. further proof of demonstrated when iterating -- see we're alternating between 2 separate ids while iterating on array. means python's memory allocator , garbage collector working overtime create new objects , free them.
a list doesn't have memory allocator/garbage collector overhead. objects in list exist python objects (and they'll still exist after iteration), neither plays role in iteration on list.
timing methodology:
also note, timings thrown off little bit assumptions. assuming k + 1
should take same amount of time in both cases, doesn't. notice if repeat timings without doing addition:
mgilson$ python -m timeit -s "import numpy" "for k in numpy.arange(5000): k" 1000 loops, best of 3: 233 usec per loop mgilson$ python -m timeit "for k in range(5000): k" 10000 loops, best of 3: 114 usec per loop
there's factor of 2 difference. doing addition leads factor of 5 difference or so:
mgilson$ python -m timeit "for k in range(5000): k+1" 10000 loops, best of 3: 179 usec per loop mgilson$ python -m timeit -s "import numpy" "for k in numpy.arange(5000): k+1" 1000 loops, best of 3: 786 usec per loop
for fun, lets addition:
$ python -m timeit -s "v = 1" "v + 1" 10000000 loops, best of 3: 0.0261 usec per loop mgilson$ python -m timeit -s "import numpy; v = numpy.int64(1)" "v + 1" 10000000 loops, best of 3: 0.121 usec per loop
and finally, timeit includes list/array construction time isn't ideal:
mgilson$ python -m timeit -s "v = range(5000)" "for k in v: k" 10000 loops, best of 3: 80.2 usec per loop mgilson$ python -m timeit -s "import numpy; v = numpy.arange(5000)" "for k in v: k" 1000 loops, best of 3: 237 usec per loop
notice numpy got further away list solution in case. shows iteration is slower , might speedups if convert numpy types standard python types.
1note, doesn't take lot of time when slicing because has allocate o(1) new objects since numpy returns view original array.
Comments
Post a Comment