numba help about trackpy HOT 10 CLOSED

soft-matter commented on July 28, 2024

numba help

from trackpy.

Comments (10)

nkeim commented on July 28, 2024

I was able to get a ~60% improvement by making as few numpy calls as possible in the loop. These changes are at https://github.com/nkeim/trackpy/tree/numba-refine

I think a next step is to carefully control the types passed to _numba_refine. That's worked for me in the past.

Before (2.5-year-old MacBook Air)
Locate using Python Engine with Default Settings (Accurate)
1 loops, best of 3: 1.27 s per loop
Locate using Python Engine with Fast Settings (Sloppy)
1 loops, best of 3: 297 ms per loop
Locate using Numba Engine with Default Settings (Accurate)
1 loops, best of 3: 1.1 s per loop
Locate using Numba Engine with Fast Settings (Sloppy)
1 loops, best of 3: 262 ms per loop

After (commit c2d3d4c)
Locate using Python Engine with Default Settings (Accurate)
1 loops, best of 3: 1.08 s per loop
Locate using Python Engine with Fast Settings (Sloppy)
1 loops, best of 3: 296 ms per loop
Locate using Numba Engine with Default Settings (Accurate)
1 loops, best of 3: 372 ms per loop
Locate using Numba Engine with Fast Settings (Sloppy)
10 loops, best of 3: 181 ms per loop
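
For illustration, here is a minimal sketch of the "as few numpy calls as possible in the loop" idea. It is not the code from the numba-refine branch: the function name `centroid_offsets`, its arguments, and the fallback `njit` decorator are assumptions made for this example. The inner loops accumulate plain scalars instead of slicing the image and calling numpy reductions once per feature.

```python
import numpy as np

try:
    from numba import njit  # optional: the sketch also runs without numba
except ImportError:
    def njit(func):
        # Fallback so the example still runs (slowly) in plain Python.
        return func


@njit
def centroid_offsets(image, coords, radius):
    """Intensity-weighted centroid offset around each (row, col) in coords.

    Hypothetical helper: assumes integer coords and that every feature
    lies at least `radius` pixels away from the image edge.
    """
    n = coords.shape[0]
    out = np.zeros((n, 2))  # the only numpy call, made once outside the loops
    for i in range(n):
        r0 = coords[i, 0]
        c0 = coords[i, 1]
        mass = 0.0
        sum_r = 0.0
        sum_c = 0.0
        # Explicit scalar loops instead of per-feature slicing and .sum()
        for dr in range(-radius, radius + 1):
            for dc in range(-radius, radius + 1):
                px = image[r0 + dr, c0 + dc]
                mass += px
                sum_r += px * dr
                sum_c += px * dc
        if mass > 0.0:
            out[i, 0] = sum_r / mass
            out[i, 1] = sum_c / mass
    return out
```

A real implementation would also need to handle features near the image border, which this sketch leaves out.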

danielballan commented on July 28, 2024

Impressive. I'll pull your commits into an actual PR and continue this....

danielballan commented on July 28, 2024

See #49

nkeim commented on July 28, 2024

Great! I made a few more tweaks in my repo which shave ~10 ms off the precise benchmark; they can go into the next PR.

danielballan commented on July 28, 2024

I added your latest commits.

I'll merge this to make it available. Subsequent improvements will not affect the API, so there's no harm in doing this in stages. We can open a new PR when one of us improves it further.

I tried explicitly casting types. It didn't help substantially, but it didn't hurt, and it seems safer, so I'm committing it.

Locate using Numba Engine with Default Settings (Accurate)
1 loops, best of 3: 416 ms per loop
Locate using Numba Engine with Fast Settings (Sloppy)
1 loops, best of 3: 235 ms per loop

After (commit 761c70d)

Locate using Numba Engine with Default Settings (Accurate)
1 loops, best of 3: 411 ms per loop
Locate using Numba Engine with Fast Settings (Sloppy)
1 loops, best of 3: 233 ms per loop
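
As a rough sketch of what explicit casting before a compiled call can look like (this is not the code in 761c70d; `prepare_refine_inputs` and its arguments are hypothetical names): numba compiles a separate specialization for each combination of argument types it sees, so fixing the dtypes up front keeps the compiled function on a single, predictable signature.

```python
import numpy as np


def prepare_refine_inputs(image, coords, radius):
    """Coerce inputs to fixed types before handing them to compiled code.

    Hypothetical helper for illustration only.
    """
    # One floating-point type for pixel data, in C-contiguous layout.
    image = np.ascontiguousarray(image, dtype=np.float64)
    # Round to the nearest pixel before converting, rather than truncating.
    coords = np.ascontiguousarray(np.rint(coords).astype(np.int64))
    return image, coords, int(radius)
```

The casts cost a copy when the input has a different dtype, which may be part of why the timings above barely change.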

nkeim commented on July 28, 2024

OK, I've put up some more optimizations, as well as a new benchmark, numba_benchmarks.ipy, that runs on a larger image. But there's a problem: I hadn't noticed along the way that refine() (specifically the numba portion) is now a rounding error in the total locate() computation time. I scaled up the test image by a factor of 10, and refine() (accurate settings) was taking 59 ms; with the speedups in my latest commit, it's down to 47 ms, out of roughly 4.3 s for locate() as a whole. Almost all of the computation is instead spent in ndimage. The prun output is below.

So let's declare victory on refine! The next target should be estimate_mass().

nathan

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    1.188    1.188    1.188    1.188 {scipy.ndimage._nd_image.generic_filter}
        2    0.512    0.256    0.512    0.256 {numpy.fft.fftpack_lite.cfftf}
        2    0.498    0.249    0.498    0.249 {numpy.fft.fftpack_lite.cfftb}
        1    0.409    0.409    0.409    0.409 {scipy.ndimage._nd_image.min_or_max_filter}
     6533    0.270    0.000    0.444    0.000 feature.py:90(estimate_mass)
        4    0.221    0.055    0.221    0.055 {numpy.core.multiarray.where}
        1    0.218    0.218    0.218    0.218 {scipy.ndimage._nd_image.binary_erosion}
        2    0.152    0.076    0.673    0.337 fftpack.py:167(ifft)
        2    0.112    0.056    0.112    0.056 {scipy.ndimage._nd_image.uniform_filter1d}
     6542    0.096    0.000    0.096    0.000 {method 'reduce' of 'numpy.ufunc' objects}
        1    0.083    0.083    1.856    1.856 feature.py:39(local_maxima)
        1    0.075    0.075    1.583    1.583 preprocessing.py:32(bandpass)
        1    0.050    0.050    0.050    0.050 {scipy.ndimage._nd_image.fourier_filter}
        1    0.047    0.047    1.882    1.882 uncertainty.py:26(measure_noise)
        1    0.046    0.046    4.337    4.337 feature.py:434(locate)
        1    0.046    0.046    0.047    0.047 feature.py:114(refine)
        2    0.042    0.021    0.042    0.021 {method 'nonzero' of 'numpy.ndarray' objects}
        1    0.030    0.030    0.030    0.030 {method 'sort' of 'numpy.ndarray' objects}
        6    0.027    0.005    0.027    0.005 {method 'astype' of 'numpy.ndarray' objects}
        1    0.020    0.020    0.043    0.043 preprocessing.py:46(scale_to_gamut)
        2    0.020    0.010    1.205    0.602 fftpack.py:518(_raw_fftnd)
     6533    0.018    0.000    0.126    0.000 fromnumeric.py:1422(sum)
        1    0.017    0.017    0.017    0.017 {method 'clip' of 'numpy.ndarray' objects}
    13399    0.016    0.000    0.044    0.000 {isinstance}
        2    0.015    0.007    0.015    0.007 {method 'copy' of 'numpy.ndarray' objects}
     6540    0.015    0.000    0.028    0.000 abc.py:128(__instancecheck__)
        8    0.015    0.002    0.015    0.002 {numpy.core.multiarray.zeros}
        1    0.014    0.014    1.816    1.816 uncertainty.py:7(roi)
     6534    0.013    0.000    0.100    0.000 _methods.py:16(_sum)
     6540    0.013    0.000    0.048    0.000 utils.py:58(__call__)
        1    0.011    0.011    0.015    0.015 _methods.py:60(_var)
     6540    0.010    0.000    0.010    0.000 _weakrefset.py:68(__contains__)
        1    0.008    0.008    4.345    4.345 <string>:1(<module>)
       80    0.007    0.000    0.007    0.000 {numpy.core.multiarray.array}
     6570    0.003    0.000    0.003    0.000 {getattr}
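
The listing above comes from IPython's prun. Outside IPython, a similar table sorted by tottime can be produced with the standard library's cProfile and pstats modules. This is only a sketch, and `profile_call` is a hypothetical helper name:

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Run func under cProfile; return its result and a text report."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by time spent inside each function itself, and show the top 10.
    stats.sort_stats("tottime").print_stats(10)
    return result, buffer.getvalue()
```

Calling it as `result, report = profile_call(some_function, some_argument)` and printing `report` gives the same columns (ncalls, tottime, percall, cumtime) as the listing above.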

danielballan commented on July 28, 2024

Great work all around. We'll have refine-victory beers when I come to UPenn.

The exact same strategy should work on estimate_mass. I attempted this early on and ran into trouble using the result as a mask, but I think it's a trivial problem.
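
One way the same strategy might carry over, sketched under assumptions: instead of building a boolean mask array and indexing with it, the loop tests each pixel against the disk radius and accumulates a scalar. `disk_mass` and its arguments are hypothetical, this is not trackpy's estimate_mass(), and the fallback `njit` only exists so the example runs without numba.

```python
import numpy as np

try:
    from numba import njit  # optional: the sketch also runs without numba
except ImportError:
    def njit(func):
        # Fallback so the example still runs (slowly) in plain Python.
        return func


@njit
def disk_mass(image, row, col, radius):
    """Sum pixel values inside a disk centered on (row, col).

    Assumes the disk lies entirely inside the image (no bounds checks).
    """
    total = 0.0
    r2 = radius * radius
    for dr in range(-radius, radius + 1):
        for dc in range(-radius, radius + 1):
            # The distance test replaces the boolean mask lookup.
            if dr * dr + dc * dc <= r2:
                total += image[row + dr, col + dc]
    return total
```

With a uniform image of ones and `radius=1`, the disk covers the center pixel and its four direct neighbors.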

nkeim commented on July 28, 2024

Yay!! Thanks for doing the actually hard part of bifurcating refine() and adding the explicit loops. Glad I could help.

I think the overall lesson I got from this is: numba really is fast, but it's not smart about how it handles numpy calls and conventions. Still, it is smarter than it was last year, when it took far more trial and error to avoid casts and get the biggest performance gains.

Also, the benchmarks we've been using have a relatively low ratio of particles to pixels, and so I think they understate the performance gain when it comes to packings like mine. That's where our new code will really shine.

nkeim commented on July 28, 2024

One last thing: I just discovered this:

http://numba.pydata.org/numba-doc/dev/annotate.html

It can take a little work to get your source to run with the numba command, but the annotated output points directly to what is slowing down your numba code (i.e. falling back to the Python C API). Let's remember to use this in the future.

danielballan commented on July 28, 2024

Neat. Noted.
