Comments (10)

nkeim commented on July 28, 2024

I was able to get a ~60% improvement by making as few numpy calls as possible in the loop. These changes are at https://github.com/nkeim/trackpy/tree/numba-refine

I think a next step is to carefully control the types passed to _numba_refine. That's worked for me in the past.

Before (2.5-year-old MacBook Air)
Locate using Python Engine with Default Settings (Accurate)
1 loops, best of 3: 1.27 s per loop
Locate using Python Engine with Fast Settings (Sloppy)
1 loops, best of 3: 297 ms per loop
Locate using Numba Engine with Default Settings (Accurate)
1 loops, best of 3: 1.1 s per loop
Locate using Numba Engine with Fast Settings (Sloppy)
1 loops, best of 3: 262 ms per loop

After (commit c2d3d4c, https://github.com/soft-matter/trackpy/commit/c2d3d4c227aab3373740216148ddda61d53ba9d3)
Locate using Python Engine with Default Settings (Accurate)
1 loops, best of 3: 1.08 s per loop
Locate using Python Engine with Fast Settings (Sloppy)
1 loops, best of 3: 296 ms per loop
Locate using Numba Engine with Default Settings (Accurate)
1 loops, best of 3: 372 ms per loop
Locate using Numba Engine with Fast Settings (Sloppy)
10 loops, best of 3: 181 ms per loop
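
For readers who want to see what "as few numpy calls as possible in the loop" can look like, here is a minimal sketch. It is not the code from the numba-refine branch: the function name `centroid_offset`, its arguments, and the square window are assumptions made for illustration. The idea is to replace slicing plus `np.sum` with explicit scalar loops, which numba compiles to plain machine code.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Fallback so the sketch still runs (slowly) without numba installed.
    def njit(func):
        return func


@njit
def centroid_offset(image, cy, cx, radius):
    """Intensity-weighted centroid offset in a square window (hypothetical helper).

    Explicit scalar loops replace slicing and np.sum, so the compiled
    loop body makes no numpy calls. The caller must keep the window
    inside the image bounds.
    """
    mass = 0.0
    sum_y = 0.0
    sum_x = 0.0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            value = image[cy + dy, cx + dx]
            mass += value
            sum_y += value * dy
            sum_x += value * dx
    if mass == 0.0:
        return 0.0, 0.0
    return sum_y / mass, sum_x / mass


image = np.zeros((9, 9), dtype=np.float64)
image[4, 4] = 1.0
image[4, 5] = 1.0  # second bright pixel shifts the centroid in +x
off_y, off_x = centroid_offset(image, 4, 4, 2)
print(off_y, off_x)
```

With two equal pixels one column apart, the offset comes out halfway between them along x and zero along y.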

from trackpy.

danielballan commented on July 28, 2024

Impressive. I'll pull your commits into an actual PR and continue this....

(In reply to Nathan Keim's comment of Mon, Feb 3, 2014, 8:49 AM.)

danielballan commented on July 28, 2024

See #49

nkeim commented on July 28, 2024

Great! I made a few more tweaks in my repo which shave ~10 ms off the precise benchmark; they can go into the next PR.

danielballan commented on July 28, 2024

I added your latest commits.

I'll merge this to make it available. Subsequent improvements will not affect the API, so there's no harm in doing this in stages. We can open a new PR when one of us improves it further.

I tried explicitly casting types. It didn't help substantially, but it didn't hurt, and it seems safer, so I'm committing it.

Locate using Numba Engine with Default Settings (Accurate)
1 loops, best of 3: 416 ms per loop
Locate using Numba Engine with Fast Settings (Sloppy)
1 loops, best of 3: 235 ms per loop

after 761c70d

Locate using Numba Engine with Default Settings (Accurate)
1 loops, best of 3: 411 ms per loop
Locate using Numba Engine with Fast Settings (Sloppy)
1 loops, best of 3: 233 ms per loop
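
As an illustration of explicit casting before a call into compiled code, here is a small sketch. The helper name `prepare_refine_args` and the chosen dtypes are assumptions, not what commit 761c70d actually does. Fixing the types up front means numba sees one type signature on every call, so it compiles a single specialization and does not need to cast inside the loop.

```python
import numpy as np


def prepare_refine_args(image, coords, radius):
    """Cast inputs to fixed types before calling a compiled function.

    Hypothetical helper: the dtypes here are assumptions for
    illustration, not necessarily those used in trackpy.
    """
    image = np.ascontiguousarray(image, dtype=np.float64)
    coords = np.ascontiguousarray(coords, dtype=np.float64)
    radius = int(radius)  # plain Python int, whatever integer type came in
    return image, coords, radius


raw_image = np.arange(16, dtype=np.uint8).reshape(4, 4)
raw_coords = [[1, 2], [2, 1]]
image, coords, radius = prepare_refine_args(raw_image, raw_coords, np.int64(3))
print(image.dtype, coords.dtype, type(radius))
```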

nkeim commented on July 28, 2024

OK, I've put up some more optimizations, as well as a new benchmark, numba_benchmarks.ipy, that runs on a larger image. But there's a catch: somewhere along the way, I failed to notice that refine() (specifically the numba portion) has become a rounding error in the locate computation time. I scaled up the test image by a factor of 10, and refine() (accurate settings) was taking 59 ms; with the speedups in my latest commit, it's down to 47 ms. Almost all of the computation is instead spent in ndimage. prun output is below.

So let's declare victory on refine! The next target should be estimate_mass().

nathan

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    1.188    1.188    1.188    1.188 {scipy.ndimage._nd_image.generic_filter}
        2    0.512    0.256    0.512    0.256 {numpy.fft.fftpack_lite.cfftf}
        2    0.498    0.249    0.498    0.249 {numpy.fft.fftpack_lite.cfftb}
        1    0.409    0.409    0.409    0.409 {scipy.ndimage._nd_image.min_or_max_filter}
     6533    0.270    0.000    0.444    0.000 feature.py:90(estimate_mass)
        4    0.221    0.055    0.221    0.055 {numpy.core.multiarray.where}
        1    0.218    0.218    0.218    0.218 {scipy.ndimage._nd_image.binary_erosion}
        2    0.152    0.076    0.673    0.337 fftpack.py:167(ifft)
        2    0.112    0.056    0.112    0.056 {scipy.ndimage._nd_image.uniform_filter1d}
     6542    0.096    0.000    0.096    0.000 {method 'reduce' of 'numpy.ufunc' objects}
        1    0.083    0.083    1.856    1.856 feature.py:39(local_maxima)
        1    0.075    0.075    1.583    1.583 preprocessing.py:32(bandpass)
        1    0.050    0.050    0.050    0.050 {scipy.ndimage._nd_image.fourier_filter}
        1    0.047    0.047    1.882    1.882 uncertainty.py:26(measure_noise)
        1    0.046    0.046    4.337    4.337 feature.py:434(locate)
        1    0.046    0.046    0.047    0.047 feature.py:114(refine)
        2    0.042    0.021    0.042    0.021 {method 'nonzero' of 'numpy.ndarray' objects}
        1    0.030    0.030    0.030    0.030 {method 'sort' of 'numpy.ndarray' objects}
        6    0.027    0.005    0.027    0.005 {method 'astype' of 'numpy.ndarray' objects}
        1    0.020    0.020    0.043    0.043 preprocessing.py:46(scale_to_gamut)
        2    0.020    0.010    1.205    0.602 fftpack.py:518(_raw_fftnd)
     6533    0.018    0.000    0.126    0.000 fromnumeric.py:1422(sum)
        1    0.017    0.017    0.017    0.017 {method 'clip' of 'numpy.ndarray' objects}
    13399    0.016    0.000    0.044    0.000 {isinstance}
        2    0.015    0.007    0.015    0.007 {method 'copy' of 'numpy.ndarray' objects}
     6540    0.015    0.000    0.028    0.000 abc.py:128(__instancecheck__)
        8    0.015    0.002    0.015    0.002 {numpy.core.multiarray.zeros}
        1    0.014    0.014    1.816    1.816 uncertainty.py:7(roi)
     6534    0.013    0.000    0.100    0.000 _methods.py:16(_sum)
     6540    0.013    0.000    0.048    0.000 utils.py:58(__call__)
        1    0.011    0.011    0.015    0.015 _methods.py:60(_var)
     6540    0.010    0.000    0.010    0.000 _weakrefset.py:68(__contains__)
        1    0.008    0.008    4.345    4.345 <string>:1(<module>)
       80    0.007    0.000    0.007    0.000 {numpy.core.multiarray.array}
     6570    0.003    0.000    0.003    0.000 {getattr}
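
A table like the one above can also be produced outside IPython with the standard library's cProfile and pstats modules. The sketch below is only an illustration: `pipeline` and `slow_sum` are made-up stand-ins for a locate() call, not trackpy functions. Sorting by tottime, as in the output above, ranks functions by time spent in their own bodies.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Deliberately slow pure-Python loop (stand-in for a hot spot)."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def pipeline():
    """Stand-in for the full computation being profiled."""
    return slow_sum(200_000) + sum(range(1000))


profiler = cProfile.Profile()
result = profiler.runcall(pipeline)

# Collect the report in a string, sorted by time spent inside each function.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("tottime").print_stats(5)
report = buffer.getvalue()
print(report)
```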

danielballan commented on July 28, 2024

Great work all around. We'll have refine-victory beers when I come to UPenn.

The exact same strategy should work on estimate_mass. I attempted this early on and ran into trouble using the result as a mask, but I think it's a trivial problem.
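
One possible way around the mask problem, sketched under assumptions: instead of building a boolean mask array and indexing with it, test each pixel's distance inside the loop. This is not trackpy's estimate_mass; the name `masked_mass`, the disk-shaped mask, and the requirement that the window stays inside the image are all assumptions for illustration.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Fallback so the sketch still runs (slowly) without numba installed.
    def njit(func):
        return func


@njit
def masked_mass(image, cy, cx, radius):
    """Sum pixel values inside a disk around (cy, cx) (hypothetical helper).

    The distance test replaces a boolean mask array, so no temporary
    arrays are created inside the compiled loop.
    """
    r2 = radius * radius
    total = 0.0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dy * dy + dx * dx <= r2:
                total += image[cy + dy, cx + dx]
    return total


image = np.ones((7, 7), dtype=np.float64)
mass = masked_mass(image, 3, 3, 2)
print(mass)  # 13.0: a radius-2 disk covers 13 pixels of value 1
```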

nkeim commented on July 28, 2024

Yay!! Thanks for doing the actually hard part of bifurcating refine() and adding the explicit loops. Glad I could help.

I think the overall lesson I got from this is: numba really is fast, but it's dumb as to how it handles numpy calls and conventions. But it is smarter than it was last year, when there was way more trial and error in avoiding casts and getting the biggest performance gains.

Also, the benchmarks we've been using have a relatively low ratio of particles to pixels, and so I think they understate the performance gain when it comes to packings like mine. That's where our new code will really shine.

nkeim commented on July 28, 2024

One last thing: I just discovered this:

http://numba.pydata.org/numba-doc/dev/annotate.html

It can take a little work to get your source to run with the numba command, but the annotated output points directly to what is slowing down your numba code (i.e. falling back to the Python C API). Let's remember to use this in the future.
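
In more recent numba versions, a similar annotation can be requested from Python through the `inspect_types()` method of a compiled function. The sketch below assumes such a version is installed; `total_intensity` is a made-up example function. In the annotated output, values typed as `pyobject` are the ones handled through the Python C API. Note that `@njit` refuses to compile rather than fall back, so for this function the check is expected to report no fallback.

```python
import io

import numpy as np

try:
    from numba import njit
    HAVE_NUMBA = True
except ImportError:
    HAVE_NUMBA = False

    # Fallback so the sketch still runs without numba installed.
    def njit(func):
        return func


@njit
def total_intensity(image):
    """Sum all pixels with explicit loops (made-up example function)."""
    total = 0.0
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            total += image[i, j]
    return total


value = total_intensity(np.ones((3, 4)))  # first call triggers compilation

annotation = ""
if HAVE_NUMBA:
    buffer = io.StringIO()
    total_intensity.inspect_types(file=buffer)  # typed, annotated source
    annotation = buffer.getvalue()
    # "pyobject" marks values that go through the Python C API.
    print("object-mode values present:", "pyobject" in annotation)
```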

danielballan commented on July 28, 2024

Neat. Noted.

(In reply to Nathan Keim's comment of Mon, Feb 3, 2014, 8:15 PM.)
