Comments (10)
I was able to get a ~60% improvement by making as few numpy calls as possible in the loop. These changes are at https://github.com/nkeim/trackpy/tree/numba-refine
I think a next step is to carefully control the types passed to _numba_refine. That's worked for me in the past.
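To illustrate both ideas (explicit scalar loops instead of numpy calls inside the compiled loop, and fixed input types), here is a minimal sketch. It is not the code in the numba-refine branch: `refine_sketch` and `_centroid_loop` are hypothetical names, the window is a plain square, features are assumed to sit at least `radius` pixels from the edge, and numba is treated as optional so the sketch also runs (slowly) without it.

```python
import numpy as np

try:
    from numba import njit  # optional: compile the loop if numba is available
except ImportError:
    def njit(func):  # fallback: run the same loop as plain (slow) Python
        return func


@njit
def _centroid_loop(image, coords, radius):
    """Mass and center of mass in a (2*radius+1)-wide square window around
    each integer position, using only scalar arithmetic inside the loop."""
    n = coords.shape[0]
    out = np.empty((n, 3), dtype=np.float64)
    for k in range(n):
        r0 = coords[k, 0]
        c0 = coords[k, 1]
        mass = 0.0
        row_moment = 0.0
        col_moment = 0.0
        for i in range(-radius, radius + 1):
            for j in range(-radius, radius + 1):
                v = image[r0 + i, c0 + j]
                mass += v
                row_moment += i * v
                col_moment += j * v
        out[k, 0] = r0 + row_moment / mass  # refined row (assumes mass > 0)
        out[k, 1] = c0 + col_moment / mass  # refined column
        out[k, 2] = mass
    return out


def refine_sketch(image, coords, radius):
    # Cast once, outside the compiled function, so it always receives the
    # same types (float64 image, int64 coordinates, plain int radius).
    image = np.ascontiguousarray(image, dtype=np.float64)
    coords = np.ascontiguousarray(coords, dtype=np.int64)
    return _centroid_loop(image, coords, int(radius))
```

Because the loop body touches only scalars, numba can compile it without calling back into NumPy on every iteration, and the casting in the wrapper means the compiled function sees one signature no matter what dtype the caller passes.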
Before (2.5-year-old MacBook Air)
Locate using Python Engine with Default Settings (Accurate)
1 loops, best of 3: 1.27 s per loop
Locate using Python Engine with Fast Settings (Sloppy)
1 loops, best of 3: 297 ms per loop
Locate using Numba Engine with Default Settings (Accurate)
1 loops, best of 3: 1.1 s per loop
Locate using Numba Engine with Fast Settings (Sloppy)
1 loops, best of 3: 262 ms per loop
After (commit c2d3d4c)
Locate using Python Engine with Default Settings (Accurate)
1 loops, best of 3: 1.08 s per loop
Locate using Python Engine with Fast Settings (Sloppy)
1 loops, best of 3: 296 ms per loop
Locate using Numba Engine with Default Settings (Accurate)
1 loops, best of 3: 372 ms per loop
Locate using Numba Engine with Fast Settings (Sloppy)
10 loops, best of 3: 181 ms per loop
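The "N loops, best of 3" lines are IPython %timeit output: the fastest of three trials, each timing N consecutive calls. A rough standard-library equivalent is sketched below; `fake_locate` is a hypothetical stand-in workload, not trackpy's locate, and unlike %timeit the sketch does not choose the loop count automatically.

```python
import timeit

import numpy as np


def best_of(func, repeat=3, number=1):
    """Fastest per-call time in seconds: `repeat` trials, each timing
    `number` consecutive calls, as in '1 loops, best of 3'."""
    times = timeit.repeat(func, repeat=repeat, number=number)
    return min(times) / number


def fake_locate():
    # Hypothetical stand-in workload (an FFT round trip), not trackpy's locate.
    image = np.random.default_rng(0).random((512, 512))
    return np.fft.ifft2(np.fft.fft2(image)).real.sum()


seconds = best_of(fake_locate)
print(f"1 loops, best of 3: {seconds * 1e3:.0f} ms per loop")
```

Taking the minimum rather than the mean is the usual choice here, since slower trials mostly reflect interference from other processes rather than the code being measured.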
from trackpy.
Impressive. I'll pull your commits into an actual PR and continue this...
from trackpy.
See #49
from trackpy.
Great! I made a few more tweaks in my repo which shave ~10 ms off the precise benchmark; they can go into the next PR.
from trackpy.
I added your latest commits.
I'll merge this to make it available. Subsequent improvements will not affect the API, so there's no harm in doing this in stages. We can open a new PR when one of us improves it further.
I tried explicitly casting types. It didn't help substantially, but it didn't hurt, and it seems safer, so I'm committing it.
Before
Locate using Numba Engine with Default Settings (Accurate)
1 loops, best of 3: 416 ms per loop
Locate using Numba Engine with Fast Settings (Sloppy)
1 loops, best of 3: 235 ms per loop
After (commit 761c70d)
Locate using Numba Engine with Default Settings (Accurate)
1 loops, best of 3: 411 ms per loop
Locate using Numba Engine with Fast Settings (Sloppy)
1 loops, best of 3: 233 ms per loop
from trackpy.
OK, I've put up some more optimizations, as well as a new benchmark, numba_benchmarks.ipy, that runs on a larger image. But there's a catch: somewhere along the way, without my noticing, refine() (specifically the numba portion) became a rounding error in the locate computation time. I scaled up the test image by a factor of 10, and refine() (accurate settings) was using 59 ms; with the speedups in my latest commit, it's down to 47 ms, compared with about 4.3 s for all of locate. Instead, most of the computation is spent in ndimage filters and FFTs. prun output is below.
So let's declare victory on refine! The next target should be estimate_mass().
nathan
ncalls tottime percall cumtime percall filename:lineno(function)
1 1.188 1.188 1.188 1.188 {scipy.ndimage._nd_image.generic_filter}
2 0.512 0.256 0.512 0.256 {numpy.fft.fftpack_lite.cfftf}
2 0.498 0.249 0.498 0.249 {numpy.fft.fftpack_lite.cfftb}
1 0.409 0.409 0.409 0.409 {scipy.ndimage._nd_image.min_or_max_filter}
6533 0.270 0.000 0.444 0.000 feature.py:90(estimate_mass)
4 0.221 0.055 0.221 0.055 {numpy.core.multiarray.where}
1 0.218 0.218 0.218 0.218 {scipy.ndimage._nd_image.binary_erosion}
2 0.152 0.076 0.673 0.337 fftpack.py:167(ifft)
2 0.112 0.056 0.112 0.056 {scipy.ndimage._nd_image.uniform_filter1d}
6542 0.096 0.000 0.096 0.000 {method 'reduce' of 'numpy.ufunc' objects}
1 0.083 0.083 1.856 1.856 feature.py:39(local_maxima)
1 0.075 0.075 1.583 1.583 preprocessing.py:32(bandpass)
1 0.050 0.050 0.050 0.050 {scipy.ndimage._nd_image.fourier_filter}
1 0.047 0.047 1.882 1.882 uncertainty.py:26(measure_noise)
1 0.046 0.046 4.337 4.337 feature.py:434(locate)
1 0.046 0.046 0.047 0.047 feature.py:114(refine)
2 0.042 0.021 0.042 0.021 {method 'nonzero' of 'numpy.ndarray' objects}
1 0.030 0.030 0.030 0.030 {method 'sort' of 'numpy.ndarray' objects}
6 0.027 0.005 0.027 0.005 {method 'astype' of 'numpy.ndarray' objects}
1 0.020 0.020 0.043 0.043 preprocessing.py:46(scale_to_gamut)
2 0.020 0.010 1.205 0.602 fftpack.py:518(_raw_fftnd)
6533 0.018 0.000 0.126 0.000 fromnumeric.py:1422(sum)
1 0.017 0.017 0.017 0.017 {method 'clip' of 'numpy.ndarray' objects}
13399 0.016 0.000 0.044 0.000 {isinstance}
2 0.015 0.007 0.015 0.007 {method 'copy' of 'numpy.ndarray' objects}
6540 0.015 0.000 0.028 0.000 abc.py:128(__instancecheck__)
8 0.015 0.002 0.015 0.002 {numpy.core.multiarray.zeros}
1 0.014 0.014 1.816 1.816 uncertainty.py:7(roi)
6534 0.013 0.000 0.100 0.000 _methods.py:16(_sum)
6540 0.013 0.000 0.048 0.000 utils.py:58(__call__)
1 0.011 0.011 0.015 0.015 _methods.py:60(_var)
6540 0.010 0.000 0.010 0.000 _weakrefset.py:68(__contains__)
1 0.008 0.008 4.345 4.345 <string>:1(<module>)
80 0.007 0.000 0.007 0.000 {numpy.core.multiarray.array}
6570 0.003 0.000 0.003 0.000 {getattr}
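IPython's %prun runs a statement under the standard library profiler. Outside IPython, a table like the one above, sorted by internal time (tottime), can be produced with cProfile and pstats. A minimal sketch; `workload` is a hypothetical stand-in, not trackpy code.

```python
import cProfile
import io
import pstats

import numpy as np


def workload():
    # Hypothetical stand-in: an FFT round trip plus a per-row Python loop.
    image = np.random.default_rng(0).random((256, 256))
    filtered = np.fft.ifft2(np.fft.fft2(image)).real
    total = 0.0
    for row in filtered:
        total += float(row.sum())
    return total


profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

buf = io.StringIO()
stats = pstats.Stats(profiler, stream=buf)
stats.sort_stats("tottime").print_stats(10)  # top 10 rows by internal time
report = buf.getvalue()
print(report)
```

Sorting by tottime (time spent inside a function, excluding its callees) is what makes a C routine like generic_filter stand out at the top; sorting by "cumulative" instead shows which Python-level callers, such as locate or bandpass, own that time.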
from trackpy.
Great work all around. We'll have refine-victory beers when I come to UPenn.
The exact same strategy should work on estimate_mass. I attempted this early on and ran into trouble using the result as a mask, but I think it's a trivial problem.
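A minimal sketch of applying the same strategy to mass estimation, assuming "mass" means the summed brightness inside a disk of radius `radius` around each integer feature position, and that every feature sits at least `radius` pixels from the image edge. The mask is built once with NumPy outside the loop and passed in as a boolean array. `estimate_mass_sketch`, `_mass_loop` and `circular_mask` are hypothetical names; this is not trackpy's estimate_mass().

```python
import numpy as np

try:
    from numba import njit  # optional: compile the loop if numba is available
except ImportError:
    def njit(func):  # fallback: run the same loop as plain Python
        return func


def circular_mask(radius):
    """Boolean disk of the given radius, built once with NumPy."""
    r = np.arange(-radius, radius + 1)
    return r[:, None] ** 2 + r[None, :] ** 2 <= radius ** 2


@njit
def _mass_loop(image, coords, mask):
    # Sum pixels under the mask around each feature with scalar loops only.
    radius = mask.shape[0] // 2
    mass = np.zeros(coords.shape[0], dtype=np.float64)
    for k in range(coords.shape[0]):
        r0 = coords[k, 0]
        c0 = coords[k, 1]
        for i in range(mask.shape[0]):
            for j in range(mask.shape[1]):
                if mask[i, j]:
                    mass[k] += image[r0 + i - radius, c0 + j - radius]
    return mass


def estimate_mass_sketch(image, coords, radius):
    # Fix the types before entering the compiled loop.
    image = np.ascontiguousarray(image, dtype=np.float64)
    coords = np.ascontiguousarray(coords, dtype=np.int64)
    mask = np.ascontiguousarray(circular_mask(radius))
    return _mass_loop(image, coords, mask)
```

On a uniform image of ones, a radius-1 disk covers 5 pixels and a radius-2 disk covers 13, which is a quick sanity check for the mask.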
from trackpy.
Yay!! Thanks for doing the actually hard part of bifurcating refine() and adding the explicit loops. Glad I could help.
I think the overall lesson I got from this is: numba really is fast, but it's dumb about how it handles numpy calls and conventions. Still, it is smarter than it was last year, when getting the biggest performance gains took far more trial and error in avoiding casts.
Also, the benchmarks we've been using have a relatively low ratio of particles to pixels, so I think they understate the performance gain for dense packings like mine. That's where our new code will really shine.
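One way to check that claim is to benchmark on synthetic images with a controllable particle density. A minimal sketch, assuming Gaussian spots on a square lattice are a reasonable stand-in for a dense packing; `lattice_image` is a hypothetical helper, and shrinking `spacing` raises the particle-to-pixel ratio.

```python
import numpy as np


def lattice_image(shape, spacing, sigma=1.5):
    """Gaussian spots (peak height 1) on a square lattice with the given
    spacing in pixels. Returns the image and an (N, 2) array of centers."""
    yy, xx = np.indices(shape)
    image = np.zeros(shape, dtype=np.float64)
    centers = [(r, c)
               for r in range(spacing // 2, shape[0], spacing)
               for c in range(spacing // 2, shape[1], spacing)]
    for r, c in centers:
        image += np.exp(-((yy - r) ** 2 + (xx - c) ** 2) / (2.0 * sigma ** 2))
    return image, np.array(centers)


dense, dense_centers = lattice_image((128, 128), spacing=8)
sparse, sparse_centers = lattice_image((128, 128), spacing=32)
print(len(dense_centers), len(sparse_centers))  # prints: 256 16
```

Both images have the same number of pixels, so timing the same locate call on each isolates the per-particle cost (refine, estimate_mass) from the per-pixel cost (filters, FFTs).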
from trackpy.
One last thing: I just discovered this:
http://numba.pydata.org/numba-doc/dev/annotate.html
It can take a little work to get your source to run with the numba command, but the annotated output points directly to what is slowing down your numba code (i.e., falling back to the Python C API). Let's remember to use this in the future.
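The annotate page linked above documents a command-line tool from the numba of that era. The sketch below instead assumes a recent numba, where a compiled function's inspect_types() method prints the source annotated with the inferred type of each variable. With njit, code that cannot be typed fails to compile rather than falling back to Python objects, so the annotation is mainly useful for spotting unintended types such as an accumulator promoted from int to float. `weighted_sum` is a hypothetical example function.

```python
import io

import numpy as np

try:
    from numba import njit
    HAVE_NUMBA = True
except ImportError:  # without numba there is nothing to annotate
    HAVE_NUMBA = False

report = ""
if HAVE_NUMBA:
    @njit
    def weighted_sum(values, weights):
        # Explicit scalar loop: every variable should infer to a concrete
        # numeric type.
        total = 0.0
        for i in range(values.shape[0]):
            total += values[i] * weights[i]
        return total

    weighted_sum(np.ones(3), np.arange(3.0))  # compile for float64 arrays
    buf = io.StringIO()
    weighted_sum.inspect_types(file=buf)  # source annotated with inferred types
    report = buf.getvalue()

print(report or "numba is not installed")
```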
from trackpy.
Neat. Noted.
from trackpy.