Comments (11)
Instead of going through the CLI, a direct comparison might be more illustrative of the performance difference:
with rasterio.open(srcpath) as src:
arr = src.read().astype('float64') / 255.0
assert arr.shape == (3, 4096, 4096)
start = time.time()
res = skimage_saturation(arr, 120)
print("Scikit-image based saturation: {} seconds".format(time.time() - start))
start = time.time()
res2 = cython_saturation(arr, 120)
print("Cython saturation: {} seconds".format(time.time() - start))
Saturation on a 4096x4096 array gives us
Scikit-image based saturation: 20.527370929718018 seconds
Cython saturation: 12.189836978912354 seconds
So the Cython is about 68% faster for a large array. Modest but significant.
from rio-color.
I added a noop
operator just to test the raw IO speed. Some interesting data points...
$ time rio color $orig /tmp/time.tif "noop r"
real 0m10.501s
$ time rio color -j 8 $orig /tmp/time.tif "noop r"
real 0m7.605s
So even with 8 cores, the raw IO speed of rio color can't beat the modulate
operation from imagemagick. 😮
The raw io/noop imagemagick convert takes 2.4s
from rio-color.
Imagemagick reads everything into memory. Rio color reads tile-by-tile.The vast majority of the time in the rio color raw IO case is spent reading and writing blocks. OTOH rio color keeps a low and stable memory profile. It's a tradeoff.
So we have two issues here
Optimization of IO vs memory
We could probably adjust the tradeoff by reading in multiple blocks at a time? See rasterio/rasterio#672
Optimizing conversion of colorspace for saturation
Profiling confirms that the colorspace conversion is the culprit
547020 function calls (535179 primitive calls) in 61.707 seconds
ncalls tottime percall cumtime percall filename:lineno(function)
...
64 0.155 0.002 27.470 0.429 /Users/mperry/work/rio-color/rio_color/operations.py:60(rgb2lch)
64 0.138 0.002 25.391 0.397 /Users/mperry/work/rio-color/rio_color/operations.py:70(lch2rgb)
52 seconds out of 61 total were spent swapping color spaces for this 64 x (3 x 1024 x 1024) block tif.
from rio-color.
Prior art: other python implementations of RGB->LCH conversion.
skimage (the benchmark)
We already use the skimage.color which works on arrays of shape (..., ..., 3)
with values scaled 0..1. Doing the conversion requires going through the LAB color space. So far this is the benchmark.
colormath 👎
The colormath module is excellent for exploring and validating color conversions on single values. Not so great at optimizing the calculation on large arrays since it uses python objects. It's really designed for usage like:
from colormath.color_objects import LCHabColor
from colormath.color_objects import sRGBColor
rgb = sRGBColor(0.5, 0.8, 0.1)
convert_color(rgb, LCHabColor)
# LCHabColor(lch_l=74.70319882217456,lch_c=85.60523822265391,lch_h=124.3276180394683)
Converting that single value takes about 59.2 µs. For the largish raster above (8192 x 8192) that would take an hour assuming we could vectorize it effectively. So performance is just not even in the right ballpark. Still a useful library for understanding colorspaces though!
Other options
- Dig into the skimage implementation and try to optimize
- Implement our own in cython or vectorized numpy ops. Definitely possible but would take some time.
- Find a good RGB<->LAB conversion and take it to/from LAB->LCH-> adjust saturation->LAB ourselves
- opencv
- PIL: http://pillow.readthedocs.org/en/3.1.x/reference/ImageCms.html
- http://stackoverflow.com/questions/7530627/hcl-color-to-rgb-and-backward
- http://stackoverflow.com/a/16020102/519385
I'll update this ticket as we explore other avenues...
Overall, I'm surprised at the lack of performance-oriented, array-based colorspace libraries in the Python world.
It does seem like the RGB->LCH conversion is inherently more costly than simpler conversions like RGB->HSL. The sklearn and colormath LCH implementations are both roughly 4x slower than HSL. If so we might have to live with a performance penalty. But I'm convinced we can do better than skimage but it depends on how much time we want to spend in R&D.
from rio-color.
Using this script confirms the timings on my machine
For a (1024, 1024, 3)
to round-trip through rgb->lch->rgb takes ~0.9 seconds
<function rgb2lab at 0x10a5aa400> 0.3880 seconds
<function lab2lch at 0x10a5aa950> 0.0730 seconds
<function lch2lab at 0x10a5aaa60> 0.0604 seconds
<function lab2rgb at 0x10a5aa488> 0.3852 seconds
And profiling indicates that most of that time is spend going through the XYZ color space
115723 function calls (111899 primitive calls) in 3.108 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
3 0.435 0.145 0.485 0.162 /Users/mperry/work/scikit-image/skimage/color/colorconv.py:852(lab2xyz)
3 0.420 0.140 0.450 0.150 /Users/mperry/work/scikit-image/skimage/color/colorconv.py:784(xyz2lab)
3 0.416 0.139 0.696 0.232 /Users/mperry/work/scikit-image/skimage/color/colorconv.py:561(rgb2xyz)
3 0.397 0.132 0.644 0.215 /Users/mperry/work/scikit-image/skimage/color/colorconv.py:515(xyz2rgb)
2241 0.235 0.000 0.235 0.000 {built-in method array}
7 0.209 0.030 0.209 0.030 {method 'reshape' of 'numpy.ndarray' objects}
3 0.163 0.054 0.181 0.060 /Users/mperry/work/scikit-image/skimage/color/colorconv.py:1400(_cart2polar_2pi)
3 0.159 0.053 0.177 0.059 /Users/mperry/work/scikit-image/skimage/color/colorconv.py:1410(lch2lab)
12 0.076 0.006 0.076 0.006 {method 'copy' of 'numpy.ndarray' objects}
27/24 0.065 0.002 0.068 0.003 {built-in method load_dynamic}
6 0.062 0.010 0.062 0.010 {built-in method concatenate}
8 0.060 0.007 0.060 0.007 {built-in method dot}
1 0.045 0.045 0.045 0.045 {method 'random_sample' of 'mtrand.RandomState' objects}
2 0.036 0.018 0.051 0.026 /Users/mperry/env/mapbox34/lib/python3.4/site-packages/numpy/core/numeric.py:2434(within_tol)
3 0.029 0.010 1.174 0.391 /Users/mperry/work/scikit-image/skimage/color/colorconv.py:917(rgb2lab)
3 0.025 0.008 1.154 0.385 /Users/mperry/work/scikit-image/skimage/color/colorconv.py:944(lab2rgb)
from rio-color.
I think it might be realistic to optimize this ourselves.
Using skimage
, we’re doing RGB to XYZ to Lab to LCH and back again. Now, everything in that pipeline is ufuncs, and for example Lab↔LCH is basically only Cartesian↔polar. (I say “only”, which is a bit unfair – hypot
and arctan2
will be noticeably slower than, say, +
and *
on big arrays. But my point is that there isn’t a colossal pillar of mysterious calculus in there or anything.)
My hunch is that skimage
’s speed hit is mostly from the overheads of nested function calls and reindexes/assignments that hinder transparent optimization. I suspect that the actual math is not that scary, and probably tractable by algebraic simplification down to a couple lines. That’s just a hunch, so check me.
What we lose is the conceptual clarity of the chain of explicit conversions between well-defined colorspaces. I see that as an acceptable though significant cost, but I could be persuaded.
from rio-color.
lose conceptual clarity
not too worried about that either, since we have existing libraries for that purpose. The main goal is to translate/refactor the maths into highly efficient array operations.
Blue sky goal: a Cython function that worked directly on RGB arrays (as read by rasterio). Pure C-speed math, no intermediate numpy representations. Not difficult conceptually but the 👿 is in the details.
from rio-color.
Cython function that worked directly on RGB arrays. Pure C-speed math, no intermediate numpy representations.
Would love to help out with this @perrygeo 👿
from rio-color.
In the new rio-color@colorspaces
branch, I added two functions to do direct conversions to and from LCH and RGB color space.
The eventual plan is to port them to Cython or C, and vectorize them for use with numpy arrays. But for now, let's work with pure functions in Python to make sure we get the math right.
@virginiayung, I could use your 👀 and 💭 on the above functions. I mainly followed the wikipedia pages for sRGB and LAB in order to translate the formulas to Python.
It works reasonably well but when you round trip from rgb -> lch -> rgb repeatedly, you get some minor floating-point drift and I'm not sure if it's due to an error in my formulas or just normal float imprecision.
/cc @ericfischer @celoyd
from rio-color.
@perrygeo Looks pretty solid to me. The CIE standard writes some of the constants differently (for example 24/116 instead of 6/29), but looks like it amounts to the same thing so floating-point imprecision seems like the likely explanation for any differences.
from rio-color.
See #22 for cython implementation
Benchmarks
Tested on 4096x4096 tiled, uncompressed, rgba GeoTIFF. New method is significantly faster than old. Not as fast a imagemagick but it works better, retaining the profile of the input data.
method | time | notes |
---|---|---|
convert -modulate |
1.5s | HSV colorspace is way faster |
convert -set option:modulate:colorspace lch -modulate |
6.8s | loses geotiff tags, tiling, compression, trouble with alpha bands |
old skimage-based rio color "saturation ..." |
12.5s | |
new cython-based rio color "saturation ..." |
8.8s |
Bottom line is performance is 1/3 better than the skimage method but still way slower than our HSV-based saturation with convert.
Visuals
The results are not floating-point-exact with the skimage method but, when cast to 8bit rgb there is no perceptual difference.
Here's a demo of the saturation. I chose to use a stock photo instead of satellite image since it more clearly demonstrates the effect of saturation. This was created by the following rio_color script
from rio_color.colorspace import saturate_rgb
import rasterio
with rasterio.open("test.png") as src:
arr = src.read().astype('float64') / 255.0
arr = arr[0:3,]
prof = src.profile
prof['count'] = 3
for sat in range(0, 250, 5):
with rasterio.open("test_{:0>3}.png".format(sat), 'w', **prof) as dst:
rgb = saturate_rgb(arr, sat/100.0)
rgb8 = (rgb * 255).astype('uint8')
dst.write(rgb8)
saturation ranging from 0 (B&W) to 250% (garishly over-saturated). Sorry about the giffiness of the image - you get the idea though.
from rio-color.
Related Issues (20)
- pydocstyle reports 24 missing docstrings
- Color interpretation test failure with Rasterio 1.0 and GDAL 2.2
- 1.0 release HOT 6
- Reformat everything with black
- Exclude *.pyx and *.c files from wheels HOT 2
- Unable to install rio-color using pip in Windows 10 HOT 3
- How to set values to raster? HOT 1
- Sigmoidal switch: no change HOT 2
- How to rescale the image after rio-color operations? HOT 6
- AttributeError: module 'enum' has no attribute 'IntFlag' when install from source HOT 2
- Add a pyproject.toml file
- Releases blocked by lack of wheel building infra HOT 3
- 1.0.1 release HOT 1
- update rasterio requirements to non-prerelease version HOT 1
- 1.0.2 release HOT 1
- update requirements to match rasterio/cligj
- 1.0.3 release HOT 1
- 1.0.4 HOT 2
- How do I install and work in jupyterlab?
- Can't pip install in py 3.11 HOT 2
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from rio-color.