Giter VIP home page Giter VIP logo

Comments (11)

perrygeo avatar perrygeo commented on June 15, 2024 2

Instead of going through the CLI, a direct comparison might be more illustrative of the performance difference:

with rasterio.open(srcpath) as src:
    arr = src.read().astype('float64') / 255.0
    assert arr.shape == (3, 4096, 4096)

    start = time.time()
    res = skimage_saturation(arr, 120)
    print("Scikit-image based saturation: {} seconds".format(time.time() - start))

    start = time.time()
    res2 = cython_saturation(arr, 120)
    print("Cython saturation: {} seconds".format(time.time() - start))

Saturation on a 4096x4096 array gives us

Scikit-image based saturation: 20.527370929718018 seconds
Cython saturation: 12.189836978912354 seconds

So the Cython is about 68% faster for a large array. Modest but significant.

from rio-color.

perrygeo avatar perrygeo commented on June 15, 2024

I added a noop operator just to test the raw IO speed. Some interesting data points...

$ time rio color $orig /tmp/time.tif "noop r"
real    0m10.501s

$ time rio color -j 8 $orig /tmp/time.tif "noop r"
real    0m7.605s

So even with 8 cores, the raw IO speed of rio color can't beat the modulate operation from imagemagick. 😮

The raw io/noop imagemagick convert takes 2.4s

from rio-color.

perrygeo avatar perrygeo commented on June 15, 2024

Imagemagick reads everything into memory. Rio color reads tile-by-tile.The vast majority of the time in the rio color raw IO case is spent reading and writing blocks. OTOH rio color keeps a low and stable memory profile. It's a tradeoff.

So we have two issues here

Optimization of IO vs memory

We could probably adjust the tradeoff by reading in multiple blocks at a time? See rasterio/rasterio#672

Optimizing conversion of colorspace for saturation

Profiling confirms that the colorspace conversion is the culprit

         547020 function calls (535179 primitive calls) in 61.707 seconds

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
...
       64    0.155    0.002   27.470    0.429 /Users/mperry/work/rio-color/rio_color/operations.py:60(rgb2lch)
       64    0.138    0.002   25.391    0.397 /Users/mperry/work/rio-color/rio_color/operations.py:70(lch2rgb)

52 seconds out of 61 total were spent swapping color spaces for this 64 x (3 x 1024 x 1024) block tif.

from rio-color.

perrygeo avatar perrygeo commented on June 15, 2024

Prior art: other python implementations of RGB->LCH conversion.

skimage (the benchmark)

We already use the skimage.color which works on arrays of shape (..., ..., 3) with values scaled 0..1. Doing the conversion requires going through the LAB color space. So far this is the benchmark.

colormath 👎

The colormath module is excellent for exploring and validating color conversions on single values. Not so great at optimizing the calculation on large arrays since it uses python objects. It's really designed for usage like:

from colormath.color_objects import LCHabColor
from colormath.color_objects import sRGBColor
rgb = sRGBColor(0.5, 0.8, 0.1)
convert_color(rgb, LCHabColor)
# LCHabColor(lch_l=74.70319882217456,lch_c=85.60523822265391,lch_h=124.3276180394683)

Converting that single value takes about 59.2 µs. For the largish raster above (8192 x 8192) that would take an hour assuming we could vectorize it effectively. So performance is just not even in the right ballpark. Still a useful library for understanding colorspaces though!

Other options

I'll update this ticket as we explore other avenues...


Overall, I'm surprised at the lack of performance-oriented, array-based colorspace libraries in the Python world.

It does seem like the RGB->LCH conversion is inherently more costly than simpler conversions like RGB->HSL. The sklearn and colormath LCH implementations are both roughly 4x slower than HSL. If so we might have to live with a performance penalty. But I'm convinced we can do better than skimage but it depends on how much time we want to spend in R&D.

from rio-color.

perrygeo avatar perrygeo commented on June 15, 2024

Using this script confirms the timings on my machine

For a (1024, 1024, 3) to round-trip through rgb->lch->rgb takes ~0.9 seconds

<function rgb2lab at 0x10a5aa400> 0.3880 seconds
<function lab2lch at 0x10a5aa950> 0.0730 seconds
<function lch2lab at 0x10a5aaa60> 0.0604 seconds
<function lab2rgb at 0x10a5aa488> 0.3852 seconds

And profiling indicates that most of that time is spend going through the XYZ color space

         115723 function calls (111899 primitive calls) in 3.108 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        3    0.435    0.145    0.485    0.162 /Users/mperry/work/scikit-image/skimage/color/colorconv.py:852(lab2xyz)
        3    0.420    0.140    0.450    0.150 /Users/mperry/work/scikit-image/skimage/color/colorconv.py:784(xyz2lab)
        3    0.416    0.139    0.696    0.232 /Users/mperry/work/scikit-image/skimage/color/colorconv.py:561(rgb2xyz)
        3    0.397    0.132    0.644    0.215 /Users/mperry/work/scikit-image/skimage/color/colorconv.py:515(xyz2rgb)
     2241    0.235    0.000    0.235    0.000 {built-in method array}
        7    0.209    0.030    0.209    0.030 {method 'reshape' of 'numpy.ndarray' objects}
        3    0.163    0.054    0.181    0.060 /Users/mperry/work/scikit-image/skimage/color/colorconv.py:1400(_cart2polar_2pi)
        3    0.159    0.053    0.177    0.059 /Users/mperry/work/scikit-image/skimage/color/colorconv.py:1410(lch2lab)
       12    0.076    0.006    0.076    0.006 {method 'copy' of 'numpy.ndarray' objects}
    27/24    0.065    0.002    0.068    0.003 {built-in method load_dynamic}
        6    0.062    0.010    0.062    0.010 {built-in method concatenate}
        8    0.060    0.007    0.060    0.007 {built-in method dot}
        1    0.045    0.045    0.045    0.045 {method 'random_sample' of 'mtrand.RandomState' objects}
        2    0.036    0.018    0.051    0.026 /Users/mperry/env/mapbox34/lib/python3.4/site-packages/numpy/core/numeric.py:2434(within_tol)
        3    0.029    0.010    1.174    0.391 /Users/mperry/work/scikit-image/skimage/color/colorconv.py:917(rgb2lab)
        3    0.025    0.008    1.154    0.385 /Users/mperry/work/scikit-image/skimage/color/colorconv.py:944(lab2rgb)

from rio-color.

celoyd avatar celoyd commented on June 15, 2024

I think it might be realistic to optimize this ourselves.

Using skimage, we’re doing RGB to XYZ to Lab to LCH and back again. Now, everything in that pipeline is ufuncs, and for example Lab↔LCH is basically only Cartesian↔polar. (I say “only”, which is a bit unfair – hypot and arctan2 will be noticeably slower than, say, + and * on big arrays. But my point is that there isn’t a colossal pillar of mysterious calculus in there or anything.)

My hunch is that skimage’s speed hit is mostly from the overheads of nested function calls and reindexes/assignments that hinder transparent optimization. I suspect that the actual math is not that scary, and probably tractable by algebraic simplification down to a couple lines. That’s just a hunch, so check me.

What we lose is the conceptual clarity of the chain of explicit conversions between well-defined colorspaces. I see that as an acceptable though significant cost, but I could be persuaded.

from rio-color.

perrygeo avatar perrygeo commented on June 15, 2024

lose conceptual clarity

not too worried about that either, since we have existing libraries for that purpose. The main goal is to translate/refactor the maths into highly efficient array operations.

Blue sky goal: a Cython function that worked directly on RGB arrays (as read by rasterio). Pure C-speed math, no intermediate numpy representations. Not difficult conceptually but the 👿 is in the details.

from rio-color.

virginiayung avatar virginiayung commented on June 15, 2024

Cython function that worked directly on RGB arrays. Pure C-speed math, no intermediate numpy representations.

Would love to help out with this @perrygeo 👿

from rio-color.

perrygeo avatar perrygeo commented on June 15, 2024

In the new rio-color@colorspaces branch, I added two functions to do direct conversions to and from LCH and RGB color space.

The eventual plan is to port them to Cython or C, and vectorize them for use with numpy arrays. But for now, let's work with pure functions in Python to make sure we get the math right.

@virginiayung, I could use your 👀 and 💭 on the above functions. I mainly followed the wikipedia pages for sRGB and LAB in order to translate the formulas to Python.

It works reasonably well but when you round trip from rgb -> lch -> rgb repeatedly, you get some minor floating-point drift and I'm not sure if it's due to an error in my formulas or just normal float imprecision.

/cc @ericfischer @celoyd

from rio-color.

e-n-f avatar e-n-f commented on June 15, 2024

@perrygeo Looks pretty solid to me. The CIE standard writes some of the constants differently (for example 24/116 instead of 6/29), but looks like it amounts to the same thing so floating-point imprecision seems like the likely explanation for any differences.

from rio-color.

perrygeo avatar perrygeo commented on June 15, 2024

See #22 for cython implementation

Benchmarks

Tested on 4096x4096 tiled, uncompressed, rgba GeoTIFF. New method is significantly faster than old. Not as fast a imagemagick but it works better, retaining the profile of the input data.

method time notes
convert -modulate 1.5s HSV colorspace is way faster
convert -set option:modulate:colorspace lch -modulate 6.8s loses geotiff tags, tiling, compression, trouble with alpha bands
old skimage-based rio color "saturation ..." 12.5s
new cython-based rio color "saturation ..." 8.8s

Bottom line is performance is 1/3 better than the skimage method but still way slower than our HSV-based saturation with convert.

Visuals

The results are not floating-point-exact with the skimage method but, when cast to 8bit rgb there is no perceptual difference.

Here's a demo of the saturation. I chose to use a stock photo instead of satellite image since it more clearly demonstrates the effect of saturation. This was created by the following rio_color script

from rio_color.colorspace import saturate_rgb
import rasterio

with rasterio.open("test.png") as src:
    arr = src.read().astype('float64') / 255.0
    arr = arr[0:3,]
    prof = src.profile
    prof['count'] = 3
    for sat in range(0, 250, 5):
        with rasterio.open("test_{:0>3}.png".format(sat), 'w', **prof) as dst:
            rgb = saturate_rgb(arr, sat/100.0)
            rgb8 = (rgb * 255).astype('uint8')
            dst.write(rgb8)

saturation ranging from 0 (B&W) to 250% (garishly over-saturated). Sorry about the giffiness of the image - you get the idea though.

from rio-color.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.