Comments (10)

will-moore commented on August 17, 2024

Support for dask writing to OME-NGFF (ome/ome-zarr-py#192) is now released in ome-zarr-py 0.6.0

from dask-blog.

sofroniewn commented on August 17, 2024

no - lol @freeman-lab gotta tell me these things! @joshmoore, have you seen this?

TomAugspurger commented on August 17, 2024

https://github.com/carbonplan/ndpyramid, from the geospatial context, might be helpful / worth linking to here. Its usage of Dask is pretty hidden behind xarray (but see carbonplan/ndpyramid#10 for a more direct Dask integration).

GenevieveBuckley commented on August 17, 2024

Ooh, very cool

I actually hadn't heard about ndpyramid. I'm going to have to try that one out, I might actually end up using it all the time. Thanks Tom!

GenevieveBuckley commented on August 17, 2024

@sofroniewn & @jni have you two seen or used ndpyramid? It looks super useful, especially in a napari context

joshmoore commented on August 17, 2024

lol @freeman-lab gotta tell me these things!

😆

@joshmoore, have you seen this?

I must admit, yes. But I must also admit to losing track of it. I think @thewtex has one as well, as does @aisenbarth with https://github.com/aeisenbarth/ngff-writer (which flowed into ome/ome-zarr-py#192). Big 👍🏽 for doing what we can to work together on faster, better, slicker libraries.

jakirkham commented on August 17, 2024

There was also a lot of discussion in issue pydata/xarray#4118 about how to handle this use case better. Josh likely has a better handle on where things are there than I do.

jakirkham commented on August 17, 2024

This PR is currently in progress, but could be merged soon (for some loose value of "soon", I don't have a good idea of when) ome/ome-zarr-py#192

FWIW this just got merged! 🥳

chrisroat commented on August 17, 2024

There may be some code snippets that work well for smaller data, but when doing large datasets, using your general purpose cluster to do downsampling might be inefficient. It also adds additional tasks to what may be a clean analysis workflow.

For a large pipeline that is processing and dumping a lot of data, it can be cleaner and more efficient to split out the downsampling work. The dask processing cluster can store a dask array in a tensorstore dataset, using the precomputed neuroglancer driver (https://google.github.io/tensorstore/driver/neuroglancer_precomputed/index.html). Separate, dedicated resources can be specified for out-of-band downsampling: a task queue that feeds an igneous cluster that can have CPU/memory/IO tuned efficiently for the dedicated task.
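To make the "out-of-band" idea concrete, here is a rough sketch using only the Python standard library. It is not igneous or tensorstore code: the in-memory `store` dict, the `downsample_2x` helper, and the `run_pipeline` function are all hypothetical stand-ins, and a real setup would use a persistent task queue and separate machines rather than threads. The point is only the shape of the workflow: the main pipeline writes full-resolution data and enqueues a small task, and dedicated workers build the lower-resolution levels later.

```python
import queue
import threading


def downsample_2x(image):
    """Mean-pool a 2D list-of-lists by a factor of 2 (even dimensions assumed)."""
    return [
        [
            (image[r][c] + image[r][c + 1] + image[r + 1][c] + image[r + 1][c + 1]) / 4
            for c in range(0, len(image[0]), 2)
        ]
        for r in range(0, len(image), 2)
    ]


def worker(tasks, store):
    """Hypothetical downsampling worker: pulls (name, n_levels) tasks from the queue."""
    while True:
        task = tasks.get()
        try:
            if task is None:  # sentinel: no more work for this worker
                return
            name, n_levels = task
            level = store[(name, 0)]  # full-resolution data written by the pipeline
            for i in range(1, n_levels):
                level = downsample_2x(level)
                store[(name, i)] = level
        finally:
            tasks.task_done()


def run_pipeline(images, n_levels=3, n_workers=2):
    """Write level 0 from the 'analysis' side and leave the pyramid to the workers."""
    store = {}  # stand-in for the on-disk dataset
    tasks = queue.Queue()
    workers = [
        threading.Thread(target=worker, args=(tasks, store)) for _ in range(n_workers)
    ]
    for w in workers:
        w.start()
    for name, image in images.items():
        store[(name, 0)] = image      # the analysis workflow only writes full resolution
        tasks.put((name, n_levels))   # downsampling is queued, not added to the task graph
    for _ in workers:
        tasks.put(None)
    tasks.join()
    for w in workers:
        w.join()
    return store
```

Calling `run_pipeline({"img": image})` on a 4x4 image with the default three levels leaves 4x4, 2x2, and 1x1 entries in `store`, keyed by `("img", level)`.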

Igneous is very well-developed and maintained. I currently use it locally on 10-100GB datasets regularly, and it always works smoothly -- even supporting sharded neuroglancer formats.

Shards are different from dask blocks. Sharding is much more efficient because each shard is written as one large file (fewer files and more efficient data transfer), while the much smaller chunks used for visualization are stored inside the shard in a layout suited to HTTP range requests.
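As a toy illustration of that last point, the sketch below packs several small chunks into one byte string with an offset index at the front, then reads a single chunk back using only byte-range reads. The layout is made up for this example and is not the actual neuroglancer sharded format; `pack_shard`, `read_range`, and `read_chunk` are hypothetical names, and `read_range` just slices bytes where a real client would send an HTTP `Range` header.

```python
import struct


def pack_shard(chunks):
    """Pack chunk byte strings into one shard (illustrative layout only).

    Layout: [n_chunks: uint64][(offset, length) per chunk: 2 x uint64][chunk bytes...]
    """
    n = len(chunks)
    header_size = 8 + 16 * n
    index = []
    offset = header_size
    for chunk in chunks:
        index.append((offset, len(chunk)))
        offset += len(chunk)
    header = struct.pack("<Q", n) + b"".join(
        struct.pack("<QQ", start, length) for start, length in index
    )
    return header + b"".join(chunks)


def read_range(shard, start, length):
    """Stand-in for an HTTP range request over the shard file."""
    return shard[start:start + length]


def read_chunk(shard, i):
    """Fetch chunk i with a few small range reads instead of downloading the shard."""
    (n,) = struct.unpack("<Q", read_range(shard, 0, 8))
    if not 0 <= i < n:
        raise IndexError(i)
    offset, length = struct.unpack("<QQ", read_range(shard, 8 + 16 * i, 16))
    return read_range(shard, offset, length)
```

With four chunks packed this way there is one file on disk instead of four, and a viewer that needs only chunk 2 reads the count, one 16-byte index entry, and that chunk's payload.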

GenevieveBuckley commented on August 17, 2024

Thanks @chrisroat!
Do you have an example of this I can look at? I'm not very familiar with igneous (mostly because I don't usually work with neuro datasets).
