Comments (15)

bjarthur commented on July 24, 2024

i use At in the production code with a vector that might have missing indices in the middle. i just simplified it here to a contiguous range for debugging purposes.

from dimensionaldata.jl.

rafaqz commented on July 24, 2024

Maybe try a PR changing it?

Also closing, as this is really way outside of DD's sphere of influence ;)

bjarthur commented on July 24, 2024

indeed, the problem lies elsewhere:

julia> @time I = DimensionalData.dims2indices(ZARR, (X(At(-1)),))
  0.000007 seconds (1 allocation: 16 bytes)
(12, Colon(), Colon())

julia> @time getindex(ZARR, I...);
  0.006218 seconds (3.53 k allocations: 1.637 MiB)

julia> @time I = DimensionalData.dims2indices(ZARR, (X(At(-1)), Y(At(-100:-1)), Z(At(-1000:-1))))
  0.000006 seconds (3 allocations: 8.906 KiB)
(12, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10  …  91, 92, 93, 94, 95, 96, 97, 98, 99, 100], [1, 2, 3, 4, 5, 6, 7, 8, 9, 10  …  991, 992, 993, 994, 995, 996, 997, 998, 999, 1000])

julia> @time getindex(ZARR, I...);
  0.011937 seconds (108.19 k allocations: 19.329 MiB)

julia> @time I = DimensionalData.dims2indices(NC, (X(At(-1)),))
  0.000002 seconds (1 allocation: 16 bytes)
(12, Colon(), Colon())

julia> @time getindex(NC, I...);
  0.001656 seconds (171 allocations: 793.469 KiB)

julia> @time I = DimensionalData.dims2indices(NC, (X(At(-1)), Y(At(-100:-1)), Z(At(-1000:-1))))
  0.000008 seconds (3 allocations: 8.906 KiB)
(12, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10  …  91, 92, 93, 94, 95, 96, 97, 98, 99, 100], [1, 2, 3, 4, 5, 6, 7, 8, 9, 10  …  991, 992, 993, 994, 995, 996, 997, 998, 999, 1000])

julia> @time getindex(NC, I...);
 77.239059 seconds (15.93 M allocations: 1.158 GiB, 0.08% gc time)

thanks! will keep digging...

felixcremer commented on July 24, 2024

There should be branches on all relevant packages for the DiskArrays break. Let's get them merged and revisit this next week; we can reopen this on DiskArrays or YAXArrays as needed.

rafaqz commented on July 24, 2024

Probably not a DD problem, here we just resolve the dims and selectors to regular indices and pass them to the parent array.

YAX/NetCDF/DiskArrays.jl do the actual reading

@meggart may know more about that part

(Also, just try it with Int/Colon on the parent array so DD is not involved. DD.dims2indices(A, I) will get you the resolved indices.)

bjarthur commented on July 24, 2024

can you elaborate on DD.dim2indices? i'm not following...

bjarthur commented on July 24, 2024

also, you see above how i showed it's fast without At and slow with it? wouldn't that indicate it is a DD problem?

rafaqz commented on July 24, 2024

Sorry typo dims2indices.

DimensionalData is mostly a layer over AbstractArray indexing. It calls dims2indices on your dimensions and selectors then passes the results to the parent object.

Calling inds = DD.dims2indices(NC, (At... etc)) manually to get the resolved indices will show you what's actually happening. Pass the selectors as a tuple in the second argument.

Then passing the resulting indices to parent(NC)[inds...] will give you a benchmark where DD is not involved at all, and hopefully help us assign blame ;)
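
A minimal sketch of that workflow, assuming a small in-memory DimArray stands in for the NetCDF-backed NC (the name A, its size, and its lookup values are made up for illustration). With an in-memory parent the timings will not reproduce the slowdown; the sketch only shows how to separate index resolution from the actual read:

```julia
using DimensionalData
const DD = DimensionalData

# Hypothetical stand-in for NC: an in-memory array with negative lookup values.
A = DimArray(rand(20, 100, 1000), (X(-12:7), Y(-100:-1), Z(-1000:-1)))

# Step 1: resolve dimensions and selectors to plain indices. No data is read here.
inds = DD.dims2indices(A, (X(At(-1)), Y(At(-100:-1)), Z(At(-1000:-1))))

# Step 2: index the parent array directly, so DimensionalData is out of the picture.
@time sub = parent(A)[inds...];
```

If step 2 is slow on the real dataset while step 1 is fast, the cost sits in the parent array's getindex rather than in DimensionalData.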

rafaqz commented on July 24, 2024

Probably NetCDF (DiskArrays.jl) chunk loading of that vector. Likely fixed on DiskArrays.jl main already.

But either way, using At like that is not great. Why not use .. for this and get a range back? A vector throws away the structural information, and At will do hundreds of lookups where .. does 2.

(DiskArrays.jl on main is just smart enough to put the range back together!!!)
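
A rough illustration of the difference, again on a hypothetical in-memory array rather than the real dataset (A and its lookups are assumptions for the example). It compares what dims2indices hands to the parent array for an At selector over many values versus a .. interval:

```julia
using DimensionalData

# Hypothetical array with regularly spaced lookups, standing in for NC.
A = DimArray(zeros(20, 100), (X(-12:7), Y(-100:-1)))

# At with many values: every value is looked up, and the result is a list of indices.
inds_at = DimensionalData.dims2indices(A, (X(At(-1)), Y(At(-100:-1))))

# An interval: only the two end points are looked up, and the result stays a range.
inds_iv = DimensionalData.dims2indices(A, (X(At(-1)), Y(-100 .. -1)))

@show typeof(inds_at[2]) typeof(inds_iv[2])
```

Both select the same elements here, but the range keeps the information that the selection is contiguous, which a disk-backed array can use to read whole chunks. This does not cover the case of a vector with gaps in the middle.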

felixcremer commented on July 24, 2024

I wonder how this compares if you use Between for the ranges instead of At. Could we be running into repeated scalar indexing?

rafaqz commented on July 24, 2024

Between is deprecated in favor of ..

bjarthur commented on July 24, 2024

tried to debug with the master branch of all related packages and there are unsatisfiable requirements in the versions:

(TXYZTranscriptomeGUI) pkg> activate --temp
  Activating new project at `/var/folders/s5/8d629n5d7nsf37f60_91wzr40000gq/T/jl_hzHUzN`

(jl_hzHUzN) pkg> dev YAXArrays YAXArrayBase DimensionalData Zarr NetCDF DiskArrays
     Cloning git-repo `https://github.com/JuliaDataCubes/YAXArrayBase.jl.git`
     Cloning git-repo `https://github.com/JuliaGeo/NetCDF.jl.git`
   Resolving package versions...
ERROR: Unsatisfiable requirements detected for package DiskArrays [3c3547ce]:
 DiskArrays [3c3547ce] log:
 ├─possible versions are: 0.4.0 or uninstalled
 ├─restricted to versions 0.3 by NetCDF [30363a11] — no versions left
 │ └─NetCDF [30363a11] log:
 │   ├─possible versions are: 0.11.7 or uninstalled
 │   └─NetCDF [30363a11] is fixed to version 0.11.7
 └─DiskArrays [3c3547ce] is fixed to version 0.4.0

rafaqz commented on July 24, 2024

You may need to edit the Project.toml of NetCDF.jl to live on the edge like that...

And be warned, we are talking about changes merged in the last few days, here be dragons.

And the question remains: why use At like that at all, even on the old versions? Why not .. ?

bjarthur commented on July 24, 2024

the netcdf file seems to be reopened for each chunk as it iterates over them! specifically, this line here:

https://github.com/meggart/DiskArrays.jl/blob/v0.3.23/src/batchgetindex.jl#L91

calls this line

https://github.com/JuliaDataCubes/YAXArrayBase.jl/blob/master/src/datasets/netcdf.jl#L29

bjarthur commented on July 24, 2024

not sure why the netcdf code can't mimic the zarr code, so that eachchunk() could be defined as DiskArrays.eachchunk(a::NcVar) = DiskArrays.GridChunks(a,a.chunksize). the chunk size is stored in both structs:

julia> zarr.metadata.chunks
(1, 10, 100)

julia> nc.chunksize
(100, 10, 1)
