Comments (15)
I use At in the production code with a vector that might have missing indices in the middle. I just simplified it here to a contiguous range for debugging purposes.
from dimensionaldata.jl.
Maybe try a PR changing it?
Also closing, as this is really way outside of DD's sphere of influence ;)
from dimensionaldata.jl.
indeed, the problem lies elsewhere:
julia> @time I = DimensionalData.dims2indices(ZARR, (X(At(-1)),))
0.000007 seconds (1 allocation: 16 bytes)
(12, Colon(), Colon())
julia> @time getindex(ZARR, I...);
0.006218 seconds (3.53 k allocations: 1.637 MiB)
julia> @time I = DimensionalData.dims2indices(ZARR, (X(At(-1)), Y(At(-100:-1)), Z(At(-1000:-1))))
0.000006 seconds (3 allocations: 8.906 KiB)
(12, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10 … 91, 92, 93, 94, 95, 96, 97, 98, 99, 100], [1, 2, 3, 4, 5, 6, 7, 8, 9, 10 … 991, 992, 993, 994, 995, 996, 997, 998, 999, 1000])
julia> @time getindex(ZARR, I...);
0.011937 seconds (108.19 k allocations: 19.329 MiB)
julia> @time I = DimensionalData.dims2indices(NC, (X(At(-1)),))
0.000002 seconds (1 allocation: 16 bytes)
(12, Colon(), Colon())
julia> @time getindex(NC, I...);
0.001656 seconds (171 allocations: 793.469 KiB)
julia> @time I = DimensionalData.dims2indices(NC, (X(At(-1)), Y(At(-100:-1)), Z(At(-1000:-1))))
0.000008 seconds (3 allocations: 8.906 KiB)
(12, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10 … 91, 92, 93, 94, 95, 96, 97, 98, 99, 100], [1, 2, 3, 4, 5, 6, 7, 8, 9, 10 … 991, 992, 993, 994, 995, 996, 997, 998, 999, 1000])
julia> @time getindex(NC, I...);
77.239059 seconds (15.93 M allocations: 1.158 GiB, 0.08% gc time)
thanks! will keep digging...
from dimensionaldata.jl.
There should be branches on all relevant packages for the DiskArrays break. Let's get them merged, revisit this next week, and reopen it on DiskArrays or YAXArrays as needed.
from dimensionaldata.jl.
Probably not a DD problem. Here we just resolve the dims and selectors to regular indices and pass them to the parent array.
YAX/NetCDF/DiskArrays.jl do the actual reading.
@meggart may know more about that part.
(Also just try it with Int/colon on the parent array so DD is not involved. DD.dims2indices(A, I) will get you the resolved indices.)
from dimensionaldata.jl.
can you elaborate on DD.dim2indices? i'm not following...
from dimensionaldata.jl.
also, you see above how i showed it's fast without At and slow with it? wouldn't that indicate it is a DD problem?
from dimensionaldata.jl.
Sorry, typo: dims2indices.
DimensionalData is mostly a layer over AbstractArray indexing. It calls dims2indices on your dimensions and selectors, then passes the results to the parent object.
Calling inds = DD.dims2indices(NC, (At... etc)) manually to get the resolved indices will show you what's actually happening. Pass the dimensions and selectors as a tuple in the second argument.
Then passing the resulting indices to parent(NC)[inds...] will give you a benchmark where DD is not involved at all, and hopefully help us assign blame ;)
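A minimal sketch of that workflow, using a small in-memory DimArray as a stand-in for the disk-backed NC array (the array A and its contents are assumptions for illustration; only dims2indices and parent come from the thread):

```julia
using DimensionalData

# Hypothetical in-memory stand-in for the NetCDF-backed array in this thread.
A = DimArray(rand(12, 100, 1000), (X(-12:-1), Y(-100:-1), Z(-1000:-1)))

# Step 1: resolve dimensions and selectors to plain indices (all that DD does here).
inds = DimensionalData.dims2indices(A, (X(At(-1)), Y(At(-100:-1)), Z(At(-1000:-1))))

# Step 2: index the parent array directly, so DD is not part of the timing.
@time sub = parent(A)[inds...]
```

With a disk-backed array in place of A, the second timing isolates the cost of the read itself from the cost of resolving the selectors.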
from dimensionaldata.jl.
Probably NetCDF (DiskArrays.jl) chunk loading of that vector. Likely fixed on DiskArrays.jl main already.
But either way, using At like that is not great. Why not use .. for this and get a range back? A vector throws away the structural information, and At will do hundreds of lookups where .. does 2.
(DiskArrays.jl on main is just smart enough to put the range back together!!!)
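To illustrate what "putting the range back together" could look like, here is a package-free sketch. The helper as_range is hypothetical and is not the DiskArrays.jl implementation; it only shows that a contiguous index vector carries the same information as a range, which a reader backend can then load as one block instead of element by element.

```julia
# Hypothetical helper, not part of DiskArrays.jl or DimensionalData.jl.
# Returns a UnitRange when the indices form one contiguous ascending run,
# otherwise returns the input vector unchanged.
function as_range(inds::AbstractVector{<:Integer})
    isempty(inds) && return inds
    for i in 2:length(inds)
        inds[i] == inds[i - 1] + 1 || return inds
    end
    return first(inds):last(inds)
end

contiguous = as_range(collect(1:100))  # a range covering 1 through 100
gappy = as_range([1, 2, 4, 5])         # has a gap, so the vector comes back as-is
```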
from dimensionaldata.jl.
I wonder how it compares if you use Between for the ranges instead of At. Could we be running into multiple scalar indexing?
from dimensionaldata.jl.
Between is deprecated in favor of ..
from dimensionaldata.jl.
tried to debug with the master branch of all related packages, but there are unsatisfiable requirements in the versions:
(TXYZTranscriptomeGUI) pkg> activate --temp
Activating new project at `/var/folders/s5/8d629n5d7nsf37f60_91wzr40000gq/T/jl_hzHUzN`
(jl_hzHUzN) pkg> dev YAXArrays YAXArrayBase DimensionalData Zarr NetCDF DiskArrays
Cloning git-repo `https://github.com/JuliaDataCubes/YAXArrayBase.jl.git`
Cloning git-repo `https://github.com/JuliaGeo/NetCDF.jl.git`
Resolving package versions...
ERROR: Unsatisfiable requirements detected for package DiskArrays [3c3547ce]:
DiskArrays [3c3547ce] log:
├─possible versions are: 0.4.0 or uninstalled
├─restricted to versions 0.3 by NetCDF [30363a11] — no versions left
│ └─NetCDF [30363a11] log:
│ ├─possible versions are: 0.11.7 or uninstalled
│ └─NetCDF [30363a11] is fixed to version 0.11.7
└─DiskArrays [3c3547ce] is fixed to version 0.4.0
from dimensionaldata.jl.
You may need to edit the Project.toml of NetCDF.jl to live on the edge like that...
And be warned, we are talking about changes merged in the last few days, here be dragons.
And the question remains: why use At like that at all, even on the old versions? Why not ..?
from dimensionaldata.jl.
the netcdf file seems to be reopened for each chunk as it iterates over them! specifically, this line here:
https://github.com/meggart/DiskArrays.jl/blob/v0.3.23/src/batchgetindex.jl#L91
calls this line
https://github.com/JuliaDataCubes/YAXArrayBase.jl/blob/master/src/datasets/netcdf.jl#L29
from dimensionaldata.jl.
not sure why the netcdf code can't mimic the zarr code, where eachchunk() could be defined as DiskArrays.eachchunk(a::NcVar) = DiskArrays.GridChunks(a,a.chunksize). the chunk size is stored in both structs:
julia> zarr.metadata.chunks
(1, 10, 100)
julia> nc.chunksize
(100, 10, 1)
from dimensionaldata.jl.