rafaqz / dimensionaldata.jl

Named dimensions and indexing for Julia arrays and other data

Home Page: https://rafaqz.github.io/DimensionalData.jl/stable/

License: MIT License

Julia 100.00%

arrays axis-labels gpu-support tables

dimensionaldata.jl's Introduction

DimensionalData

Tip

Visit the latest documentation at https://rafaqz.github.io/DimensionalData.jl/dev/

DimensionalData.jl provides tools and abstractions for working with datasets that have named dimensions, and optionally a lookup index. It provides no-cost abstractions for named indexing, and fast index lookups.

DimensionalData is a pluggable, generalised version of AxisArrays.jl with a cleaner syntax, and additional functionality found in NamedDims.jl. It has similar goals to Python's xarray, and is primarily written for use with spatial data in Rasters.jl.

Important

INSTALLATION

julia>]
pkg> add DimensionalData

Start using the package:

using DimensionalData

The basic syntax is:

A = DimArray(rand(50, 31), (X(), Y(10.0:40.0)));

Or just use rand directly, which also works for zeros, ones and fill:

A = rand(X(10), Y(10.0:20.0))

╭───────────────────────────╮
│ 10×11 DimArray{Float64,2} │
├───────────────────────────┴──────────────────────────────── dims ┐
  ↓ X,
  → Y Sampled{Float64} 10.0:1.0:20.0 ForwardOrdered Regular Points
└──────────────────────────────────────────────────────────────────┘
 10.0       11.0       12.0        13.0       14.0         …  16.0       17.0       18.0        19.0       20.0
  0.71086    0.689255   0.672889    0.766345   0.00277696      0.773863   0.252199   0.279538    0.808931   0.783528
  0.934464   0.815631   0.815715    0.890573   0.158584        0.304733   0.936321   0.499803    0.839926   0.979722
  ⋮                                                        ⋱                                                ⋮
  0.935495   0.460879   0.0218015   0.703387   0.756411    …   0.431141   0.619897   0.0536918   0.506488   0.170494
  0.800226   0.208188   0.512795    0.421171   0.492668        0.238562   0.4694     0.320596    0.934364   0.147563

Note

Subsetting by index is easy:

A[Y=1:10, X=1]

╭────────────────────────────────╮
│ 10-element DimArray{Float64,1} │
├────────────────────────────────┴─────────────────────────── dims ┐
  ↓ Y Sampled{Float64} 10.0:1.0:19.0 ForwardOrdered Regular Points
└──────────────────────────────────────────────────────────────────┘
 10.0  0.130198
 11.0  0.693343
 12.0  0.400656
  ⋮    
 17.0  0.877581
 18.0  0.866406
 19.0  0.605331

One can also subset by lookup, using a Selector. Let's try At, here on the 50×31 array from the first example, whose Y lookup spans 10.0:40.0:

A[Y(At(25))]

╭────────────────────────────────╮
│ 50-element DimArray{Float64,1} │
├────────────────────────── dims ┤
  ↓ X
└────────────────────────────────┘
  1  0.5318
  2  0.212491
  3  0.99119
  4  0.373549
  5  0.0987397
  ⋮  
 46  0.503611
 47  0.225421
 48  0.293564
 49  0.976395
 50  0.622586

There is also Near (for inexact/nearest selection), Contains (for Intervals containing values), Between or .. for range selection, and Where for queries, among others.
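To make the selector semantics concrete, here is a minimal plain-Julia sketch of what At, Near and Between-style selection amount to on a lookup vector. The helper names `at_index`, `near_index` and `between_indices` are hypothetical, and this is not how DimensionalData implements its selectors; it only illustrates the idea with linear scans.

```julia
# Conceptual sketch only: linear scans over a lookup vector.
lookup = 10.0:1.0:20.0

# At: the index whose lookup value equals v exactly (nothing if absent).
at_index(lookup, v) = findfirst(==(v), lookup)

# Near: the index whose lookup value is closest to v.
near_index(lookup, v) = argmin(abs.(lookup .- v))

# Between: all indices whose lookup values fall in the closed range [a, b].
between_indices(lookup, a, b) = findall(x -> a <= x <= b, lookup)

i_at = at_index(lookup, 12.0)
i_near = near_index(lookup, 12.4)
i_between = between_indices(lookup, 15.0, 17.5)
```

An exact match that does not exist returns `nothing` in this sketch, which is why a nearest-value selector is useful for inexact lookups.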

Plotting with Makie.jl is as easy as:

using GLMakie, DimensionalData
boxplot(rand(X('a':'d'), Y(2:5:20)))

And the plot will have the right ticks and labels.

See the docs for more details.

Note

Recent changes have greatly reduced the exported API.

Previously exported methods can be brought into global scope by using the sub-modules they have been moved to - Lookup and Dimensions:

using DimensionalData
using DimensionalData.Lookup, DimensionalData.Dimensions

Important

Alternative Packages

There are a lot of similar Julia packages in this space. AxisArrays.jl, NamedDims.jl and NamedArrays.jl are registered alternatives that each cover some of the functionality provided by DimensionalData.jl. DimensionalData.jl should be able to replicate most of their syntax and functionality.

AxisKeys.jl and AbstractIndices.jl are some other interesting developments. For more detail on why there are so many similar options and where things are headed, read this thread.

The main functionality is explained here, but the full list of features is on the API page.

dimensionaldata.jl's Issues

SplitApplyCombine

It would be good to support SplitApplyCombine.jl or something similar, so that complex manipulations could be applied along dimensions in a structured way, say getting mean daily values from hourly values over some timespan.
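As a rough illustration of the request, the following stdlib-only sketch computes mean daily values from an hourly series by grouping on the calendar date. The data and variable names are made up for the example, and it does not use SplitApplyCombine.jl or any DimensionalData API.

```julia
using Dates, Statistics

# Hypothetical hourly series covering two full days (48 samples).
times = DateTime(2001, 1, 1):Hour(1):DateTime(2001, 1, 2, 23)
hourly = collect(1.0:length(times))

# Split: label every sample with its calendar day.
days = Date.(times)

# Apply and combine: one mean per distinct day, in order of appearance.
daily_mean = [mean(hourly[days .== d]) for d in unique(days)]
```

A dimension-aware version would do the same grouping along a named time dimension and return a new array whose time lookup holds one entry per day.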

Datasets?

Would you be interested in supporting something like the xarray.Dataset in this package (or in a related one)? Namely, a collection of dimensional arrays with shared indices which you can do joint operations (like indexing and split-apply-combine) on?

One of the main issues with this would be the lack of type stability (the object is close to Dict{String, DimArray{Any...}}), though operations which require dealing with this should be pretty high level. I'm not up to date on the state of performance in packages which face similar issues, e.g. DataFrames.jl. I imagine those packages probably have relevant experience.
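One possible way around the type-stability concern, sketched here with plain arrays rather than any package type, is to hold the layers in a NamedTuple instead of a Dict. The concrete type of every layer is then part of the container's type, and joint operations can be written with `map`. The layer names are hypothetical.

```julia
# Two layers sharing the same axes, stored in a NamedTuple.
layers = (temperature = rand(3, 4), pressure = rand(3, 4))

# A joint operation: apply the same index to every layer at once.
subset = map(A -> A[1:2, :], layers)
```

`map` over a NamedTuple returns a NamedTuple with the same keys, so the result is still a named collection of arrays.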

[Dev Question] What's the purpose/use of Grid and co types?

It is useful for the community to have a way to transform a loaded NetCDF variable into an array with Dimension data attached to it, like what this package does. We will implement such an interface in NCDatasets.jl, cf. Alexander-Barth/NCDatasets.jl#60, but unfortunately there are numerous AxisArray-like packages. I've had a look at all of them, and at least on the surface, this one seems to be the strongest candidate.

But since it has been developed by a single person, and I would like to have at least some basic understanding before basing NCDatasets.jl on this one, there are some dev questions. The first one is regarding the Grid types/subtypes/supertypes, which at the moment is something I am not sure about.

What is their purpose?
Why is it that the dimension data type by itself is not enough? For example, a Range should always be a "standard grid" (AlignedGrid as it's called here), while a Vector should always be a BoundedGrid?

Support dims everywhere

Cat will require specifying a new dimension with axis values or an existing dimension without them.

Why are basic operations on DimensionalArray 100s of times slower than their lowered versions?

Here is a benchmarking script:

using DimensionalData, Test
using DimensionalData: Time, X, @dim
using Dates: DateTime, Month
using BenchmarkTools

dt = 12
dx = 20.0

timespan = DateTime(2001):Month(dt):DateTime(2011,12)
t = Time(timespan)
x = X(Vector(0.5:dx:359.5))
d = (x, t)
A = DimensionalArray(rand(length.(d)...), d)

println("Time to access via Tim(1:3)")
@btime $(A)[$(Time(1:3))];
println("Time to access via A.data[:, 1:3]")
@btime $(A.data)[1:3];

println("Time to do A+A")
@btime $(A) .+ $(A);

println("Time to do A.data + A.data")
@btime $(A.data) .+ $(A.data);

which on current master outputs:

Time to access via Tim(1:3)
  660.131 ns (3 allocations: 624 bytes)
Time to access via A.data[:, 1:3]
  28.370 ns (1 allocation: 112 bytes)
Time to do A+A
  3.538 μs (7 allocations: 2.14 KiB)
Time to do A.data + A.data
  148.325 ns (1 allocation: 1.77 KiB)

Why is there such a vast performance difference between the cases?

I mean the docs only advertise zero cost 1-index, e.g. [X(1), Time(1)] but that kind of indexing is useless in practice...? It doesn't make sense to access A "element by element" this way to get the performance of Base, does it?

Zero cost indexing broken in travis on 1.3

Julia 1.3 no longer compiles away indexing. It still works fine on 1.0 and 1.2.

So performance is going to be a moving target.

There should also be a specific test that int/dim indexing have similar performance.

StackOverflowError when using `At`

Using Julia 1.2.0 with DimensionalData v0.1.0 I encounter an issue. Perhaps I'm doing something wrong with the constructor or indexing?

julia> d = DimensionalArray(rand(12,10),Dim{:Age}(0:11),Dim{:Duration}(1:10))
12×10 DimensionalArray{Float64,2,Dim{:Age,UnitRange{Int64},Nothing,DimensionalData.Forward},Dim{:Duration,UnitRange{Int64},Nothing,DimensionalData.Forward},Array{Float64,2}}:
 0.778748   0.495674   0.972396  0.0743772  0.160006   0.364009    0.802004   0.315814  0.516349   0.25529
 0.428088   0.595999   0.333938  0.500439   0.885373   0.623493    0.711443   0.265025  0.0625082  0.745435
 0.703649   0.369344   0.394719  0.45761    0.290267   0.364453    0.613593   0.387107  0.628519   0.819933
 0.977845   0.0794241  0.531902  0.280992   0.402047   0.756775    0.13164    0.062591  0.694477   0.416015
 0.0556876  0.49882    0.672613  0.212836   0.743249   0.424285    0.913358   0.460033  0.0345535  0.0104587
 0.599711   0.563205   0.107054  0.999652   0.0670963  0.00359502  0.0169267  0.344555  0.747658   0.554389
 0.448329   0.939001   0.640038  0.921711   0.915994   0.939289    0.837513   0.868865  0.66771    0.783397
 0.782341   0.857793   0.135598  0.765034   0.520911   0.378882    0.482239   0.22485   0.770335   0.479799
 0.6601     0.51877    0.397913  0.916057   0.391102   0.245188    0.222499   0.814934  0.375634   0.401231
 0.284916   0.169768   0.620476  0.250912   0.64616    0.206558    0.0312784  0.178456  0.766275   0.950708
 0.445494   0.609898   0.85941   0.503771   0.378988   0.568289    0.116531   0.88722   0.0570116  0.372341
 0.591082   0.890146   0.976019  0.410203   0.925599   0.9378      0.129331   0.523501  0.982223   0.846347

When I try to index with At:

julia> d[At(1),At(1)]
ERROR: StackOverflowError:
Stacktrace:
 [1] sel2indices(::Dim{:Age,UnitRange{Int64},Nothing,DimensionalData.Forward}, ::Tuple{At{Int64},At{Int64}}) at C:\Users\user\.julia\packages\DimensionalData\7xUVM\src\selector.jl:42 (repeats 80000 times)

Also when using the named dimension:

julia> d[Dim{:Age}(1)]
ERROR: StackOverflowError:
Stacktrace:
 [1] dims2indices(::Dim{:Age,UnitRange{Int64},Nothing,DimensionalData.Forward}, ::Tuple{Dim{:Age,Int64,Nothing,DimensionalData.Forward}}, ::Function) at C:\Users\alecl\.julia\packages\DimensionalData\7xUVM\src\primitives.jl:44 (repeats 80000 times)

Broadcasting and empty dims

using DimensionalData
using DimensionalData: X, Y

function dimarray(dims...)
    DimensionalArray(
        collect(reshape(1:reduce(*, length.(dims)), length.(dims))),
        dims
    )
end

a = dimarray(X(2:2:100), Y(3:3:300))
b = dimarray(Y(3:3:300))

collect(a) .* collect(b')  # This works

a .* b' # Throws dimension mismatch

Traceback

ERROR: DimensionMismatch("X and DimensionalData.EmptyDim dims on the same axis")
Stacktrace:
 [1] _broadcasted_dims(::DimensionalArray{Int64,2,Tuple{X{StepRange{Int64,Int64},RegularGrid{Ordered{DimensionalData.Forward,DimensionalData.Forward,DimensionalData.Forward},Start,UnknownSampling,Int64},Nothing},Y{StepRange{Int64,Int64},RegularGrid{Ordered{DimensionalData.Forward,DimensionalData.Forward,DimensionalData.Forward},Start,UnknownSampling,Int64},Nothing}},Tuple{},Array{Int64,2}}, ::DimensionalArray{Int64,2,Tuple{DimensionalData.EmptyDim,Y{StepRange{Int64,Int64},RegularGrid{Ordered{DimensionalData.Forward,DimensionalData.Forward,DimensionalData.Forward},Start,UnknownSampling,Int64},Nothing}},Tuple{},LinearAlgebra.Adjoint{Int64,Array{Int64,1}}}) at /Users/isaac/github/DimensionalData.jl/src/primitives.jl:360
 [2] _broadcasted_dims(::Base.Broadcast.Broadcasted{DimensionalData.DimensionalStyle{Base.Broadcast.DefaultArrayStyle{2}},Tuple{Base.OneTo{Int64},Base.OneTo{Int64}},typeof(*),Tuple{DimensionalArray{Int64,2,Tuple{X{StepRange{Int64,Int64},RegularGrid{Ordered{DimensionalData.Forward,DimensionalData.Forward,DimensionalData.Forward},Start,UnknownSampling,Int64},Nothing},Y{StepRange{Int64,Int64},RegularGrid{Ordered{DimensionalData.Forward,DimensionalData.Forward,DimensionalData.Forward},Start,UnknownSampling,Int64},Nothing}},Tuple{},Array{Int64,2}},DimensionalArray{Int64,2,Tuple{DimensionalData.EmptyDim,Y{StepRange{Int64,Int64},RegularGrid{Ordered{DimensionalData.Forward,DimensionalData.Forward,DimensionalData.Forward},Start,UnknownSampling,Int64},Nothing}},Tuple{},LinearAlgebra.Adjoint{Int64,Array{Int64,1}}}}}) at /Users/isaac/github/DimensionalData.jl/src/broadcast.jl:77
 [3] copy(::Base.Broadcast.Broadcasted{DimensionalData.DimensionalStyle{Base.Broadcast.DefaultArrayStyle{2}},Tuple{Base.OneTo{Int64},Base.OneTo{Int64}},typeof(*),Tuple{DimensionalArray{Int64,2,Tuple{X{StepRange{Int64,Int64},RegularGrid{Ordered{DimensionalData.Forward,DimensionalData.Forward,DimensionalData.Forward},Start,UnknownSampling,Int64},Nothing},Y{StepRange{Int64,Int64},RegularGrid{Ordered{DimensionalData.Forward,DimensionalData.Forward,DimensionalData.Forward},Start,UnknownSampling,Int64},Nothing}},Tuple{},Array{Int64,2}},DimensionalArray{Int64,2,Tuple{DimensionalData.EmptyDim,Y{StepRange{Int64,Int64},RegularGrid{Ordered{DimensionalData.Forward,DimensionalData.Forward,DimensionalData.Forward},Start,UnknownSampling,Int64},Nothing}},Tuple{},LinearAlgebra.Adjoint{Int64,Array{Int64,1}}}}}) at /Users/isaac/github/DimensionalData.jl/src/broadcast.jl:39
 [4] materialize(::Base.Broadcast.Broadcasted{DimensionalData.DimensionalStyle{Base.Broadcast.DefaultArrayStyle{2}},Nothing,typeof(*),Tuple{DimensionalArray{Int64,2,Tuple{X{StepRange{Int64,Int64},RegularGrid{Ordered{DimensionalData.Forward,DimensionalData.Forward,DimensionalData.Forward},Start,UnknownSampling,Int64},Nothing},Y{StepRange{Int64,Int64},RegularGrid{Ordered{DimensionalData.Forward,DimensionalData.Forward,DimensionalData.Forward},Start,UnknownSampling,Int64},Nothing}},Tuple{},Array{Int64,2}},DimensionalArray{Int64,2,Tuple{DimensionalData.EmptyDim,Y{StepRange{Int64,Int64},RegularGrid{Ordered{DimensionalData.Forward,DimensionalData.Forward,DimensionalData.Forward},Start,UnknownSampling,Int64},Nothing}},Tuple{},LinearAlgebra.Adjoint{Int64,Array{Int64,1}}}}}) at ./broadcast.jl:819
 [5] top-level scope at REPL[63]:1

First, should this work? If so, should an empty dimension just match against anything when broadcasting?

Implement * (or other unicode symbol) as in xarray

In Python's xarray, if you have a 3D array A with dimensions (Lon, Lat, Time), and you have another array B with dimension (Lat,) (i.e. a Vector) you can do "dimension-wise" multiplication. This means, (and assuming that size(A, Lat) == size(B, Lat)) you can write:

A * B

which would multiply every element A[:, i, :] with B[i]. This is tremendously useful when multiplying a spatiotemporal field with a vector of probabilities, e.g. weighting a spatiotemporal distribution by the cosine of the latitude.
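In plain Julia the same effect can be had by reshaping B so that its only non-singleton axis lines up with the Lat axis of A and then broadcasting. The sketch below uses ordinary arrays as stand-ins; a dimension-aware `*` would work out the reshape from the dimension names instead of a hard-coded position.

```julia
# Stand-ins: A has axes (Lon, Lat, Time), B has axis (Lat,).
A = reshape(collect(1.0:24.0), 2, 3, 4)
B = [1.0, 10.0, 100.0]

# Put B on axis 2 (Lat) with singleton axes elsewhere, then broadcast.
C = A .* reshape(B, 1, :, 1)
```

Every slice `C[:, i, :]` is then `A[:, i, :]` scaled by `B[i]`.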

Yes or no?

The README example conflicts Time with Dates.Time

julia> using DimensionalData: Time, X
WARNING: ignoring conflicting import of DimensionalData.Time into Main

Seems to me that Time is supposed to be a time axis, but results in conflicts with Dates.Time. As it currently stands the README example doesn't run and Time needs to be explicitly qualified.

I wonder, since this Time is supposed to be a thing used so often and Dates.Time is actually also a central thing, shouldn't the name of DimensionalData.Time be something just a bit different to avoid the conflict?

Handle Transpose and PermuteDimsArray

solutions:

define dims for them and reorder the dims tuple of the contained array to match the transpose.
add a constructor for Transpose(a::AbstractDimensionalArray) that rewraps the Transpose in the dim array, as with SubArray in view.

As discussed in
JuliaCollections/AxisArraysFuture#1
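The first option, reordering the dims tuple to match the wrapper, can be sketched with symbols standing in for dimension objects. The names below are hypothetical and no DimensionalData types are involved.

```julia
# Symbols stand in for the dims of a 3-d array.
dimnames = (:X, :Y, :Time)
data = rand(2, 3, 4)

# A lazy permutation of the axes.
perm = (3, 1, 2)
P = PermutedDimsArray(data, perm)

# The dims of the wrapper are the parent's dims taken in the order of perm.
permuted_names = map(i -> dimnames[i], perm)

# For a matrix transpose the permutation is simply a reversal.
transposed_names = reverse((:X, :Y))
```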

Repeated use of a dimension

For context, I would like to be able to have labelled arrays which represent pairwise measures. These measures can be distances, or they could be adjacency matrices for some network structure.

For these pairwise measures, I think it makes sense for the resulting array to use the same dimension twice. As an example:

using Dates, DimensionalData
using DimensionalData: Time, X
using Distances

timespan = DateTime(2001):Month(1):DateTime(2001,12)
A = DimensionalArray(rand(12,10), (Time(timespan), X(10:10:100))) 
D = DimensionalArray(  # Since the dimensions don't propagate
    pairwise(CosineDist(), A),
    (X(10:10:100), X(10:10:100))
)

Subsetting by dimension results in both axes being subset:

D[X<|Between(0, 50)]
# DimensionalArray with dimensions:
#  X: 10:10:50
#  X: 10:10:50
# and data: 5×5 Array{Float64,2}:
#  0.0       0.321451  0.352489  0.398895  0.444586
#  0.321451  0.0       0.160367  0.193874  0.143861
#  0.352489  0.160367  0.0       0.232279  0.111173
#  0.398895  0.193874  0.232279  0.0       0.247832
#  0.444586  0.143861  0.111173  0.247832  0.0

If I try to look at the distance between 10 and 20, I get the wrong result (both axes are being indexed by the first indexer):

D[X<|At(10), X<|At(20)]
# 0.0

For my work, there would be a lot of value in being able to use a dimension multiple times for one array, and being able to apply different selections to those indices. However I'm not sure how this can be resolved without losing the "order of indices doesn't matter" feature here.

Can these be reconciled? If not, I don't think it'd be that large of a restriction to make order matter when indices are repeated, or just in general. Having the order be important makes the performance implications of the code more explicit.

Using the order of the selectors to specify the axis is a workaround when dimensions are repeated:

 D[2:3, 1:2] == D[Between(20, 30), Between(10, 20)]

As an aside, this took me a little bit to find out, as I expected this to work initially:

julia> D[Between(20, 30), :]
ERROR: ArgumentError: invalid index: Between{Tuple{Int64,Int64}}((20, 30)) of type Between{Tuple{Int64,Int64}}

Ambiguity in _dropdims

I was just trying to test interoperability of DimensionalArrays and DiskArrays and stumbled over this one:

import DiskArrays: AbstractDiskArray, eachchunk, haschunks, Chunked, estimate_chunksize, GridChunks, findints
struct PseudoDiskArray{T,N,A<:AbstractArray{T,N}} <: AbstractDiskArray{T,N}
  parent::A
  chunksize::NTuple{N,Int}
end
PseudoDiskArray(a;chunksize=size(a)) = PseudoDiskArray(a,chunksize)
haschunks(a::PseudoDiskArray) = Chunked()
eachchunk(a::PseudoDiskArray) = GridChunks(a,a.chunksize)
Base.size(a::PseudoDiskArray) = size(a.parent)
function DiskArrays.readblock!(a::PseudoDiskArray,aout,i...)
  ndims(a) == length(i) || error("Number of indices is not correct")
  all(r->isa(r,AbstractUnitRange),i) || error("Not all indices are unit ranges")
  #println("reading from indices ", join(string.(i)," "))
  println("Reading at index ", join(string.(i)," "))
  aout .= a.parent[i...]
end
function DiskArrays.writeblock!(a::PseudoDiskArray,v,i...)
  ndims(a) == length(i) || error("Number of indices is not correct")
  all(r->isa(r,AbstractUnitRange),i) || error("Not all indices are unit ranges")
  println("Writing to indices ", join(string.(i)," "))
  view(a.parent,i...) .= v
end
a = PseudoDiskArray(rand(10,9,2),chunksize=(5,3,2))

ad = DimensionalArray(a,(Lon(1:10),Lat(1:9),Time(1:2)));
sum(ad)

MethodError: _dropdims(::Array{Float64,3}, ::Tuple{}) is ambiguous. Candidates:
  _dropdims(A::AbstractArray, dims::Tuple{Vararg{Int64,N}} where N) in Base at abstractarraymath.jl:72
  _dropdims(A::AbstractArray, dims::Tuple{Vararg{AbstractDimension,N}} where N) in DimensionalData at /home/fgans/julia_depots/packages/DimensionalData/XTjGa/src/methods.jl:62
Possible fix, define
  _dropdims(::AbstractArray, ::Tuple{})

Should be easy to fix...
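The ambiguity and the suggested fix can be reproduced in isolation. In this sketch `g` and `MyDim` are hypothetical stand-ins for `_dropdims` and `AbstractDimension`: an empty tuple matches both `Vararg` signatures, neither of which is more specific, so a dedicated `Tuple{}` method is needed to break the tie.

```julia
abstract type MyDim end  # hypothetical stand-in for AbstractDimension

# Two methods that both accept an empty tuple as second argument.
g(A::AbstractArray, dims::Tuple{Vararg{Int}}) = "ints"
g(A::AbstractArray, dims::Tuple{Vararg{MyDim}}) = "dims"

# Without this method, g([1], ()) raises an ambiguity MethodError.
g(A::AbstractArray, dims::Tuple{}) = "empty"

result_empty = g([1], ())
result_ints = g([1], (1, 2))
```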

swapdims and plot with dims arg methods

swapdims should allow us to rewrap the dimension types to some other dimensions when passed a tuple of dimension types, leaving everything else the same - or replace the dims entirely when passed a tuple of dimension instances.

This will be generally useful, but especially with plots where you want some different order or behaviour. It's actually easier to just do this than write custom methods. plot(A, dims) would return swapdims(A, dims).

The README example doesn't work due to Int conversion

julia> timespan = DateTime(2001):Month(1):DateTime(2001,12)
2001-01-01T00:00:00:1 month:2001-12-01T00:00:00
julia> A = DimensionalArray(rand(12,10), (DimensionalData.Time(timespan), X(10:10:100))) 
ERROR: InexactError: Int64(0.9090909090909091)
Stacktrace:
 [1] Int64 at .\float.jl:709 [inlined]
 [2] * at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.3\Dates\src\periods.jl:92 [inlined]
 [3] * at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.3\Dates\src\periods.jl:93 [inlined]
 [4] lerpi at .\range.jl:663 [inlined]
 [5] unsafe_getindex(::LinRange{Millisecond}, ::Int64) at .\range.jl:657
 [6] iterate at .\range.jl:590 [inlined]
 [7] issorted(::LinRange{Millisecond}, ::Base.Order.ForwardOrdering) at .\sort.jl:59     
 [8] #issorted#1 at .\sort.jl:91 [inlined]
 [9] #issorted at .\none:0 [inlined]
 [10] orderof(::LinRange{Millisecond}) at C:\Users\datse\.julia\packages\DimensionalData\EN4NY\src\primitives.jl:230
 [11] identify(::UnknownGrid, ::LinRange{Millisecond}) at C:\Users\datse\.julia\packages\DimensionalData\EN4NY\src\primitives.jl:225
 [12] formatdims(::Int64, ::DimensionalData.Time{StepRange{DateTime,Month},UnknownGrid,Nothing}) at C:\Users\datse\.julia\packages\DimensionalData\EN4NY\src\primitives.jl:209    
 [13] map at .\tuple.jl:159 [inlined]
 [14] formatdims at C:\Users\datse\.julia\packages\DimensionalData\EN4NY\src\primitives.jl:201 [inlined]
 [15] formatdims at C:\Users\datse\.julia\packages\DimensionalData\EN4NY\src\primitives.jl:199 [inlined]
 [16] #DimensionalArray#19(::Tuple{}, ::Type{DimensionalArray}, ::Array{Float64,2}, ::Tuple{DimensionalData.Time{StepRange{DateTime,Month},UnknownGrid,Nothing},X{StepRange{Int64,Int64},UnknownGrid,Nothing}}) at C:\Users\datse\.julia\packages\DimensionalData\EN4NY\src\array.jl:101
 [17] DimensionalArray(::Array{Float64,2}, ::Tuple{DimensionalData.Time{StepRange{DateTime,Month},UnknownGrid,Nothing},X{StepRange{Int64,Int64},UnknownGrid,Nothing}}) at C:\Users\datse\.julia\packages\DimensionalData\EN4NY\src\array.jl:101
 [18] top-level scope at none:0

cat with an empty dims obj should find refdims and cat them

refdims may have Time or some other dimension that could be used to reconstruct slices along that dimension with the correct grid - without passing it in explicitly.

so: cat(a, b, c; dims=Time) should reconstruct the Time dimension index from refdims for a, b and c.

How do you construct a dimensional array?

It's not super transparent from the documentation how you construct a DimensionalArray. For example, I tried to do:

DimensionalArray(rand(10,2),(Time(),X()))

And this did not work. I'm sure I could figure out how to do it if I read the source, but it seems like it would be good to have some examples of how to create an array somewhere in the README or docs.

Overhaul broadcast

Fully test similar and broadcast.
Handle and clarify the behaviour of broadcast over two AbstractDimensionalArray with different dimensions.

Broadcast with CuArray parent is broken

Broadcasting on a GeoArray that wraps a CuArray is doing scalar getindex/setindex.

This is probably true of other parent types as well. We need to defer all of these operations to the parent array earlier so the correct methods are dispatched, but also handle dimension rebuilding.

Struct name of AllignedGrid

Hello!

I was wondering if the double "ll" in AllignedGrid type was on purpose or simply a typo. I think it only takes one "L" and not two.

Cheers!

Permute dims method redefinition

WARNING: Method definition permutedims(Tuple{#s16, Vararg{DimensionalData.AbstractDimension{T, G, M} where M where G where T, N}} where #s16<:(DimensionalData.AbstractDimension{T, G, M} where M where G where T) where N, Tuple{#s16, Vararg{DimensionalData.AbstractDimension{T, G, M}
where M where G where T, N}} where #s16<:(DimensionalData.AbstractDimension{T, G, M} where M where G where T) where N) in module DimensionalData at C:\Users\m300808\.julia\dev\DimensionalData\src\primitives.jl:25 overwritten at C:\Users\m300808\.julia\dev\DimensionalData\src\primitives.jl:28.
  ** incremental compilation may be fatally broken for this module **

I've been getting this consistently for a week now, thought I'd better put it in an issue so it's not forgotten.

Incorrect dimensions after matrix multiplication

My assumption here is that x::DimensionalArray -> length.(dims(x)) == size(x) should be true for enumerated dimensions. Here's an example:

using Dates, DimensionalData
using DimensionalData: Time, X
timespan = DateTime(2001):Month(1):DateTime(2001,12)
A = DimensionalArray(rand(12,10), (Time(timespan), X(10:10:100))) 

@assert length.(dims(A)) == size(A)
@assert length.(dims(A')) == size(A')

length.(dims(A * A')) == size(A * A')
# false

This seems right:

size(A * A')
# (12, 12)

But the first dimension is X, when it should be Time:

dims(A * A')
# (X: 10:10:100, Time: 2001-01-01T00:00:00:1 month:2001-12-01T00:00:00)

However, if both dimensions were Time, that would start leading to problems with #46.
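The expected rule can be stated with symbols standing in for dimensions: a matrix product keeps the first dimension of the left operand and the last dimension of the right operand. This is a sketch of the expectation described above, not of the package's code.

```julia
# Symbols stand in for the dims of A and of its adjoint.
dims_A = (:Time, :X)
dims_At = reverse(dims_A)

A = rand(12, 10)
P = A * A'

# Rows come from the left operand, columns from the right operand.
expected_dims = (first(dims_A), last(dims_At))
```

For `A * A'` this gives a repeated Time dimension, which is exactly the case the last sentence above worries about.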

@dim cannot be used to export new dimensions

MWE:

julia> module Foo
       using DimensionalData; using DimensionalData: @dim
       @dim Bar "Bar"
       export Bar
       end

julia> using .Foo

julia> DimensionalArray(rand(10),(Bar(1:10),))
ERROR: UndefVarError: Forward not defined
Stacktrace:
 [1] Bar(::UnitRange{Int64}) at /Users/davidlittle/.julia/packages/DimensionalData/sszkE/src/dimension.jl:192
 [2] top-level scope at REPL[3]:1

This looks like an issue with macro hygiene. Pretty sure I know how to fix it. I don't have a lot of time at the moment with a newborn in the house: if you don't end up getting to it yourself in a few weeks I can probably fix it by then.

cat should run permutedims and reverse automatically

It should be possible to cat arrays with different dim order if they have the same dims. The order of the first one would be imposed on the rest.

If they are in a different order that should be fixed too.
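A minimal sketch of the first part, using symbols as stand-ins for dimensions: compute the permutation that brings the second array's dims into the order of the first, apply it, and only then concatenate. Reversing a lookup that runs in the opposite direction would be a separate step not shown here.

```julia
# First array sets the reference order.
dims_a = (:X, :Y)
a = reshape(collect(1:6), 2, 3)

# Second array has the same dims in a different order.
dims_b = (:Y, :X)
b = reshape(collect(7:12), 3, 2)

# For each dim of a, find its position in b.
perm = map(d -> findfirst(==(d), dims_b), dims_a)
b_aligned = permutedims(b, perm)

# Now the shapes agree and the arrays can be stacked along a new axis.
c = cat(a, b_aligned; dims=3)
```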

How to select a dimension by indices (if possible)?

Let's say I have the scenario where I want to select one of the dimensions of my data by indices, instead of values (like e.g. take the first 10 time points, but without necessarily knowing if "Time" is the first dimension).

Is this possible? To my understanding At, Near and Between use values. If it is not possible, would this be supported with another selector, ByIndex or something of the sort?
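The README section at the top of this page shows the form that current versions use for this, `A[Y=1:10, X=1]`. The underlying idea can be sketched in plain Julia by looking up the axis position from the dimension name and then indexing that axis; the symbols here are hypothetical stand-ins for dimension types.

```julia
# Symbols stand in for the dims of a 2-d array.
dimnames = (:Lon, :Time)
A = reshape(collect(1:12), 3, 4)

# Find which axis is Time, without assuming its position.
d = findfirst(==(:Time), dimnames)

# Take the first two time points by index along that axis.
first_two = selectdim(A, d, 1:2)
```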

[Doc] Non uniform grids?

Hi,

I remember from some issue discussion that in general dimensions could be a subtype of AbstractArray instead of AbstractVector to support non-uniform grids.

I am trying to get my head around how to use this here or even more so how to use Near for such grids.

I now have an unstructured grid (equal-area Gaussian) and my lons and lats are actually just a very long vector while my "data" is also a very long vector. I.e. at every point (lon[i], lat[i]) I have the value A[i].

If using such datasets is supported here, we should document this somewhere with an example of how best to use them and how best to define the dimensional array in this format. Is the best option for me to make my A have two dimensions, LonLat and Time, where the LonLat dimension is actually an N×2 matrix? Then I guess using Near could be explained easily?
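For such a point cloud, a Near-style lookup reduces to a nearest-neighbour search over the coordinate pairs. The sketch below uses made-up coordinates and a plain squared distance in degree space, which is an assumption made for brevity; a real implementation would likely use a great-circle distance.

```julia
# Hypothetical unstructured grid: one (lon, lat) pair and one value per point.
lon = [0.0, 10.0, 20.0, 30.0]
lat = [0.0, 5.0, -5.0, 10.0]
A = [1.0, 2.0, 3.0, 4.0]

# Index of the point closest to (lon0, lat0), by squared distance in degrees.
nearest(lon, lat, lon0, lat0) = argmin((lon .- lon0) .^ 2 .+ (lat .- lat0) .^ 2)

i = nearest(lon, lat, 19.0, -4.0)
value = A[i]
```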

How to get the value of a dimension by _name_ from a DimensionalArray?

I have a nice dimensional array, that has dimensions

At the moment, if I want to get the value of the i-th dimension, I can do dims(s)[i].val. But, how do I get the dimension's value by qualifying the dimension type?

Like, I want the values of the Lon dimension of s.

Related with #38 .

Add methods for mean, sum etc on AbstractDimensionalArray

These methods should call themselves on parent, and rebuild the dims manually where necessary - it's not clear if relying on similar is fully workable.

It's also not clear if they should be combined with the current methods, or dimensional indexing on AbstractArray and regular indexing on AbstractDimensionalArray should stay separate.

Currently dims2indices conversion and rebuild() are in different methods.

Dimension dispatch should be abstract, not concrete

Plot recipes dispatching on X, Y, Z and Lat/Lon (in GeoData.jl) should dispatch on AbstractX, AbstractY, AbstractLon <: AbstractX and AbstractLat <: AbstractY.

This will mean plot recipes for X, Y will work on Lon/Lat but not vice-versa (which is what we want).

It also means custom dims can be subtyped so that plot recipes work for them too, when you want them to.

An easy way to do this is to add an (optional) argument to the @dim macro that sets up the subtype. Maybe even @dim "Custom X" C <: AbstractX
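The proposed hierarchy can be sketched with bare types. Everything here is a hypothetical stand-in (including `recipe` and `georecipe`, which represent plot recipes): a method written for AbstractX accepts Lon, while a method written for AbstractLon does not accept X.

```julia
abstract type AbstractDim end
abstract type AbstractX <: AbstractDim end
abstract type AbstractLon <: AbstractX end

struct X <: AbstractX end
struct Lon <: AbstractLon end

# A generic recipe dispatches on the abstract X type...
recipe(::AbstractX) = "generic x recipe"

# ...while a geographic recipe dispatches on the narrower abstract type.
georecipe(::AbstractLon) = "map recipe"

r1 = recipe(X())
r2 = recipe(Lon())      # works: Lon is an AbstractX
r3 = georecipe(Lon())
```

Calling `georecipe(X())` would throw a MethodError, which is the one-way behaviour described above.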

EmptyDims

What is the use case for EmptyDims? Should it be the added dimension whenever a DimensionalArray gets an unnamed dim added (e.g. a[:, :] where a is 1d)?

Also, should EmptyDims always have size one? That could solve the dimvec' * dimvec issue by defining something like:

*(DimArray{<:Any, 2, Tuple{EmptyDim, <:Any}}, DimArray{<:Any, 1})

[Dev Question] What is the purpose/use of referenced dimensions?

Hi,

This is something I haven't understood yet: what's the reason for keeping some dimensions as refdims?

This is a bit unintuitive:

julia> At = timeaverage(A, 1:maxyear)
DimensionalArray with dimensions:
 Longitude (type Lon): Float32[0.5, 1.5, …, 358.5, 359.5]
 Latitude (type Lat): Float32[-89.5, -88.5, …, 88.5, 89.5]
and referenced dimensions:
 Time: 2000-03-15T00:00:00

It is odd for me to see that this time dimension has a single timepoint as its value (the first one). The resulting array comes from averaging over time, so no timepoint makes sense to be printed. If anything it would be the mean, but e.g. for time data this doesn't make much sense.

broadcasting over dimensions, as in xarray

In Python's xarray, the dimensions are themselves xarray objects, just like the normal "dimensional arrays". This means that one can do normal operations on the dimensions as well, e.g. the cosine of the longitude still returns an xarray.

Here we can't do that:

julia> x
dimension Longitude
val: [0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5  …  350.5, 351.5, 352.5, 353.5, 354.5, 355.5, 356.5, 357.5, 358.5, 359.5]
grid: DimensionalData.UnknownGrid()
metadata: nothing
type: Lon{Array{Float64,1},DimensionalData.UnknownGrid,Nothing}

julia> cos.(x)
ERROR: MethodError: Cannot `convert` an object of type Float64 to an object of type Array{Float64,1}
Closest candidates are:
  convert(::Type{Array{T,N}}, ::FillArrays.Zeros{V,N,Axes} where Axes) where {T, V, N} at C:\Users\datse\.julia\packages\FillArrays\Aj0C4\src\FillArrays.jl:339
  convert(::Type{Array{T,N}}, ::FillArrays.Ones{V,N,Axes} where Axes) where {T, V, N} at C:\Users\datse\.julia\packages\FillArrays\Aj0C4\src\FillArrays.jl:339
  convert(::Type{Array{T,N}}, ::FillArrays.AbstractFill{V,N,Axes} where Axes) where {T, V, N} at C:\Users\datse\.julia\packages\FillArrays\Aj0C4\src\FillArrays.jl:331
  ...
Stacktrace:
 [1] setindex!(::Array{Array{Float64,1},1}, ::Float64, ::Int64) at .\array.jl:782
 [2] broadcasted(::Function, ::Lon{Array{Float64,1},DimensionalData.UnknownGrid,Nothing}) at .\abstractarray.jl:725      
 [3] top-level scope at none:0

julia> eltype(x)
Array{Float64,1}

I think the culprit is that eltype(x) returns the vector type instead of the element type of the vector.

Shouldn't broadcasting over dimensions create a DimensionalArray with one dimension (the dimension we broadcast over) and as data the broadcast result?
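One way to get a sensible `eltype` and working broadcast is for the dimension wrapper to implement the AbstractVector interface. The type `LonDim` below is a hypothetical stand-in, not the package's dimension type, and in this sketch the broadcast result is a plain Vector rather than a dimensional array.

```julia
# A dimension-like wrapper that behaves as a vector of its values.
struct LonDim{T,V<:AbstractVector{T}} <: AbstractVector{T}
    val::V
end

# Minimal AbstractVector interface: size and integer indexing.
Base.size(d::LonDim) = size(d.val)
Base.getindex(d::LonDim, i::Int) = d.val[i]

x = LonDim([0.0, 90.0, 180.0])

# eltype now comes from the AbstractVector{T} supertype.
element_type = eltype(x)

# Broadcasting works elementwise over the wrapped values.
c = cosd.(x)
```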

I've solved #41 but I now realize it is super clunky to use, as I have to transform all the dimensions of my fields into DimensionalArrays...

clarify selectors on point and interval grids

Between currently selects points between two values. For a grid of intervals, it should select intervals between two values, not points.

The current behaviour leads to occasional extra rows or columns compared to similar packages in R.

Point grids should have an explicit PointGrid grid type to replace the AlignedGrid, which is essentially a point grid as it doesn't have cell sizes. RegularGrid and BoundedGrid will be assumed to contain intervals, and should be <: IntervalGrid

The Sampling trait can maybe be removed as the grid type will encode that information.
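
To illustrate the difference, here is a small sketch in plain Julia. The helper names `between_points` and `between_intervals` are hypothetical and do not exist in the package; the grid is assumed to be regular, with each value marking the start of a cell.

```julia
# Starts of ten cells of width 1.0: [0,1], [1,2], ..., [9,10]
starts = 0.0:1.0:9.0

# Point semantics: keep every index whose coordinate lies in [lo, hi].
between_points(coords, lo, hi) = findall(c -> lo <= c <= hi, coords)

# Interval semantics: keep every index whose whole cell [c, c + w] lies in [lo, hi].
between_intervals(starts, w, lo, hi) = findall(c -> lo <= c && c + w <= hi, starts)

pts = between_points(starts, 2.0, 5.0)
ivs = between_intervals(starts, 1.0, 2.0, 5.0)
```

With point semantics the cell starting at 5.0 is selected even though it extends to 6.0, which is the kind of extra row or column described above.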

cellsize field that updates with reduction along a dimension

Cells need to have a size in all dimensions. When a method like mean or sum that reduces a dimension to size 1 is called, the cell size on that dimension should be updated to match, so that it spans the whole range of the original dimension instead of one cell.

ie cellsize *= length(val(dim))

The starting point would remain as is currently - the start of the first cell.

This could be a field of the grid trait type, or not.
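
A worked example of the proposed update rule, as a sketch only. `reduced_cellsize` is a hypothetical helper, and the values are assumed to be the starts of regularly spaced cells.

```julia
# Hypothetical helper: cell size of a dimension after it is reduced to length 1.
reduced_cellsize(cellsize, vals) = cellsize * length(vals)

vals = 0.0:0.5:4.5                       # starts of 10 cells, each 0.5 wide
newsize = reduced_cellsize(0.5, vals)    # the single remaining cell spans the whole range
newstart = first(vals)                   # the starting point is unchanged
```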

[Example] Best way to dispatch on dimensions?

I have written some functions that do some statistical analysis. This analysis depends on whether the given array has dimensions Lon, Lat or if it has dimension Time, or if it has all 3.

At the moment each function has a different name, but it would be really cool to be able to dispatch on dimensions instead. I am searching for examples of how to do that. I couldn't find any in this repo, but I found https://github.com/rafaqz/GeoData.jl/blob/master/src/plotrecipes.jl#L29 which does

function f(A::GeoArray{T,3,<:Tuple{<:Lat,<:Lon,D}}) where {T,D}

This is almost there, my remaining concern is that this somehow enforces the dimension ordering... Do you think there is any possible way to write dispatch so that e.g. I can do:

dispatch on something that has 2 dimensions, one Lon, one Lat, in any order,
dispatch on something that has 3 dimensions, one Lon, one Lat, one Time, in any order

etc.?
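
One possible approach, sketched here with hypothetical stand-in types (`SketchArray`, `Lon`, `Lat`, `Ti`) rather than the real GeoArray and dimension types, is to compute an order-independent trait from the dims tuple and dispatch on that trait. This is only an illustration of the pattern, not something the package is known to provide.

```julia
# Hypothetical dimension and array types, for illustration only.
abstract type SketchDim end
struct Lon <: SketchDim end
struct Lat <: SketchDim end
struct Ti  <: SketchDim end

struct SketchArray{T,N,D<:Tuple}
    data::Array{T,N}
    dims::D
end

# Trait types describing which analysis applies.
struct Spatial end
struct Temporal end
struct SpatioTemporal end

hasdimtype(dims::Tuple, ::Type{D}) where {D} = any(d -> d isa D, dims)

# Map a dims tuple to a trait, ignoring the order of the dimensions.
function analysiskind(dims::Tuple)
    lon, lat, ti = hasdimtype(dims, Lon), hasdimtype(dims, Lat), hasdimtype(dims, Ti)
    n = length(dims)
    lon && lat && ti && n == 3 && return SpatioTemporal()
    lon && lat && n == 2 && return Spatial()
    ti && n == 1 && return Temporal()
    throw(ArgumentError("unsupported combination of dimensions"))
end

# One public name; the trait selects the method.
analyse(A::SketchArray) = analyse(analysiskind(A.dims), A)
analyse(::Spatial, A) = "spatial analysis"
analyse(::Temporal, A) = "temporal analysis"
analyse(::SpatioTemporal, A) = "spatiotemporal analysis"

r1 = analyse(SketchArray(rand(2, 3), (Lat(), Lon())))
r2 = analyse(SketchArray(rand(3, 2), (Lon(), Lat())))
r3 = analyse(SketchArray(rand(2, 3, 4), (Ti(), Lat(), Lon())))
```

The trait is computed from values here, so the choice happens at run time unless the compiler can constant-fold it; a type-level version would need more machinery.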

Further improve pretty printing of DimensionalArray

Although PR #33 improved the situation, it still leaves much to be desired:

The printing of the dimensions is not optimal: too many elements are shown and they take up multiple lines, which messes up the alignment. The array data takes up several pages when printed, which is also not so helpful.

No matter how much I hacked at it I couldn't make it work cleanly. I don't know why it worked cleanly for DynamicalSystems.jl.

Anyway, here is what I propose to make the situation better, which I can easily put in a PR:

For any dimension by default show at most 4 elements: first, second, end-1 and end, and put ... in between.
For array data with dimension 3 or larger, only show the first 2D frame, i.e. A[:, :, 1, 1, 1, 1,...].

This will make the printing both useful as well as readable.
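
A rough sketch of the two rules proposed above, in plain Julia. The names `compactvals` and `firstframe` are hypothetical, and `firstframe` assumes an array with at least two dimensions.

```julia
# Show at most four elements: first, second, second-to-last and last.
function compactvals(v::AbstractVector)
    length(v) <= 4 && return join(v, ", ")
    return string(v[1], ", ", v[2], ", ..., ", v[end-1], ", ", v[end])
end

# Take the first 2D frame of an N-dimensional array, i.e. A[:, :, 1, 1, ...].
firstframe(A::AbstractArray) = A[:, :, ones(Int, ndims(A) - 2)...]

short = compactvals(10:10:100)
frame = firstframe(rand(3, 4, 5, 6))
```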

Design of label field for DimArray

Cf. #53. I was about to add the label field (I don't know which is better: label or name?), but I've realized that we should first discuss whether this field should propagate.

Initially my plan was to simply print the DimArray as DimensionalArray, labelled $(label) with dim..... But my concern is, should the label field be propagated after operations? At first I thought "sure", and my plan was to simply edit rebuild to propagate the field.

The issue of course is that one can just add two DimArrays with different labels. Then it doesn't make sense to propagate either of the labels...

So, should we just keep it simple and not propagate the name at any operation? Or are there some specific operations that it makes sense to propagate the field?

Support non standard axes with selectors

Axes should propagate through dims2indices and sel2indices.

This will also facilitate bounds and similar methods using the array axes when no index is passed in. Dim names can then be used as a completely no-cost abstraction to just name the dims.

Array interfaces for dims?

AbstractDimension could be <: AbstractArray (AbstractArray{T,N}, not just Vector). This gets a little weird when they hold Number.

They could also just define a bunch of array methods without being <: AbstractArray

It would mostly just reduce the use of val(), but probably has other benefits.
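
A sketch of the second option, defining array-like methods without subtyping AbstractArray. `PlainDim` is a hypothetical type used only for illustration; forwarding broadcasting through `Base.broadcastable` is one possible choice, not necessarily what the package does.

```julia
# Hypothetical dimension wrapper that is NOT <: AbstractArray.
struct PlainDim{V}
    val::V
end

# Forward a handful of collection methods to the wrapped values.
Base.length(d::PlainDim) = length(d.val)
Base.eltype(::Type{PlainDim{V}}) where {V} = eltype(V)
Base.getindex(d::PlainDim, i) = d.val[i]
Base.iterate(d::PlainDim, state...) = iterate(d.val, state...)

# Let broadcasting treat the wrapper as its wrapped values.
Base.broadcastable(d::PlainDim) = d.val

d = PlainDim(10:10:50)
shifted = d .+ 1
```

With these methods, `d[2]`, `sum(d)` and `d .+ 1` all work without calling val() first.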

Document dims types and dispatch for devs

Error when trying to construct one-dimensional Array/vector

This works:

A = DimensionalArray(rand(10,10), (DimensionalData.Y(1:10),DimensionalData.X(11:20)))

But it does not seem to allow a one-dimensional array (vector). This does not work:

A = DimensionalArray(rand(10), DimensionalData.X(1:10))

Is there a different way to construct a one-dimensional array?
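
One thing that may be worth checking: other snippets on this page pass the dimensions of a vector as a one-element tuple, e.g. (Time(timespan),). Whether that is what the constructor requires in this version is an assumption. The plain-Julia sketch below only shows the difference between parentheses and a one-element tuple, with a range standing in for X(1:10).

```julia
dim = 1:10                      # stand-in for X(1:10)

grouped = (dim)                 # parentheses alone only group; this is still the range
wrapped = (dim,)                # the trailing comma makes a one-element tuple
```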

Replace rebuild with Setfield.jl

rebuild should use Setfield.jl, so only constructorof needs to be defined for new array types (or use the default) instead of needing a rebuild.

Except this assumes that field names will be the same in new implementations, which is bad: there might not even be, e.g., a refdims field.
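
To make the trade-off concrete, here is a sketch of a generic field-replacing rebuild written in plain Julia. It does not use Setfield.jl; `SketchArr` and `rebuild_sketch` are hypothetical names, and the constructor lookup is a simplified stand-in for what a `constructorof` function would provide.

```julia
# Hypothetical array type with the three fields mentioned above.
struct SketchArr{A,D,R}
    data::A
    dims::D
    refdims::R
end

# Rebuild an object, replacing only the fields passed as keywords.
function rebuild_sketch(obj::T; kwargs...) where {T}
    ctor = getfield(parentmodule(T), nameof(T))   # unparameterised constructor
    args = map(fieldnames(T)) do name
        haskey(kwargs, name) ? kwargs[name] : getfield(obj, name)
    end
    return ctor(args...)
end

a = SketchArr([1, 2, 3], (:x,), ())
b = rebuild_sketch(a; data = [4.0, 5.0, 6.0])
```

The call site has to know that the field is called `data`, which is exactly the coupling to field names described above.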

Documentation of invariants

I've got some ideas about what the invariant properties of DimensionalArrays should be, but then run into cases where they aren't true. I'm then not sure whether my mental model is wrong, or if I've found a bug. It'd be good to get some consensus on these and document them. Potentially they could also be enforced at runtime, for example in constructors. Here are a few of the cases I can recall:

Size of dimensions

Should this always be true?

length.(dims(da)) == size(da) == size(data(da))

Broadcasting and dimension equality

Should the dimensions being broadcast over have to be equal? I.e.:

a = DimensionalArray(1:3, X(1:3))
b = DimensionalArray(1:3, X(3:-1:1))

a .+ reverse(b)  # I think this should work
a .+ b  # But shouldn't this error?

Equality and dimension equality

# If this is true
dims(a::DimensionalArray) != dims(b::DimensionalArray)
# Shouldn't this be true?
a != b
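
A sketch of what enforcing these invariants could look like, using a hypothetical one-dimensional `Labelled` type instead of DimensionalArray. The strict choices made here (error on mismatched index, index included in equality) are one possible answer to the questions above, not the package's documented behaviour.

```julia
# Hypothetical 1D labelled vector; the constructor enforces the size invariant.
struct Labelled{T}
    data::Vector{T}
    index::Vector{Int}
    function Labelled(data::Vector{T}, index::Vector{Int}) where {T}
        length(index) == length(data) ||
            throw(DimensionMismatch("index and data lengths differ"))
        return new{T}(data, index)
    end
end

Base.reverse(a::Labelled) = Labelled(reverse(a.data), reverse(a.index))

# Elementwise addition that refuses mismatched dimension values.
function add(a::Labelled, b::Labelled)
    a.index == b.index || throw(DimensionMismatch("dimension values differ"))
    return Labelled(a.data .+ b.data, a.index)
end

# Equality takes the dimension values into account.
Base.:(==)(a::Labelled, b::Labelled) = a.index == b.index && a.data == b.data

a = Labelled(collect(1:3), collect(1:3))
b = Labelled(collect(1:3), collect(3:-1:1))

ok = add(a, reverse(b))                       # indices match after reversing
failed = try
    add(a, b)
    false
catch err
    err isa DimensionMismatch
end
```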

Pretty printing for DimensionalArray should actually show the dimension and not display the type info

julia> A = DimensionalArray(rand(12,10), (Y(0.1:0.1:1.2),  X(10:10:100)))
12×10 DimensionalArray{Float64,2,Tuple{Y{LinRange{Float64},RegularGrid{Ordered{DimensionalData.Forward,DimensionalData.Forward,DimensionalData.Forward},Start,UnknownSampling,Float64},Nothing},X{LinRange{Float64},RegularGrid{Ordered{DimensionalData.Forward,DimensionalData.Forward,DimensionalData.Forward},Start,UnknownSampling,Float64},Nothing}},Tuple{},Array{Float64,2}}:
 0.427904   0.801173  0.351286  0.832175  …  0.718153  0.729153   0.711978   0.566471    
 0.242142   0.354636  0.411136  0.584902     0.596248  0.0434958  0.0405127  0.147382    
 0.0740458  0.143374  0.27313   0.472312     0.884181  0.221327   0.0616056  0.963136    
 0.609528   0.352044  0.974345  0.849571     0.372485  0.583348   0.114052   0.973613    
 0.767141   0.385713  0.707579  0.78367      0.132243  0.112606   0.395172   0.254106    
 0.539374   0.544161  0.328544  0.908103  …  0.825559  0.999397   0.479662   0.81922 
 0.580855   0.725136  0.901243  0.672919     0.275564  0.0573271  0.847266   0.428343    
 0.0718895  0.273833  0.829184  0.907588     0.361578  0.760598   0.0833132  0.677053    
 0.728823   0.790125  0.275615  0.911619     0.597558  0.136166   0.866568   0.910418    
 0.603924   0.938003  0.151649  0.689762     0.850684  0.419717   0.657339   0.338519
 0.619414   0.616231  0.344264  0.371445  …  0.666552  0.998278   0.896923   0.433772    
 0.792093   0.280937  0.328419  0.900613     0.531562  0.887763   0.124901   0.275559

The problem here is that it's hard to get the numeric values of the dimensions from A. They are not shown anywhere, and doing this:

julia> dims(A)
(
Y: Y{LinRange{Float64},RegularGrid{Ordered{DimensionalData.Forward,DimensionalData.Forward,DimensionalData.Forward},Start,UnknownSampling,Float64},Nothing}
val: range(0.1, stop=1.2, length=12)
grid: RegularGrid{Ordered{DimensionalData.Forward,DimensionalData.Forward,DimensionalData.Forward},Start,UnknownSampling,Float64}(Ordered{DimensionalData.Forward,DimensionalData.Forward,DimensionalData.Forward}(DimensionalData.Forward(), DimensionalData.Forward(), DimensionalData.Forward()), Start(), UnknownSampling(), 0.09999999999999999)
metadata: nothing,
X: X{LinRange{Float64},RegularGrid{Ordered{DimensionalData.Forward,DimensionalData.Forward,DimensionalData.Forward},Start,UnknownSampling,Float64},Nothing}
val: range(10.0, stop=100.0, length=10)
grid: RegularGrid{Ordered{DimensionalData.Forward,DimensionalData.Forward,DimensionalData.Forward},Start,UnknownSampling,Float64}(Ordered{DimensionalData.Forward,DimensionalData.Forward,DimensionalData.Forward}(DimensionalData.Forward(), DimensionalData.Forward(), DimensionalData.Forward()), Start(), UnknownSampling(), 10.0)
metadata: nothing)

does give you the answer but not without accompanying it with a wall of text.

This wall of text is the complicated type information of DimensionalArray and AbstractDimension. I do not know how useful this is for a developer, but I can say for sure it is of no use for a front-end user (as it is impossible to even understand all this without being a developer).

Maybe the show implementations should hide these type parameters and also display the val: range(10.0, stop=100.0, length=10) for each dimension in a manner like

with dimensions
 X: range(10.0, stop=100.0, length=10)
 Y: range(0.1, stop=1.2, length=12)

like Python's xarray does it?
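
A minimal sketch of such a show method, written for a hypothetical `ShowSketch` type that stores its dimensions as name => values pairs. It prints only the size and the dimension values, and leaves out the array data and all type parameters.

```julia
# Hypothetical array type; dims is a tuple of name => values pairs.
struct ShowSketch{T,N,D<:Tuple}
    data::Array{T,N}
    dims::D
end

# Compact REPL display: size, then one line per dimension.
function Base.show(io::IO, ::MIME"text/plain", A::ShowSketch)
    println(io, join(size(A.data), "×"), " ShowSketch with dimensions")
    for (name, vals) in A.dims
        println(io, " ", name, ": ", vals)
    end
end

A = ShowSketch(rand(12, 10), (:Y => 0.1:0.1:1.2, :X => 10:10:100))
text = sprint(show, MIME("text/plain"), A)
```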

handle empty length 1 dimension from adjoint on Vector, and similar.

We need a dimension type that is just a placeholder for an empty length-1 dimension, to handle the 2d result of:

timespan = DateTime(2001):Month(1):DateTime(2001,12)
A = DimensionalArray(rand(12), (Time(timespan),)) 
A'
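
For reference, the adjoint of a plain length-12 vector has size (1, 12), so the dims tuple needs a length-1 entry in front. The sketch below uses a hypothetical `PlaceholderDim` type and `adjoint_dims` helper to show the idea; neither name comes from the package.

```julia
v = rand(12)
adjsize = size(v')              # adjoint of a vector is a 1×12 row

# Hypothetical placeholder for a dimension that carries no index.
struct PlaceholderDim end
Base.length(::PlaceholderDim) = 1

# Dims of the adjoint of a 1D array: placeholder first, original dim second.
adjoint_dims(dims::Tuple{Any}) = (PlaceholderDim(), dims[1])

newdims = adjoint_dims((1:12,))
```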

Performance with sparse arrays

I'm seeing performance drops of about three orders of magnitude when using sparse data in a DimensionalArray vs. just performing operations on the sparse array directly. This doesn't happen with dense arrays. I've included an example below for mean, but I see the same thing for other reductions like sum or maximum and even methods like copy.

# Setup
using SparseArrays
using DimensionalData
using DimensionalData: @dim, Forward
using Statistics

using BenchmarkTools

@dim Var "Variable"
@dim Obs "Observation"

sparsear = sprand(10000, 10000, .1)
sparsed = DimensionalArray(
    sparsear,
    (Var <| ["var$i" for i in 1:10000], Obs <| ["obs$i" for i in 1:10000])
)

# Jit warmups
mean(sparsear, dims=1)
mean(sparsear, dims=2)
mean(sparsed, dims=Var)
mean(sparsed, dims=Obs)

# Benchmarks
@benchmark mean($sparsear, dims=$1)
# BenchmarkTools.Trial: 
#   memory estimate:  78.20 KiB
#   allocs estimate:  2
#   --------------
#   minimum time:     7.317 ms (0.00% GC)
#   median time:      9.561 ms (0.00% GC)
#   mean time:        9.307 ms (0.11% GC)
#   maximum time:     15.023 ms (36.95% GC)
#   --------------
#   samples:          536
#   evals/sample:     1
@benchmark mean($sparsed, dims=$Var)
# BenchmarkTools.Trial: 
#   memory estimate:  305.24 MiB
#   allocs estimate:  18
#   --------------
#   minimum time:     4.467 s (2.40% GC)
#   median time:      4.873 s (2.13% GC)
#   mean time:        4.873 s (2.13% GC)
#   maximum time:     5.280 s (1.91% GC)
#   --------------
#   samples:          2
#   evals/sample:     1

This was run with DimensionalData v0.1.1 and Julia v1.2.0.
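
One plausible cause, which the report does not confirm, is that reductions on the wrapper fall back to generic AbstractArray methods that index element by element, instead of reaching the specialised sparse methods of the parent. The sketch below shows the forwarding pattern on a hypothetical `WrapSketch` type; it is not a description of how DimensionalData implements reductions.

```julia
using SparseArrays

# Hypothetical thin wrapper around any matrix.
struct WrapSketch{T,A<:AbstractMatrix{T}} <: AbstractMatrix{T}
    parent::A
end
Base.size(w::WrapSketch) = size(w.parent)
Base.getindex(w::WrapSketch, i::Int, j::Int) = w.parent[i, j]

# Forward the reduction to the parent so its specialised method is used.
forwarded_sum(w::WrapSketch; dims) = sum(w.parent; dims = dims)

w = WrapSketch(sprand(200, 200, 0.1))
generic = sum(w; dims = 1)              # generic fallback via scalar getindex
forwarded = forwarded_sum(w; dims = 1)  # sparse-aware method of the parent
```

Both calls give the same result; only the second one can take advantage of the sparse storage.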

Convenience way to check if a dimension exists in `dims(s)`

Hi there, I've finally started using GeoData, and things seem good so far. I have the following:

(which shows I have to work more on pretty printing, but that's for another PR). So as you can see Longitude is one of the dimensions. Of course, if one wants to be precise, they have to understand that "Longitude" is just a name. The actual dimension is Lon.

What I am currently missing is a simple way to check if a "dimension" (whatever this means) is in the dimensions of a given DimensionalArray. For example:

julia> "Longitude" ∈ dims(s)
false

julia> Lon ∈ dims(s)
false

I propose that we add a simple function hasdim(s, dim) that would return true in both of the above examples. For the string case it is obvious: one can use dim ∈ name.(dims(s)). But for the second case, I must admit I find it really hard... This is in general confusing for me:

julia> typeof(Lon)
UnionAll

julia> supertype(ans)
Type{T}

julia> Lon <: Dim
false

what is Lon???
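
Some background that may help: a parametric type written without its parameters, like Lon here, is a UnionAll, i.e. the set of all Lon{...} types. `Lon ∈ dims(s)` is false because `in` compares the type itself against the dimension instances with `==`. The sketch below uses hypothetical stand-in types and a hypothetical `dimname` function to show one way a hasdim could be written; why `Lon <: Dim` is false in the package is not something this sketch explains.

```julia
# Hypothetical stand-ins for the package's dimension types.
abstract type AbstractDimSketch end
struct Lon{T} <: AbstractDimSketch
    val::T
end
struct Lat{T} <: AbstractDimSketch
    val::T
end
dimname(::Lon) = "Longitude"
dimname(::Lat) = "Latitude"

# Check by type: `d isa Lon` is true for any Lon{T}.
hasdim(dims::Tuple, ::Type{D}) where {D} = any(d -> d isa D, dims)
# Check by display name.
hasdim(dims::Tuple, name::AbstractString) = any(d -> dimname(d) == name, dims)

dims = (Lon(0.5:1:359.5), Lat(-89.5:1:89.5))
```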

Using DimensionalData with GeoMakie brings an ambiguity error on getindex

julia> scene = surface(xs, ys, zeros(size(xs)); color = cf, shading = false, show_axis = false)
Error showing value of type Scene:
ERROR: MethodError: getindex(::Array{UInt32,1}) is ambiguous. Candidates:
  getindex(A::AbstractArray, dims::DimensionalData.AbstractDimension{#s15,G,M} where M where G where #s15<:Number...) in DimensionalData at C:\Users\m300808\.julia\packages\DimensionalData\WBLon\src\dimension.jl:74
  getindex(a::AbstractArray, I::DimensionalData.Selector...) in DimensionalData at C:\Users\m300808\.julia\packages\DimensionalData\WBLon\src\selector.jl:152
  getindex(A::AbstractArray, dims::DimensionalData.AbstractDimension...) in DimensionalData at C:\Users\m300808\.julia\packages\DimensionalData\WBLon\src\dimension.jl:76
Possible fix, define
  getindex(::AbstractArray)
Stacktrace:
 [1] glGenBuffers(::Int64) at C:\Users\m300808\.julia\packages\GLMakie\tywRm\src\GLAbstraction\GLExtendedFunctions.jl:110
 [2] glGenBuffers() at C:\Users\m300808\.julia\packages\GLMakie\tywRm\src\GLAbstraction\GLExtendedFunctions.jl:108
 [3] GLMakie.GLAbstraction.GLBuffer{Point{2,Float32}}(::Ptr{Point{2,Float32}}, ::Int64, ::UInt32, ::UInt32) at C:\Users\m300808\.julia\packages\GLMakie\tywRm\src\GLAbstraction\GLBuffer.jl:9
 [4] #GLBuffer#32(::UInt32, ::UInt32, ::Type{GLMakie.GLAbstraction.GLBuffer}, ::Array{Point{2,Float32},1}) at C:\Users\m300808\.julia\packages\GLMakie\tywRm\src\GLAbstraction\GLBuffer.jl:43
 [5] GLMakie.GLAbstraction.GLBuffer(::Array{Point{2,Float32},1}) at C:\Users\m300808\.julia\packages\GLMakie\tywRm\src\GLAbstraction\GLBuffer.jl:43
 [6] GLMakie.GLAbstraction.NativeMesh{GLMesh2D}(::GLMesh2D) at C:\Users\m300808\.julia\packages\GLMakie\tywRm\src\GLAbstraction\GLUtils.jl:219
 [7] gl_convert(::GLMesh2D) at C:\Users\m300808\.julia\packages\GLMakie\tywRm\src\GLAbstraction\GLUniforms.jl:197
 [8] GLMakie.GLAbstraction.RenderObject(::Dict{Symbol,Any}, ::GLMakie.GLVisualize.GLVisualizeShader, ::GLMakie.GLAbstraction.StandardPrerender, ::Nothing, ::Observables.Observable{GeometryTypes.HyperRectangle{3,Float32}}, ::Nothing) at C:\Users\m300808\.julia\packages\GLMakie\tywRm\src\GLAbstraction\GLTypes.jl:336
 [9] assemble_robj(::Dict{Symbol,Any}, ::GLMakie.GLVisualize.GLVisualizeShader, ::Observables.Observable{GeometryTypes.HyperRectangle{3,Float32}}, ::UInt32, ::Nothing, ::Nothing) at C:\Users\m300808\.julia\packages\GLMakie\tywRm\src\GLVisualize\utils.jl:38
 [10] assemble_shader(::Dict{Symbol,Any}) at C:\Users\m300808\.julia\packages\GLMakie\tywRm\src\GLVisualize\utils.jl:65
 [11] visualize(::Any, ::GLMakie.GLAbstraction.Style{:surface}, ::Dict{Symbol,Any}) at C:\Users\m300808\.julia\packages\GLMakie\tywRm\src\GLVisualize\visualize_interface.jl:21
 [12] (::GLMakie.var"#104#107"{Surface{...}})(::Dict{Symbol,Any}) at C:\Users\m300808\.julia\packages\GLMakie\tywRm\src\drawing_primitives.jl:362
 [13] (::GLMakie.var"#58#64"{GLMakie.var"#104#107"{Surface{...}},GLMakie.Screen,Scene,Surface{...}})() at C:\Users\m300808\.julia\packages\GLMakie\tywRm\src\drawing_primitives.jl:57
 [14] get!(::GLMakie.var"#58#64"{GLMakie.var"#104#107"{Surface{...}},GLMakie.Screen,Scene,Surface{...}}, ::Dict{UInt64,GLMakie.GLAbstraction.RenderObject}, ::UInt64) at .\dict.jl:452
 [15] cached_robj!(::GLMakie.var"#104#107"{Surface{...}}, ::GLMakie.Screen, ::Scene, ::Surface{...}) at C:\Users\m300808\.julia\packages\GLMakie\tywRm\src\drawing_primitives.jl:40
 [16] draw_atomic at C:\Users\m300808\.julia\packages\GLMakie\tywRm\src\drawing_primitives.jl:341 [inlined]
 [17] insert!(::GLMakie.Screen, ::Scene, ::Surface{...}) at C:\Users\m300808\.julia\packages\GLMakie\tywRm\src\drawing_primitives.jl:126
 [18] insertplots!(::GLMakie.Screen, ::Scene) at C:\Users\m300808\.julia\packages\GLMakie\tywRm\src\screen.jl:59
 [19] backend_display(::GLMakie.Screen, ::Scene) at C:\Users\m300808\.julia\packages\GLMakie\tywRm\src\screen.jl:108
 [20] display(::AbstractPlotting.PlotDisplay, ::Scene) at C:\Users\m300808\.julia\packages\GLMakie\tywRm\src\gl_backend.jl:58
 [21] display(::Any) at .\multimedia.jl:323
 [22] #invokelatest#1 at .\essentials.jl:709 [inlined]
 [23] invokelatest at .\essentials.jl:708 [inlined]
 [24] print_response(::IO, ::Any, ::Bool, ::Bool, ::Any) at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.3\REPL\src\REPL.jl:156
 [25] print_response(::REPL.AbstractREPL, ::Any, ::Bool, ::Bool) at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.3\REPL\src\REPL.jl:141
 [26] (::REPL.var"#do_respond#38"{Bool,REPL.var"#48#57"{REPL.LineEditREPL,REPL.REPLHistoryProvider},REPL.LineEditREPL,REPL.LineEdit.Prompt})(::Any, ::Any, ::Any) at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.3\REPL\src\REPL.jl:719
 [27] #invokelatest#1 at .\essentials.jl:709 [inlined]
 [28] invokelatest at .\essentials.jl:708 [inlined]
 [29] run_interface(::REPL.Terminals.TextTerminal, ::REPL.LineEdit.ModalInterface, ::REPL.LineEdit.MIState) at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.3\REPL\src\LineEdit.jl:2306
 [30] run_frontend(::REPL.LineEditREPL, ::REPL.REPLBackendRef) at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.3\REPL\src\REPL.jl:1045
 [31] run_repl(::REPL.AbstractREPL, ::Any) at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.3\REPL\src\REPL.jl:201
 [32] (::Base.var"#770#772"{Bool,Bool,Bool,Bool})(::Module) at .\client.jl:382
 [33] #invokelatest#1 at .\essentials.jl:709 [inlined]
 [34] invokelatest at .\essentials.jl:708 [inlined]
 [35] run_main_repl(::Bool, ::Bool, ::Bool, ::Bool, ::Bool) at .\client.jl:366
 [36] exec_options(::Base.JLOptions) at .\client.jl:304
 [37] _start() at .\client.jl:460

after doing:

# ALL of these packages have to be on master branch
using AbstractPlotting, Makie, GeoMakie, Proj4, DimensionalData

# %% Once with a dummy, generated field
lats = -89.5:1:89.5
lons = 0.5:1:359.5
field = [exp(cosd(l)) + 3(y/90) for l in lons, y in lats]
cf = circshift(field, 180) # shift the field to the correct position
source = Projection("+proj=lonlat +lon_0=180 +pm=180")
dest = Projection("+proj=moll +lon_0=0")
xs, ys = xygrid(lons, lats)
Proj4.transform!(source, dest, vec(xs), vec(ys))
scene = surface(xs, ys, zeros(size(xs)); color = cf, shading = false, show_axis = false)
geoaxis!(scene, -180, 180, -90, 90; crs = (src = source, dest = dest,))
coastlines!(scene, 1; crs = (src = source, dest = dest,))

cf https://github.com/JuliaPlots/GeoMakie.jl/issues/30
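
The ambiguity can be reproduced without any of the plotting packages. The sketch below uses hypothetical stand-in types `DimA` and `SelB` to show that two vararg methods both match a call with zero trailing arguments, and one possible way around it: requiring at least one argument of each kind. Whether the package adopted this fix is not stated in the report.

```julia
struct DimA end
struct SelB end

# Both methods accept zero trailing arguments, so f(A) matches both.
f(A::AbstractArray, dims::DimA...) = :dims
f(A::AbstractArray, sels::SelB...) = :sels

ambiguous = try
    f([1])
    false
catch err
    err isa MethodError      # ambiguity is reported as a MethodError
end

# Requiring at least one argument of each kind removes the overlap.
g(A::AbstractArray, d::DimA, dims::DimA...) = :dims
g(A::AbstractArray, s::SelB, sels::SelB...) = :sels

viadims = g([1], DimA())
viasels = g([1], SelB(), SelB())
```

With the second pair of definitions, a call with no trailing arguments simply has no matching method in `g`, so in the real case it would fall through to Base's own getindex.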

samples vs means / implementing a histogram

Motivation

Usually I have data that is localized in some dimensions and represents averages in other dimensions. For instance, there could be a time axis and a position axis where each value in the data matrix corresponds to a precise timepoint but is an average over a region of space.
I think DimensionalData deals very well with the former (data are point values), but it seems to me not to work so well for the latter (data are bin averages).

Let's reduce this example to the case of representing a 1d histogram. The histogram has n bins, which means n data values and n+1 walls that represent the bin boundaries.

One operation that I would like to do frequently is to decide for an arbitrary position to which bin it belongs. This will require doing something like searchsorted(walls, x). So we need a Dimension that remembers the n+1 walls.

As far as I can see, there is no such thing built in?
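
The lookup itself, independent of any package, can be sketched as follows. `binindex` is a hypothetical helper; it treats bins as closed on the left and open on the right, which is only one of the possible conventions for values that sit on a wall.

```julia
walls = 1:10            # n + 1 = 10 sorted walls
binvalues = 10:10:90    # n = 9 bin values

# Index of the bin [walls[i], walls[i+1]) that contains x.
function binindex(walls, x)
    i = searchsortedlast(walls, x)
    1 <= i <= length(walls) - 1 || throw(DomainError(x, "outside the binned range"))
    return i
end

i = binindex(walls, 2.9)
v = binvalues[i]
```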

Minimal implementation

using DimensionalData
using DimensionalData: X, Selector # Selector is used below as the supertype of BinAround
import DimensionalData: sel2indices

struct Bins{W}
    walls::W # assumed to be sorted
end
struct Interval{T}
    left::T
    right::T
end
Base.eltype(::Type{Bins{W}}) where {W} = Interval{eltype(W)}
Base.length(w::Bins) = length(w.walls) - 1
Base.getindex(w::Bins, i::Integer) = Interval(w.walls[i], w.walls[i+1])

struct BinAround{T} <: Selector{T} # shorter name BinOf? Or even pack this into At?
    val::T
end

function sel2indices(grid, dim, sel::BinAround)
    bins = dim.val::Bins
    # TODO more correct edge cases,
    # also which bin to select in case val sits on a boundary?
    # Either BinAround or Bins needs to contain a closed left/right flag.
    index = searchsortedlast(bins.walls, sel.val)
end

arr = DimensionalArray(10:10:90, (X(Bins(1:10)),))
arr[BinAround(2.9)]

Questions

Am I correct that this kind of functionality is currently not in DimensionalData.jl? Does it make sense to add it here? Is the above implementation strategy sane? If so, I am happy to polish this and make a PR.

Add comparison to NamedArrays

Hello,
I am using the NamedArrays.jl package, and I would like to know how this package compares to it.
Best regards,