

NCDatasets


NCDatasets allows one to read and create netCDF files. NetCDF datasets and attribute lists behave like Julia dictionaries, and variables behave like Julia arrays. This package implements the CommonDataModel.jl interface, which means that datasets can be accessed in the same way as GRIB files opened with GRIBDatasets.jl.

The module NCDatasets provides support for the following netCDF CF conventions:

  • _FillValue will be returned as missing (more information)
  • scale_factor and add_offset are applied if present
  • time variables (recognized by the units attribute) are returned as DateTime objects.
  • support of the CF calendars (standard, gregorian, proleptic gregorian, julian, all leap, no leap, 360 day)
  • the raw data can also be accessed (without the transformations above).
  • contiguous ragged array representation

Other features include:

  • Support for NetCDF 4 compression and variable-length arrays (i.e. arrays of vectors where each vector can potentially have a different length)
  • The module also includes a utility function ncgen which generates the Julia code that would produce a netCDF file with the same metadata as a template netCDF file.
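A sketch of ncgen in use (the output file name create_file.jl is an assumption for illustration):

```julia
using NCDatasets

# Print to the screen the Julia code that would recreate the
# structure (dimensions, variables, attributes) of file.nc
ncgen("file.nc")

# Or write the generated code to a file instead
ncgen("file.nc", "create_file.jl")
```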

Installation

Inside the Julia shell, you can download and install the package by issuing:

using Pkg
Pkg.add("NCDatasets")

Manual

This manual is a quick introduction to using NCDatasets.jl. For more details you can read the stable or latest documentation.

Create a netCDF file

The following gives an example of how to create a netCDF file by defining dimensions, variables and attributes.

using NCDatasets
using DataStructures: OrderedDict
# This creates a new NetCDF file called file.nc.
# The mode "c" stands for creating a new file (clobber)
ds = NCDataset("file.nc","c")

# Define the dimensions "lon" and "lat" with sizes 100 and 110, respectively
defDim(ds,"lon",100)
defDim(ds,"lat",110)

# Define a global attribute
ds.attrib["title"] = "this is a test file"

# Define the variable temperature with the attributes units and scale_factor
v = defVar(ds,"temperature",Float32,("lon","lat"), attrib = OrderedDict(
    "units" => "degree Celsius",
    "scale_factor" => 10,
))

# add additional attributes
v.attrib["comments"] = "this is a string attribute with Unicode Ω ∈ ∑ ∫ f(x) dx"

# Generate some example data
data = [Float32(i+j) for i = 1:100, j = 1:110];

# write a single column
v[:,1] = data[:,1];

# write the complete data set
v[:,:] = data;

close(ds)

It is also possible to create the dimensions, define the variable and set its value with a single call to defVar:

using NCDatasets
ds = NCDataset("/tmp/test2.nc","c")
data = [Float32(i+j) for i = 1:100, j = 1:110]
v = defVar(ds,"temperature",data,("lon","lat"))
close(ds)

Explore the content of a netCDF file

Before reading the data from a netCDF file, it is often useful to explore the list of variables and attributes defined in it.

For interactive use, the following commands (without a trailing semicolon) display the content of the file similarly to ncdump -h file.nc:

using NCDatasets
ds = NCDataset("file.nc")

This creates the central structure of NCDatasets.jl, NCDataset, which represents the contents of the netCDF file (without immediately loading everything into memory). NCDataset is an alias for Dataset.

The following displays the information just for the variable varname:

ds["varname"]

while to get the global attributes you can do:

ds.attrib
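Since datasets and attribute lists behave like dictionaries, the usual key and iteration idioms should apply; a minimal sketch, assuming the file.nc created above:

```julia
using NCDatasets
ds = NCDataset("file.nc")

# list the variable names
keys(ds)

# iterate over the global attributes as name => value pairs
for (name, value) in ds.attrib
    println(name, " = ", value)
end
close(ds)
```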

NCDataset("file.nc") produces a listing like:

Dataset: file.nc
Group: /

Dimensions
   lon = 100
   lat = 110

Variables
  temperature   (100 × 110)
    Datatype:    Float32 (Float32)
    Dimensions:  lon × lat
    Attributes:
     units                = degree Celsius
     scale_factor         = 10
     comments             = this is a string attribute with Unicode Ω ∈ ∑ ∫ f(x) dx

Global attributes
  title                = this is a test file

Load a netCDF file

Loading a variable with known structure can be achieved by accessing the variables and attributes directly by their name.

# The mode "r" stands for read-only. The mode "r" is the default mode and the parameter can be omitted.
ds = NCDataset("file.nc","r")
v = ds["temperature"]

# load a subset
subdata = v[10:30,30:5:end]

# load all data
data = v[:,:]

# load all data ignoring attributes like scale_factor, add_offset, _FillValue and time units
data2 = v.var[:,:];


# load an attribute
unit = v.attrib["units"]
close(ds)

In the example above, the subset can also be loaded with:

subdata = NCDataset("file.nc")["temperature"][10:30,30:5:end]

This might be useful in an interactive session. However, the file file.nc is not closed immediately (closing the file will be triggered by Julia's garbage collector), which can be a problem if you open many files. On Linux the number of open files is often limited to 1024 (soft limit). If you write to a file, you should also always close the file to make sure that the data is properly written to disk.

An alternative way to ensure the file has been closed is to use a do block: the file will be closed automatically when leaving the block.

data = NCDataset(filename,"r") do ds
    ds["temperature"][:,:]
end # ds is closed

Edit an existing netCDF file

When you need to modify variables or attributes in a netCDF file, you have to open it with the "a" option. Here, for example, we add a global attribute creator to the file created in the previous step.

ds = NCDataset("file.nc","a")
ds.attrib["creator"] = "your name"
close(ds);

Benchmark

The benchmark loads a variable of the size 1000x500x100 in slices of 1000x500 (applying the scaling of the CF conventions) and computes the maximum of each slice and the average of each maximum over all slices. This operation is repeated 100 times. The code is available at https://github.com/Alexander-Barth/NCDatasets.jl/tree/master/test/perf .

Module             median   minimum   mean    std. dev.
R-ncdf4            0.407    0.384     0.407   0.010
python-netCDF4     0.475    0.463     0.476   0.010
julia-NCDatasets   0.265    0.249     0.267   0.011

All runtimes are in seconds. We use Julia 1.10.0 (with NCDatasets 0.14.0), R 4.1.2 (with ncdf4 1.22) and Python 3.10.12 (with netCDF4 1.6.5) on an i5-1135G7 CPU and an NVMe SSD (WDC WDS100T2B0C).

Filing an issue

When you file an issue, please include sufficient information that would allow somebody else to reproduce the issue, in particular:

  1. Provide the code that generates the issue.
  2. If necessary to run your code, provide the used netCDF file(s).
  3. Make your code and netCDF file(s) as simple as possible (while still showing the error and being runnable). A big thank you for the 5-star-premium-gold users who do not forget this point! 👍🏅🏆
  4. Include the full error message that you are seeing (in particular file names and line numbers of the stack trace).
  5. Which version of Julia and NCDatasets are you using? Please include the output of:
versioninfo()
using Pkg
Pkg.installed()["NCDatasets"]
  6. Does NCDatasets pass its test suite? Please include the output of:
using Pkg
Pkg.test("NCDatasets")

Alternative

The package NetCDF.jl from Fabian Gans and contributors is an alternative to this package which supports a more Matlab/Octave-like interface for reading and writing NetCDF files.

Credits

netcdf_c.jl and the error handling code of the NetCDF C API are from NetCDF.jl by Fabian Gans (Max-Planck-Institut für Biogeochemie, Jena, Germany) released under the MIT license.

ncdatasets.jl's People

Contributors

adigitoleo, alexander-barth, aramirezreyes, balinus, briochemc, charleskawczynski, ctroupin, datseris, gaelforget, github-actions[bot], glwagner, juliatagbot, keduba, navidcy, petershintech, rafaqz, sbozzolo, stevengj, tcarion, visr


ncdatasets.jl's Issues

Make `size` of a NCDataset/CFVariable return a NamedTuple instead

I think it would be super-useful to have size of some CFVariable return a named tuple instead of just a tuple, like so:

Current

julia> fld
toa_sw_all_mon (360 × 180 × 228)
  Datatype:    Float32
  Dimensions:  lon × lat × time
  Attributes:
   long_name            = Top of The Atmosphere Shortwave Flux, All-Sky conditions, Monthly Means
   standard_name        = TOA Shortwave Flux - All-Sky
   CF_name              = toa_outgoing_shortwave_flux
   comment              = none
   units                = W m-2
   valid_min            =       0.00000
   valid_max            =       600.000
   _FillValue           = -999.0

julia> size(fld)
(360, 180, 228)

proposed:

julia> size(fld)
(lon = 360, lat = 180, time = 228)

this is non-breaking, as the result can still be accessed with integer indices. But it can also be accessed by name, which is super intuitive and gets rid of the problem of "which dimension is which".
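A hypothetical helper (not part of NCDatasets) sketching the proposal, assuming dimnames(v) returns the dimension names of the variable as strings:

```julia
using NCDatasets

# Hypothetical helper, not part of the package:
# build a NamedTuple keyed by the dimension names
namedsize(v) = NamedTuple{Tuple(Symbol.(dimnames(v)))}(size(v))
```

With the variable above, namedsize(fld) would return (lon = 360, lat = 180, time = 228).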

More constructors for CFDateTime types

The DateTime constructors that use DateFormat are not implemented for the CFDateTime types. For example,

julia> DateTime("19991205", "yyyymmdd")
1999-12-05T00:00:00

julia> DateTimeNoLeap("19991205", "yyyymmdd")
ERROR: MethodError: no method matching Int64(::String)
Closest candidates are:
  Int64(::Union{Bool, Int32, Int64, UInt32, UInt64, UInt8, Int128, Int16, Int8, UInt128, UInt16}) at boot.jl:717
  Int64(::Ptr) at boot.jl:727
  Int64(::Float32) at float.jl:697
  ...
Stacktrace:
 [1] DateTimeNoLeap(::String, ::String, ::Int64, ::Int64, ::Int64, ::Int64, ::Int64) at /Users/portmann/.julia/packages/NCDatasets/O31tb/src/time.jl:317 (repeats 2 times)
 [2] top-level scope at none:0

It would be handy to have these.

PS Great job on NCDatasets, I find it very useful!

Why is CFVariable a subtype of `AbstractArray`?

(since I want to contribute here regularly, I need to understand some design decisions)

Problems like e.g. #48 arise because CFVariable is a subtype of AbstractArray, even though it in fact does not represent a contiguous blob of numbers stored in memory. If I understood it properly, a CFVariable represents data on disk that is not yet loaded into memory, correct?

What parts of NCDatasets use the fact that CFVariable is a subtype of AbstractArray?

Read a variable of Char Datatype

How can I read a variable made of Chars?

I am trying to read a NetCDF file from GEOTRACES that contains a variable for cruise names. This variable seems to be encoded as an array of Chars, but I don't know how to access it. After reading the NetCDF file via ds = Dataset("...") (I think the file is too big to be put in a MWE here), I get:

julia> ds["metavar1"]
metavar1 (6 × 1866)
  Datatype:    Char
  Dimensions:  STRING6 × N_STATIONS
  Attributes:
   long_name            = Cruise
   _FillValue           =

julia> size(ds["metavar1"])
(6, 1866)

julia> ds["metavar1"][:,:]
ERROR: MethodError: no method matching isnan(::String)
Closest candidates are:
  isnan(::BigFloat) at mpfr.jl:886
  isnan(::Missing) at missing.jl:79
  isnan(::Float16) at float.jl:530
  ...
Stacktrace:
 [1] getindex(::NCDatasets.CFVariable{Union{Missing, Char},2,NCDatasets.Variable{Char,2},NCDatasets.Attributes}, ::Colon, ::Colon) at /Users/benoitpasquier/.julia/packages/NCDatasets/uW2kc/src/NCDatasets.jl:1304
 [2] top-level scope at REPL[187]:1

I can overload isnan with Base.isnan(::String) = false for this to work, but it feels wrong... Maybe it's possible/better to dispatch NCDataset's getindex to another method if the element type is String or Char?
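One possible workaround (a sketch, assuming the file name; the .var field bypasses the CF transformations) is to read the raw Char matrix and join each column into a String:

```julia
using NCDatasets
ds = NCDataset("GEOTRACES.nc")   # assumed file name

# .var gives the raw variable, without _FillValue/scale_factor handling
raw = ds["metavar1"].var[:,:]    # a 6 × 1866 Matrix{Char}

# join every column into a cruise name, stripping trailing NUL padding
cruises = [rstrip(join(raw[:, i]), '\0') for i in 1:size(raw, 2)]
close(ds)
```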

Attempt to use feature that was not turned on when netCDF was built

Describe the bug

NCDatasets.NetCDFError(-128, "NetCDF: Attempt to use feature that was not turned on when netCDF was built.").

On a machine on which the netCDF library was not previously installed.

To Reproduce

Dataset("./testfile01.nc", "c") do ds

    # Dimensions
    ds.dim["lon"] = 10
    ds.dim["lat"] = 20
    
    # Declare variables
    nclon = defVar(ds,"lon", Float64, ("lon",))
    nclon.attrib["standard_name"] = "longitude"

    nclat = defVar(ds,"lat", Float64, ("lat",))
    nclat.attrib["standard_name"] = "latitude"

end

Environment

  • operating system: Windows
  • Julia version: julia 1.0.3

Full output

Obtained from Pkg.test("NCDatasets")

NetCDF library: C:\Program Files\netCDF 4.6.2\bin\netcdf.DLL
NetCDF version: 4.6.2 of Nov 16 2018 15:52:48 $
NCDatasets: Error During Test at C:\Users\leroyd\.julia\packages\NCDatasets\P5zq1\test\runtests.jl:15
  Got exception outside of a @test
  NCDatasets.NetCDFError(-128, "NetCDF: Attempt to use feature that was not turned on when netCDF was built.")
  Stacktrace:
   [1] check at C:\Users\leroyd\.julia\packages\NCDatasets\P5zq1\src\NCDatasets.jl:52 [inlined]
   [2] nc_create(::String, ::UInt16) at C:\Users\leroyd\.julia\packages\NCDatasets\P5zq1\src\netcdf_c.jl:257
   [3] #Dataset#19(::Symbol, ::Array{Any,1}, ::Type, ::String, ::String) at C:\Users\leroyd\.julia\packages\NCDatasets\P5zq1\src\NCDatasets.jl:502
   [4] Dataset(::String, ::String) at C:\Users\leroyd\.julia\packages\NCDatasets\P5zq1\src\NCDatasets.jl:480
   [5] #Dataset#20(::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Type, ::getfield(Main, Symbol("##3#11")), ::String, ::Vararg{String,N} where N) at C:\Users\leroyd\.julia\packages\NCDatasets\P5zq1\src\NCDatasets.jl:527
   [6] Dataset(::Function, ::String, ::Vararg{String,N} where N) at C:\Users\leroyd\.julia\packages\NCDatasets\P5zq1\src\NCDatasets.jl:527
   [7] macro expansion at C:\Users\leroyd\.julia\packages\NCDatasets\P5zq1\test\runtests.jl:22 [inlined]
   [8] macro expansion at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.0\Test\src\Test.jl:1083 [inlined]
   [9] top-level scope at C:\Users\leroyd\.julia\packages\NCDatasets\P5zq1\test\runtests.jl:16
   [10] include at .\boot.jl:317 [inlined]
   [11] include_relative(::Module, ::String) at .\loading.jl:1044
   [12] include(::Module, ::String) at .\sysimg.jl:29
   [13] include(::String) at .\client.jl:392
   [14] top-level scope at none:0
   [15] eval(::Module, ::Any) at .\boot.jl:319
   [16] exec_options(::Base.JLOptions) at .\client.jl:243
   [17] _start() at .\client.jl:425
Test Summary: | Error  Total
NCDatasets    |     1      1

An array of arbitrary size can be appended to a CFVariable

Describe the bug

An array of arbitrary size can be appended to a CFVariable. Is this intended behaviour? Would it be possible to check the sizes of arrays being appended against the size of CFVariable and produce an error if they don't match?

To Reproduce

using NCDatasets

ds = Dataset("temp.nc","c")
x = collect(1:10)
defVar(ds, "x", x, ("x",))
defDim(ds, "Time", Inf)
sync(ds)
defVar(ds, "Time", Float64, ("Time",))

defVar(ds, "a", Float64, ("x", "Time"))
defVar(ds, "u", Float64, ("x", "Time"))
defVar(ds, "v", Float64, ("x", "Time"))
defVar(ds, "w", Float64, ("x", "Time"))
for i in 1:10
    ds["Time"][i] = i
    ds["a"][:,i] = 1                              # Adds all 1s
    ds["u"][:,i] = collect(1:9)                   # The last element is 0 or an arbitrary small number
    ds["v"][:,i] = collect(1:11)                  # The 11th element is dropped
    ds["w"][:,i] = reshape(collect(1:20), 10, 2)  # This behaves like the previous case
end

close(ds)

Expected behavior

All the above cases should produce size or dimension mismatch errors.

Environment

julia> versioninfo()
Julia Version 1.1.1
Commit 55e36cc308 (2019-05-16 04:10 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: ---------------------------------------------
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)

Cannot convert time to DateTime

Hello!

I was playing with the time vector when I realized that your package converts it to a DateTime type. Hence, I'm thinking that I could simply leverage your package for this task.

However, using multiple CORDEX and CMIP5 nc files, I can't convert the time variable to DateTime and get errors. I've added one of the files that is causing problems. I can provide more files. Thanks for any hints! 👍

https://www.dropbox.com/sh/1gjfdjpb0rtwq2w/AAAn19nXqu5wgeXGoxY97MS2a?dl=0

ds=Dataset(cmip5file2)
Dataset: /path/to/file/tasmax_day_IPSL-CM5A-MR_historical_r2i1p1_20000101-20051231.nc
Group: /

Dimensions
   time = 2190
   lat = 143
   lon = 144
   bnds = 2

Variables
  time   (2190)
    Datatype:    Float64
    Dimensions:  time
    Attributes:
     bounds               = time_bnds
     units                = days since 1850-01-01 00:00:00
     calendar             = noleap
     axis                 = T
     long_name            = time
     standard_name        = time

  time_bnds   (2 × 2190)
    Datatype:    Float64
    Dimensions:  bnds × time

  lat   (143)
    Datatype:    Float64
    Dimensions:  lat
    Attributes:
     bounds               = lat_bnds
     units                = degrees_north
     axis                 = Y
     long_name            = latitude
     standard_name        = latitude

  lat_bnds   (2 × 143)
    Datatype:    Float64
    Dimensions:  bnds × lat

  lon   (144)
    Datatype:    Float64
    Dimensions:  lon
    Attributes:
     bounds               = lon_bnds
     units                = degrees_east
     axis                 = X
     long_name            = longitude
     standard_name        = longitude

  lon_bnds   (2 × 144)
    Datatype:    Float64
    Dimensions:  bnds × lon

  height  
    Attributes:
     units                = m
     axis                 = Z
     positive             = up
     long_name            = height
     standard_name        = height

  tasmax   (144 × 143 × 2190)
    Datatype:    Float32
    Dimensions:  lon × lat × time
    Attributes:
     standard_name        = air_temperature
     long_name            = Daily Maximum Near-Surface Air Temperature
     units                = K
     original_name        = t2m_max
     cell_methods         = time: maximum (interval: 30 minutes)
     cell_measures        = area: areacella
     history              = 2012-04-25T17:00:57Z altered by CMOR: Treated scalar dimension: 'height'. 2012-04-25T17:00:57Z altered by CMOR: replaced missing value flag (9.96921e+36) with standard missing value (1e+20). 2012-04-25T17:00:58Z altered by CMOR: Inverted axis: lat.
     coordinates          = height
     missing_value        = 1.0e20
     _FillValue           = 1.0e20
     associated_files     = baseURL: http://cmip-pcmdi.llnl.gov/CMIP5/dataLocation gridspecFile: gridspec_atmos_fx_IPSL-CM5A-MR_historical_r0i0p0.nc areacella: areacella_fx_IPSL-CM5A-MR_historical_r0i0p0.nc

Global attributes
  institution          = IPSL (Institut Pierre Simon Laplace, Paris, France)
  institute_id         = IPSL
  experiment_id        = historical
  source               = IPSL-CM5A-MR (2010) : atmos : LMDZ4 (LMDZ4_v5, 144x143x39); ocean : ORCA2 (NEMOV2_3, 2x2L31); seaIce : LIM2 (NEMOV2_3); ocnBgchem : PISCES (NEMOV2_3); land : ORCHIDEE (orchidee_1_9_4_AR5)
  model_id             = IPSL-CM5A-MR
  forcing              = Nat,Ant,GHG,SA,Oz,LU,SS,Ds,BC,MD,OC,AA
  parent_experiment_id = piControl
  parent_experiment_rip = r1i1p1
  branch_time          = 1860.0
  contact              = ipsl-cmip5 _at_ ipsl.jussieu.fr Data manager : Sebastien Denvil
  comment              = This 20th century simulation include natural and anthropogenic forcings.
  references           = Model documentation and further reference available here : http://icmc.ipsl.fr
  initialization_method = 1
  physics_version      = 1
  tracking_id          = 865ced25-4e24-43d1-bf07-fd163ba5c7b7
  product              = output
  experiment           = historical
  frequency            = day
  creation_date        = 2012-04-25T17:00:58Z
  history              = 2012-04-25T17:00:58Z CMOR rewrote data to comply with CF standards and CMIP5 requirements.
  Conventions          = CF-1.4
  project_id           = CMIP5
  table_id             = Table day (10 February 2011) 80e409bd73611e9d25d049ad2059c310
  title                = IPSL-CM5A-MR model output prepared for CMIP5 historical
  parent_experiment    = pre-industrial control
  modeling_realm       = atmos
  realization          = 2
  cmor_version         = 2.7.1


julia> ds["time"]
time (2190)
  Datatype:    Float64
  Dimensions:  time
  Attributes:
   bounds               = time_bnds
   units                = days since 1850-01-01 00:00:00
   calendar             = noleap
   axis                 = T
   long_name            = time
   standard_name        = time

julia> ds["time"][:]
ERROR: MethodError: no method matching start(::Void)
Closest candidates are:
  start(::SimpleVector) at essentials.jl:258
  start(::Base.MethodList) at reflection.jl:560
  start(::ExponentialBackOff) at error.jl:107
  ...
Stacktrace:
 [1] timedecode(::Array{Float64,1}, ::String, ::String) at /home/proy/.julia/v0.6/NCDatasets/src/NCDatasets.jl:217
 [2] getindex(::NCDatasets.CFVariable{Float64,Float64,1}, ::Colon) at /home/proy/.julia/v0.6/NCDatasets/src/NCDatasets.jl:1091

edit - Should have added that all tests pass and I'm on latest master of NCDatasets.

julia> Pkg.test("NCDatasets")
INFO: Testing NCDatasets
NetCDF library: /home/proy/.julia/v0.6/Conda/deps/usr/lib/libnetcdf.so
NetCDF version: 4.5.0 of Nov  8 2017 13:35:26 $
Test Summary: | Pass  Total
NCDatasets    |  547    547
INFO: NCDatasets tests passed

julia> Pkg.status("NCDatasets")
 - NCDatasets                    0.0.10+            master

UndefVarError: ipermute!! not defined

When compiling against the latest nightly build of Julia, I get this:

julia> using NCDatasets
┌ Warning: Deprecated syntax `type` at /home/mhu027/.julia/v0.7/NCDatasets/src/NCDatasets.jl:14.
│ Use `mutable struct` instead.
└ @ nothing NCDatasets.jl:14
┌ Warning: Deprecated syntax `type` at /home/mhu027/.julia/v0.7/NCDatasets/src/NCDatasets.jl:61.
│ Use `mutable struct` instead.
└ @ nothing NCDatasets.jl:61
[...]
┌ Warning: Deprecated syntax `parametric method syntax Base.convert{S, T, N}(::Type{DataArray{S, N}}, x::DataArray{T, N})` around /home/mhu027/.julia/v0.7/DataArrays/src/dataarray.jl:348.
│ Use `Base.convert(#unused#::Type{DataArray{S, N}}, x::DataArray{T, N}) where {S, T, N}` instead.
└ @ nothing dataarray.jl:348
WARNING: importing deprecated binding Base.Range into DataArrays.
WARNING: Base.Range is deprecated, use AbstractRange instead.
  likely near /home/mhu027/.julia/v0.7/DataArrays/src/pooleddataarray.jl:181
WARNING: Base.Range is deprecated, use AbstractRange instead.
  likely near /home/mhu027/.julia/v0.7/DataArrays/src/pooleddataarray.jl:181
┌ Warning: Deprecated syntax `parametric method syntax pdatazeros{R <: Integer}(t::Type, r::Type{R}, dims::Int...)` around /home/mhu027/.julia/v0.7/DataArrays/src/pooleddataarray.jl:189.
│ Use `pdatazeros(t::Type, r::Type{R}, dims::Int...) where R <: Integer` instead.
└ @ nothing pooleddataarray.jl:189
┌ Warning: Deprecated syntax `parametric method syntax pdataones{R <: Integer}(t::Type, r::Type{R}, dims::Int...)` around /home/mhu027/.julia/v0.7/DataArrays/src/pooleddataarray.jl:189.
│ Use `pdataones(t::Type, r::Type{R}, dims::Int...) where R <: Integer` instead.
└ @ nothing pooleddataarray.jl:189
ERROR: LoadError: LoadError: UndefVarError: ipermute!! not defined
Stacktrace:
 [1] getproperty(::Module, ::Symbol) at ./sysimg.jl:14
 [2] top-level scope
 [3] include at ./boot.jl:292 [inlined]
 [4] include_relative(::Module, ::String) at ./loading.jl:1012
 [5] include at ./sysimg.jl:26 [inlined]
 [6] include(::String) at /home/mhu027/.julia/v0.7/DataArrays/src/DataArrays.jl:3
 [7] top-level scope
 [8] include at ./boot.jl:292 [inlined]
 [9] include_relative(::Module, ::String) at ./loading.jl:1012
 [10] include(::Module, ::String) at ./sysimg.jl:26
 [11] top-level scope
 [12] eval at ./boot.jl:295 [inlined]
 [13] top-level scope at ./<missing>:3
in expression starting at /home/mhu027/.julia/v0.7/DataArrays/src/pooleddataarray.jl:544
in expression starting at /home/mhu027/.julia/v0.7/DataArrays/src/DataArrays.jl:47
ERROR: LoadError: Failed to precompile DataArrays to /home/mhu027/.julia/lib/v0.7/DataArrays.ji.
Stacktrace:
 [1] error at ./error.jl:33 [inlined]
 [2] compilecache(::Base.PkgId) at ./loading.jl:1157
 [3] _require(::Base.PkgId) at ./loading.jl:920
 [4] require(::Module, ::Symbol) at ./loading.jl:820
 [5] include at ./boot.jl:292 [inlined]
 [6] include_relative(::Module, ::String) at ./loading.jl:1012
 [7] include(::Module, ::String) at ./sysimg.jl:26
 [8] top-level scope
 [9] eval at ./boot.jl:295 [inlined]
 [10] top-level scope at ./<missing>:3
in expression starting at /home/mhu027/.julia/v0.7/NCDatasets/src/NCDatasets.jl:7
ERROR: Failed to precompile NCDatasets to /home/mhu027/.julia/lib/v0.7/NCDatasets.ji.
Stacktrace:
 [1] error at ./error.jl:33 [inlined]
 [2] compilecache(::Base.PkgId) at ./loading.jl:1157
 [3] _require(::Base.PkgId) at ./loading.jl:949
 [4] require(::Module, ::Symbol) at ./loading.jl:820

Version information:

julia> versioninfo()
Julia Version 0.7.0-DEV.3686
Commit c6f056b79a (2018-02-01 23:37 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i5-6300U CPU @ 2.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, skylake)
Environment:

julia> Pkg.installed()["NCDatasets"]
v"0.0.7"

Silent inconsistency for dayofyear for non-standard calendar

Hello!

Stumbled into some inconsistency with the non-standard calendar in terms of dayofyear.

MWE

using Dates, NCDatasets
julia> daysinyear(DateTimeNoLeap(2008))
365

julia> dayofyear(DateTimeNoLeap(2008, 12, 31))
366

As far as I understand, the method dayofyear is implemented in Dates only and reexported in NCDatasets.

Cheers!

NetCDF: Operation not allowed in define mode

Hi, thank you for this awesome package. We are using it in Oceananigans.jl.

Describe the bug

I get the error in the title when I try to define a variable. I think it happens when I define a dimension after defining a new variable.

To Reproduce

using NCDatasets
x, y = collect(1:10), collect(10:18)

# This works
Dataset("temp.nc", "c") do ds
     defDim(ds, "x", length(x))
     defDim(ds, "y", length(y))
     defVar(ds, "x", x, ("x",))
     defVar(ds, "y", y, ("y",))
end

# This fails (Please see the error at the end of the post)
Dataset("temp1.nc", "c") do ds
      defDim(ds, "x", length(x))
      defVar(ds, "x", x, ("x",))
      defDim(ds, "y", length(y))
      defVar(ds, "y", y, ("y",))
end


# This again works!
Dataset("temp2.nc", "c") do ds
     defDim(ds, "x", length(x))
     defVar(ds, "x", x, ("x",))
     defDim(ds, "y", length(y)); sync(ds)
     defVar(ds, "yvar", y, ("y",))
end

Expected behavior

I imagine it should work in all three cases above.

Environment

julia> versioninfo()
Julia Version 1.1.1
Commit 55e36cc308 (2019-05-16 04:10 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: ---------------------------------------------
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)

Full output

ERROR: NCDatasets.NetCDFError(-39, "NetCDF: Operation not allowed in define mode")
Stacktrace:
 [1] check at /home/user/NCDatasets.jl/src/NCDatasets.jl:53 [inlined]
 [2] nc_redef at /home/user/NCDatasets.jl/src/netcdf_c.jl:909 [inlined]
 [3] defmode at /home/user/NCDatasets.jl/src/NCDatasets.jl:170 [inlined]
 [4] #defVar#21(::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Function, ::Dataset, ::String, ::DataType, ::Tuple{String}) at /home/user/Work/NCDatasets.jl/src/NCDatasets.jl:644
 [5] defVar at /home/user/NCDatasets.jl/src/NCDatasets.jl:642 [inlined]
 [6] #defVar#24(::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Function, ::Dataset, ::String, ::Array{Int64,1}, ::Tuple{String}) at /home/user/NCDatasets.jl/src/NCDatasets.jl:724
 [7] defVar(::Dataset, ::String, ::Array{Int64,1}, ::Tuple{String}) at /home/user/NCDatasets.jl/src/NCDatasets.jl:698
 [8] (::getfield(Main, Symbol("##7#8")))(::Dataset) at ./REPL[5]:5
 [9] #Dataset#20(::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Type, ::getfield(Main, Symbol("##7#8")), ::String, ::Vararg{String,N} where N) at /home/user/NCDatasets.jl/src/NCDatasets.jl:553
 [10] Dataset(::Function, ::String, ::Vararg{String,N} where N) at /home/user/NCDatasets.jl/src/NCDatasets.jl:551
 [11] top-level scope at none:0

CFVariable type not accepted in zeros()

Before, in Julia 0.6.2, I could use zeros() on a NCDatasets.CFVariable{Float64,Float64,2} type as follows:

julia> typeof(mydata)
NCDatasets.CFVariable{Float64,Float64,2}

julia> zeros(mydata)
17×31 Array{Float64,2}:
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0
...

but now on 1.0.0 that type is not accepted:

julia> zeros(mydata)
ERROR: MethodError: no method matching zeros(::NCDatasets.CFVariable{Float64,Float64,2})
Closest candidates are:
  zeros(::Union{Integer, AbstractUnitRange}...) at array.jl:463
  zeros(::Type{T}, ::Union{Integer, AbstractUnitRange}...) where T at array.jl:464
  zeros(::Tuple{Vararg{Union{Integer, AbstractUnitRange},N} where N}) at array.jl:465
  ...
Stacktrace:
 [1] top-level scope at none:0

Of course, this seems like a restriction on zeros(), and may not be a bug of this package. If so, could you advise a work-around?

I pulled the most recent version of NCDatasets.jl.

Implement "daysinmonth" and other methods for non-standard calendar

Hello!

I think it would be useful to extend some methods for non-standard calendars. For instance, daysinmonth for February of a leap year should return 28 and not 29 for a 365_day calendar. Similarly, for a 360_day calendar, each month should return 30.

Here's the implementation in Julia Dates stdlib.
https://github.com/JuliaLang/julia/blob/ab25ae4df2605c1d1c0a8ca7aa503888412594b6/stdlib/Dates/src/types.jl#L146

I'm now heavily leveraging your package in ClimateTools. Thanks again for your hard-work!

edit - Last sentence had a strong typo, I wrote "I'm not heavily..." Oops!

Unexpected behavior indexing Variable with unlimited dimension with CartesianIndex and UnitRange

Ran across this edge case today. Let's say you have a Variable named var with 3 dimensions where the outermost dimension is unlimited. var[CartesianIndex(1, 1), 2:end] returns an empty vector, but var[1, 1, 2:end] does not. Also var[CartesianIndex(1, 1), 2:nt] where nt is the current length of the 3rd dimension behaves as expected. So there is something about the getindex method that gets called with a CartesianIndex and UnitRange with end that triggers this bug.
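A minimal sketch of the report (the file and variable names are assumptions; the low-level Variable is reached through the .var field):

```julia
using NCDatasets

ds = NCDataset("unlim.nc")       # assumed file with an unlimited 3rd dimension
var = ds["var"].var              # low-level Variable (bypasses CF transforms)
nt = size(var, 3)

var[1, 1, 2:end]                 # works as expected
var[CartesianIndex(1, 1), 2:nt]  # works as expected
var[CartesianIndex(1, 1), 2:end] # reported to return an empty vector
close(ds)
```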

[DOC] How to get the dimensions of a dataset?

I have

julia> fld
toa_sw_all_mon (360 × 180 × 228)
  Datatype:    Float32
  Dimensions:  lon × lat × time
  Attributes:
   long_name            = Top of The Atmosphere Shortwave Flux, All-Sky conditions, Monthly Means     
   standard_name        = TOA Shortwave Flux - All-Sky
   CF_name              = toa_outgoing_shortwave_flux
   comment              = none
   units                = W m-2
   valid_min            =       0.00000
   valid_max            =       600.000
   _FillValue           = -999.0

how do I programmatically get the sequence lon × lat × time, which are the dimensions of the dataset? I'd like to be able to get them both as the direct variables of the NCFile, but also as just a tuple of names (:lon, :lat, :time).
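For reference, one way to do this with current NCDatasets is `dimnames`, which returns the dimension names of a variable; the tiny file below is only created so the sketch is self-contained (the variable name `fld` is illustrative):

```julia
using NCDatasets

# build a small example file so the snippet can run on its own
fname = tempname() * ".nc"
NCDataset(fname, "c") do ds
    defVar(ds, "fld", zeros(Float32, 4, 3, 2), ("lon", "lat", "time"))
end

ds = NCDataset(fname)
fld = ds["fld"]
dnames = dimnames(fld)     # ("lon", "lat", "time") as strings
dsyms  = Symbol.(dnames)   # (:lon, :lat, :time)
close(ds)
```

The coordinate variables themselves can then be looked up as `ds[name]` for each name in `dnames`.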

Asymmetry with Missings and DateTime when loading and saving vars

It seems that Dataset variable arrays are automatically typed Union{Missing,T} and nodata values are converted to missing. But at the same time you can't write an array containing missings with defVar.

Maybe defVar should handle variables containing missing automatically, converting them to some standard missing value. Then read and write would be more symmetrical, and we could write a variable we just loaded straight back to netCDF. Otherwise special-case handling of Missing is required in the client code.

Unless I'm missing something.

Load/save of datetime is also not symmetrical. It's only converted on load. It would be great to handle conversion of types both ways.
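One way to round-trip such data today is to declare a fill value in defVar and substitute it for missing before writing; the fill value and variable names below are made up for illustration:

```julia
using NCDatasets

data = Union{Missing,Float64}[1.0, missing, 3.0]   # as returned by a read
fv = -9999.0                                       # hypothetical fill value

fname = tempname() * ".nc"
NCDataset(fname, "c") do ds
    defDim(ds, "x", length(data))
    v = defVar(ds, "var", Float64, ("x",), fillvalue = fv)
    # replace missing by the declared fill value before writing
    v[:] = coalesce.(data, fv)
end
```

On re-read, values equal to the declared _FillValue come back as missing, restoring the symmetry.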

Strings are inconsistent in v0.0.11

I do not have access to much info right now (away from my "production" computer, but I'm getting some results from Travis CI). Since v0.0.11, some strings are not consistent: for example, the units for temperature return "K\0" instead of "K" for Kelvin. Other strings seem to be affected as well. I can provide more info on Monday if need be.

Cheers!

modify netCDF files

Great work producing a netCDF library for Julia!

Is it possible to add the possibility to modify existing files? At the moment there are two modes:
'c' - create
'r' - read
It would be great to have an option 'a' similar to Python's netCDF4, which gives the user an opportunity to modify the file.
Sorry for starting an "issue", whereas it is more a request for a future release...
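For later readers: recent versions of NCDatasets do support an "a" mode for editing an existing file, for example:

```julia
using NCDatasets

fname = tempname() * ".nc"
NCDataset(fname, "c") do ds              # "c": create a new file
    defVar(ds, "v", [1.0, 2.0, 3.0], ("x",))
end

NCDataset(fname, "a") do ds              # "a": open an existing file for editing
    ds.attrib["history"] = "modified after creation"
end
```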

ERA5 time vector

Hello!

Just downloaded ERA5 datasets and their time definition is of the following form. Notice the .0 after seconds.

julia> ds["time"]
time (366)
  Datatype:    Int32
  Dimensions:  time
  Attributes:
   units                = hours since 1900-01-01 00:00:00.0

It throws an error when trying to extract the values

julia> timeV = ds["time"][:]
ERROR: ArgumentError: invalid base 10 digit '.' in "00.0"
Stacktrace:
 [1] tryparse_internal(::Type{Int64}, ::SubString{String}, ::Int64, ::Int64, ::Int64, ::Bool) at ./parse.jl:131
 [2] #parse#348(::Nothing, ::Function, ::Type{Int64}, ::SubString{String}) at ./parse.jl:238
 [3] parse at ./parse.jl:238 [inlined]
 [4] _broadcast_getindex_evalf at ./broadcast.jl:578 [inlined]
 [5] _broadcast_getindex at ./broadcast.jl:561 [inlined]
 [6] getindex at ./broadcast.jl:511 [inlined]
 [7] macro expansion at ./broadcast.jl:843 [inlined]
 [8] macro expansion at ./simdloop.jl:73 [inlined]
 [9] copyto! at ./broadcast.jl:842 [inlined]
 [10] copyto! at ./broadcast.jl:797 [inlined]
 [11] copy at ./broadcast.jl:773 [inlined]
 [12] materialize at ./broadcast.jl:753 [inlined]
 [13] parseDT(::Type{DateTimeStandard}, ::SubString{String}) at /home/proy/.julia/packages/NCDatasets/hWLMo/src/time.jl:453
 [14] timeunits(::Type{DateTimeStandard}, ::String) at /home/proy/.julia/packages/NCDatasets/hWLMo/src/time.jl:472
 [15] timedecode(::Type{DateTimeStandard}, ::Array{Int32,1}, ::String) at /home/proy/.julia/packages/NCDatasets/hWLMo/src/time.jl:530
 [16] #timedecode#2(::Bool, ::Function, ::Array{Int32,1}, ::String, ::String) at /home/proy/.julia/packages/NCDatasets/hWLMo/src/time.jl:561
 [17] timedecode(::Array{Int32,1}, ::String, ::String) at /home/proy/.julia/packages/NCDatasets/hWLMo/src/time.jl:560
 [18] getindex(::NCDatasets.CFVariable{Union{Missing, Float64},1,NCDatasets.Variable{Int32,1},NCDatasets.Attributes}, ::Colon) at /home/proy/.julia/packages/NCDatasets/hWLMo/src/NCDatasets.jl:1206
 [19] top-level scope at none:0

I can extract it without the conversion though:

ds["time"].var[:]
366-element Array{Int32,1}:
 876576
 876600
 876624
 876648
 876672
 876696
 876720
 876744
 876768
 876792
      
 885144
 885168
 885192
 885216
 885240
 885264
 885288
 885312
 885336

Let me know if you need a sample; I can upload it to Dropbox I guess (but they are close to 800 MB in size).

Cheers!

See JuliaClimate/ClimateTools.jl#91
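Until the unit parser accepts the trailing ".0" on the seconds, the raw values shown above can be decoded manually; a sketch using the first few values (the epoch comes from the units attribute):

```julia
using Dates

# raw values from ds["time"].var[:], with units "hours since 1900-01-01 00:00:00.0"
raw = Int32[876576, 876600, 876624]
epoch = DateTime(1900, 1, 1)
times = epoch .+ Hour.(raw)   # DateTime(2000-01-01), (2000-01-02), (2000-01-03)
```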

Index variables by dimension names

Sorry for the sudden burst of issues! This, perhaps thankfully, is a feature request.

At the moment, the output of a variable is a multidimensional array. It would be good to have an AxisArray (or similar) returned on calling a variable.

Current (NCDatasets 0.3.0+, commit 899297f), using the same MWE as linked to in issue #4:

using NCDatasets; dataset = Dataset("test_netcdf.nc"); dataset["resource"]

48×9 NCDatasets.CFVariable{Float64,Float64,2}:
 0.0          -0.215376   -64.732     0.0          -0.0156    -18.7629
 0.0          -0.200838   -70.4534     0.0          -0.860322  -18.7629
 0.0          -0.207306   -77.193      0.0          -0.0156    -18.7629
 0.0          -0.318949  -104.556      0.0          -0.0156    -18.7629
 0.0          -0.650734  -123.228      0.0          -0.860407  -18.7629
 0.00785714   -1.03938   -167.669     0.00785714   -7.26333   -30.2124
 0.0291429    -1.18157   -264.887      0.0291429    -9.39823   -35.2333
 0.0555714    -1.2854    -365.138      0.0555714    -5.79284   -61.3953
 0.0784286    -1.20912   -258.173      0.0784286    -3.32259   -63.643 
 0.096        -1.21991   -190.586      0.096        -1.92726   -62.6797
 0.108143     -0.736295  -148.187     0.108143     -1.62162   -64.9274
 0.107571     -0.572356  -113.122      0.107571     -1.37952   -63.643 
 0.09         -0.529149  -109.166      0.09         -1.17358   -61.0742
                                                                     

Dimensions in netCDF files could be anything, and it would be good to be able to index the variables by their dimension names. To do so, I currently have to do something like this:

d = AxisArray(dataset["resource"], Axis{:time}([Dates.DateTime(i) for i in dataset["time"]]), Axis{:loc_techs}([i for i in dataset["loc_techs_finite_resource"]]))

2-dimensional AxisArray{Float64,2,...} with axes:
    :time, DateTime[2005-07-01T00:00:00, 2005-07-01T01:00:00, 2005-07-01T02:00:00, 2005-07-01T03:00:00, 2005-07-01T04:00:00, 2005-07-01T05:00:00, 2005-07-01T06:00:00, 2005-07-01T07:00:00, 2005-07-01T08:00:00, 2005-07-01T09:00:00    2005-07-02T14:00:00, 2005-07-02T15:00:00, 2005-07-02T16:00:00, 2005-07-02T17:00:00, 2005-07-02T18:00:00, 2005-07-02T19:00:00, 2005-07-02T20:00:00, 2005-07-02T21:00:00, 2005-07-02T22:00:00, 2005-07-02T23:00:00]
    :loc_techs, String["X2:pv", "X1:demand_heat", "X2:demand_heat", "X3:pv", "X2:demand_power", "X1:demand_power", "X1:pv", "X3:demand_heat", "X3:demand_power"]
And data, a 48×9 NCDatasets.CFVariable{Float64,Float64,2}:
 0.0          -0.215376   -64.732     0.0          -0.0156    -18.7629
 0.0          -0.200838   -70.4534     0.0          -0.860322  -18.7629
 0.0          -0.207306   -77.193      0.0          -0.0156    -18.7629
 0.0          -0.318949  -104.556      0.0          -0.0156    -18.7629
 0.0          -0.650734  -123.228      0.0          -0.860407  -18.7629
 0.00785714   -1.03938   -167.669     0.00785714   -7.26333   -30.2124
 0.0291429    -1.18157   -264.887      0.0291429    -9.39823   -35.2333
 0.0555714    -1.2854    -365.138      0.0555714    -5.79284   -61.3953
 0.0784286    -1.20912   -258.173      0.0784286    -3.32259   -63.643 
 0.096        -1.21991   -190.586      0.096        -1.92726   -62.6797
 0.108143     -0.736295  -148.187     0.108143     -1.62162   -64.9274
 0.107571     -0.572356  -113.122      0.107571     -1.37952   -63.643 
 0.09         -0.529149  -109.166      0.09         -1.17358   -61.0742
                                           

Which leads to the desired output, as now I can index the array as:

d[Axis{:time}(DateTime("2005-07-01T01:00:00"))]

1-dimensional AxisArray{Float64,1,...} with axes:
    :loc_techs, String["X2:pv", "X1:demand_heat", "X2:demand_heat", "X3:pv", "X2:demand_power", "X1:demand_power", "X1:pv", "X3:demand_heat", "X3:demand_power"]
And data, a 9-element DataArrays.DataArray{Float64,1}:
   0.0     
  -0.200838
 -70.4534  
   0.0     
 -76.9606  
  -0.405798
   0.0     
  -0.860322
 -18.7629  

d[Axis{:loc_techs}(["X1:pv", "X2:pv"])]

2-dimensional AxisArray{Float64,2,...} with axes:
    :time, DateTime[2005-07-01T00:00:00, 2005-07-01T01:00:00, 2005-07-01T02:00:00, 2005-07-01T03:00:00, 2005-07-01T04:00:00, 2005-07-01T05:00:00, 2005-07-01T06:00:00, 2005-07-01T07:00:00, 2005-07-01T08:00:00, 2005-07-01T09:00:00    2005-07-02T14:00:00, 2005-07-02T15:00:00, 2005-07-02T16:00:00, 2005-07-02T17:00:00, 2005-07-02T18:00:00, 2005-07-02T19:00:00, 2005-07-02T20:00:00, 2005-07-02T21:00:00, 2005-07-02T22:00:00, 2005-07-02T23:00:00]
    :loc_techs, String["X2:pv", "X1:pv"]
And data, a 48×2 Array{Float64,2}:
 0.0          0.0        
 0.0          0.0        
 0.0          0.0        
 0.0          0.0        
 0.0          0.0        
 0.00785714   0.00785714 
 0.0291429    0.0291429  
 0.0555714    0.0555714  
 0.0784286    0.0784286  
 0.096        0.096      
 0.108143     0.108143   
 0.107571     0.107571   
 0.09         0.09       
                      

This may have some poor memory/speed based implications, but I'm too new to Julia to be able to pin them down!

When I try to make an unlimited dimension with Inf, it is set to 0 instead

I am trying to make an unlimited dimension but Julia sets the dimension to zero.

ds = Dataset("test10.nc","c")

# Define the dimension "lon" and "lat" with the size unlimited and 110 resp.
defDim(ds,"lon",Inf)
defDim(ds,"lat",110)

# Define a global attribute
ds.attrib["title"] = "this is a test file"

# Define the variables temperature
v = defVar(ds,"temperature",Float32,("lon","lat"))

# Generate some example data
data = [Float32(i+j) for i = 1:100, j = 1:110]

# write a single column
v[:,1] = data[:,1]

# write a the complete data set
v[:,:] = data

# write attributes
v.attrib["units"] = "degree Celsius"
v.attrib["comments"] = "this is a string attribute with Unicode Ω ∈ ∑ ∫ f(x) dx"

But I get the following error message,
NCDatasets.NetCDFError(-1, "wrong size of variable 'temperature' (size (0, 110)) in file 'test10.nc' for an array of size (100, 110)")

Stacktrace:
[1] nc_put_var(::Int32, ::Int32, ::Array{Float32,2}) at C:\Users\Andre.julia\packages\NCDatasets\aYSmE\src\netcdf_c.jl:631
[2] setindex!(::NCDatasets.Variable{Float32,2}, ::Array{Float32,2}, ::Colon, ::Colon) at C:\Users\Andre.julia\packages\NCDatasets\aYSmE\src\NCDatasets.jl:988
[3] setindex!(::NCDatasets.CFVariable{Union{Missing, DateTime, AbstractCFDateTime, Number},2,NCDatasets.Variable{Float32,2},NCDatasets.Attributes}, ::Array{Float32,2}, ::Colon, ::Colon) at C:\Users\Andre.julia\packages\NCDatasets\aYSmE\src\NCDatasets.jl:1364
[4] top-level scope at In[7]:18

versioninfo()

Gives following output
Julia Version 1.0.3
Commit 099e826241 (2018-12-18 01:34 UTC)
Platform Info:
OS: Windows (x86_64-w64-mingw32)
CPU: Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-6.0.0 (ORCJIT, skylake)

Pkg.installed()["NCDatasets"]

Gives following output
v"0.9.0"

Pkg.test("NCDatasets")

Gives following output
Testing NCDatasets
Status C:\Users\AppData\Local\Temp\jl_4DE1.tmp\Manifest.toml
[9e28174c] BinDeps v0.8.10
[34da2185] Compat v2.1.0
[8f4d0f93] Conda v1.2.0
[a9693cdc] CondaBinDeps v0.1.0
[864edb3b] DataStructures v0.15.0
[682c06a0] JSON v0.20.0
[e1d29d7a] Missings v0.4.0
[85f8d34a] NCDatasets v0.9.0
[bac558e1] OrderedCollections v1.1.0
[30578b45] URIParser v0.4.0
[81def892] VersionParsing v1.1.3
[2a0f44e3] Base64 [@stdlib/Base64]
[ade2ca70] Dates [@stdlib/Dates]
[8bb1440f] DelimitedFiles [@stdlib/DelimitedFiles]
[8ba89e20] Distributed [@stdlib/Distributed]
[b77e0a4c] InteractiveUtils [@stdlib/InteractiveUtils]
[76f85450] LibGit2 [@stdlib/LibGit2]
[8f399da3] Libdl [@stdlib/Libdl]
[37e2e46d] LinearAlgebra [@stdlib/LinearAlgebra]
[56ddb016] Logging [@stdlib/Logging]
[d6f4376e] Markdown [@stdlib/Markdown]
[a63ad114] Mmap [@stdlib/Mmap]
[44cfe95a] Pkg [@stdlib/Pkg]
[de0858da] Printf [@stdlib/Printf]
[3fa0cd96] REPL [@stdlib/REPL]
[9a3f8284] Random [@stdlib/Random]
[ea8e919c] SHA [@stdlib/SHA]
[9e88b42a] Serialization [@stdlib/Serialization]
[1a1011a3] SharedArrays [@stdlib/SharedArrays]
[6462fe0b] Sockets [@stdlib/Sockets]
[2f01184e] SparseArrays [@stdlib/SparseArrays]
[10745b16] Statistics [@stdlib/Statistics]
[8dfed614] Test [@stdlib/Test]
[cf7118a7] UUIDs [@stdlib/UUIDs]
[4ec0a83e] Unicode [@stdlib/Unicode]
NetCDF library: C:\Users\Andre.julia\conda\3\Library\bin\netcdf.DLL
NetCDF version: 4.6.1 of Dec 20 2018 17:54:58 $
Test Summary: | Pass Total
NCDatasets | 561 561
Test Summary: | Pass Total
NetCDF4 groups | 8 8
Test Summary: | Pass Total
Variable-length arrays | 22 22
Test Summary: | Pass Total
Time and calendars | 874618 874618
Test Summary: | Pass Total
Multi-file datasets | 36 36
Test Summary: | Pass Total
Deferred datasets | 13 13
Testing NCDatasets tests passed
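For reference, `defDim(ds, name, Inf)` does create an unlimited dimension (reported with size 0 until written); the catch appears to be that writes must use explicit index ranges rather than `:`, so the library knows how far to grow it. A sketch of the pattern, under that assumption:

```julia
using NCDatasets

fname = tempname() * ".nc"
ds = NCDataset(fname, "c")
defDim(ds, "lon", Inf)       # unlimited dimension, size 0 until data is written
defDim(ds, "lat", 110)
v = defVar(ds, "temperature", Float32, ("lon", "lat"))

data = [Float32(i + j) for i = 1:100, j = 1:110]
v[1:100, 1:110] = data       # explicit ranges grow the unlimited dimension
close(ds)
```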

Integration with Unitful

I have:

julia> field
toa_sw_clr_c_mon (360 × 180 × 228)
  Datatype:    Float32
  Dimensions:  lon × lat × time
  Attributes:
   long_name            = Top of The Atmosphere Shortwave Flux, Clear-Sky (for cloud-free areas of region) conditions, Monthly Means
   standard_name        = TOA Shortwave Flux - Clear-Sky (for cloud-free areas of region)
   CF_name              = none
   comment              = none
   units                = W m-2
   valid_min            =       0.00000
   valid_max            =       600.000
   _FillValue           = -999.0

and I do:

julia> Array(field) |> summary
"360×180×228 Array{Union{Missing, Float32},3}"

Wouldn't it be cool if we at least tried to use Unitful.jl and, instead of returning an array of Floats, associated units with them when the CFVariable has a units attribute?
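A minimal sketch of the idea, assuming a hand-written lookup table (a real implementation would need a proper CF-units parser; `CF_UNITS` and `attach_units` are made-up names):

```julia
using Unitful

# hypothetical mapping from CF unit strings to Unitful units
const CF_UNITS = Dict("W m-2" => u"W/m^2", "K" => u"K", "m" => u"m")

# attach units to the data; fall back to dimensionless (factor 1) when unknown
attach_units(data, unitstr) = data .* get(CF_UNITS, unitstr, 1)

flux = attach_units(Float32[100, 250], "W m-2")   # Vector of Quantity{Float32,...}
```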

Accessing variable in loaded dataset leads to kernel dying

Hi, I've just come across NCDatasets as a much cleaner way of dealing with netCDF files than the NetCDF package. However, on loading a file into Julia and accessing a variable, the whole kernel dies!

Current usage:

using NCDatasets; dataset = Dataset("test_netcdf.nc")
dataset["varname1"] 

Pretty simple. The netcdf is only 80kb in size, the variable I was trying to access has 32 elements over one dimension.

Hourly increments not being applied to DateTime

I have a datetime coordinate with the following information:

Dataset: time.nc
Group: /

Dimensions
   time = 48

Variables
  time   (48)
    Datatype:    Int64
    Dimensions:  time
    Attributes:
     units                = hours since 2005-07-01 00:00:00
     calendar             = proleptic_gregorian

  time_test   (48)
    Datatype:    Int64
    Dimensions:  time
    Attributes:
     units                = hours since 2005-07-01 00:00:00
     calendar             = proleptic_gregorian

Global attributes
  _NCProperties        = version=1|netcdflibversion=4.4.1.1|hdf5libversion=1.8.18

However, on processing it, the output gets the date right without changing the hour in any timestep:

dataset["time"]

48-element NCDatasets.CFVariable{Int64,Float64,1}:
 2005-07-01T00:00:00
 2005-07-01T00:00:00
 2005-07-01T00:00:00
 2005-07-01T00:00:00
 2005-07-01T00:00:00
 2005-07-01T00:00:00
 2005-07-01T00:00:00
 2005-07-01T00:00:00
 2005-07-01T00:00:00
 2005-07-01T00:00:00
 2005-07-01T00:00:00
 2005-07-01T00:00:00
 2005-07-01T00:00:00
etc...

versions
Julia Version 0.6.0
Commit 903644385b* (2017-06-19 13:05 UTC)
Platform Info:
OS: Windows (x86_64-w64-mingw32)
CPU: Intel(R) Core(TM) i5-6300U CPU @ 2.40GHz
WORD_SIZE: 64
BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
LAPACK: libopenblas64_
LIBM: libopenlibm
LLVM: libLLVM-3.9.1 (ORCJIT, skylake)

NCDatasets 0.0.3+ (commit 3982be7)

MWE
https://www.dropbox.com/s/k9r3igvljv6i4o7/time.nc?dl=0

tests
All NCDatasets tests passed

Performance and array semantics

Calling functions which iterate over array elements (e.g. Statistics.mean) on netCDF datasets can be very slow. It would be useful to have some performance tips, e.g. to first convert a dataset to an Array.

On that note, I noticed that simply calling Array(dataset) is also slow. I take it from the examples in the manual that the suggested way to convert is to call dataset[:]. However this has different behaviour from ordinary Julia multidimensional arrays, which return a vector:

julia> X = rand(3,3);

julia> X[:]
9-element Array{Float64,1}:
 0.1443316789104081
 0.8768272466421989
 0.01240381950170022
 0.6732249425627772
 0.4128628021781906
 0.2112718766221251
 0.9459472658879167
 0.8996044010964837
 0.47806340748050236
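A sketch of the performance tip itself: loading the variable into a plain Array once does a single bulk read, after which iteration is fast (the tiny file here is created only so the snippet can run on its own):

```julia
using NCDatasets, Statistics

# build a small self-contained example file
fname = tempname() * ".nc"
NCDataset(fname, "c") do ds
    defVar(ds, "v", ones(Float32, 10, 10), ("x", "y"))
end

ds = NCDataset(fname)
A = Array(ds["v"])   # one bulk read into memory instead of many element reads
m = mean(A)          # fast: iterates over an ordinary in-memory Array
close(ds)
```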

Something is wrong with `mean`: takes too much time

Describe the bug

Calling mean on a NCDataset seems to take too much time. Much more than when applied to the direct array representation.

To Reproduce
Let's say I have:

julia> fld
toa_sw_all_mon (360 × 180 × 228)
  Datatype:    Float32
  Dimensions:  lon × lat × time
  Attributes:
   long_name            = Top of The Atmosphere Shortwave Flux, All-Sky conditions, Monthly Means     
   standard_name        = TOA Shortwave Flux - All-Sky
   CF_name              = toa_outgoing_shortwave_flux
   comment              = none
   units                = W m-2
   valid_min            =       0.00000
   valid_max            =       600.000
   _FillValue           = -999.0

julia> fldarray = Array(fld);

Then I do:

julia> mean(fldarray)
102.24223f0

which is instant. However,

julia> mean(fld)

takes way too long (I always have to terminate the Julia process). The same story goes for mean(fld; dims = 3)...

Expected behavior

mean should take almost as much time as in the Array version.

Is it possible to get memory mapping in NCDatasets?

Hi:

I am wondering about ways of improving performance when reading large, multifile datasets.
I declare I am not an expert on this matter, but...

Is it possible for NCDatasets to use mmap?
HDF5 does it like this:

Sorry if this is a dumb question!

Breaking changes ahead

When I started writing NCDatasets, DataArray was the recommended way to store missing values and was also used by the package DataFrames.jl.

There is now a better approach with the missing type [1] (which will be part of Julia 0.7 and is currently an external package for 0.6). DataFrames.jl now also uses the missing type.

If e.g. ds is a NetCDF dataset with the 2D Float32 variable var, then
ds["var"][:,:] currently returns a DataArrays.DataArray{Float32,2}. I propose to change this behaviour and return Array{Union{Float32,Missings.Missing},2} instead.

We could provide additional simple utility functions like nomissing(a) (which checks that the array contains no missing values and returns a "plain" array of type Array{Float32,2} for the example above) and nomissing(a,fv) (which replaces all missing values by fv and returns a "plain" array).

It seems that when no missing values are present, DataFrames returns a "plain" array directly when a column is indexed. This behaviour could also be implemented, but as far as I can see it would not be type stable.

Any suggestions on how to deal with the breaking change are welcome.

[1] https://docs.julialang.org/en/latest/base/base/#Base.Missing
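The two proposed helpers could be sketched as follows (illustrative only; `nomissing_sketch` is a made-up name, and NCDatasets now exports `nomissing` with essentially this behaviour):

```julia
# check-and-strip variant: error if any missing value remains
function nomissing_sketch(a::AbstractArray{Union{T,Missing}}) where T
    any(ismissing, a) && error("array contains missing values")
    convert(Array{T}, a)
end

# replacement variant: substitute a fill value for every missing
nomissing_sketch(a, fv) = coalesce.(a, fv)

a = Union{Float32,Missing}[1.0f0, missing, 3.0f0]
b = nomissing_sketch(a, -999.0f0)    # plain Float32[1.0, -999.0, 3.0]
```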

Printing problems in Juno

Let's say I load a dataset:

ncdir = datadir("CERES", "CERES_EBAF_TOA.nc")
TOA = NCDataset(ncdir)
times = TOA["time"]

If I do the command times = TOA["time"] via Juno's inline evaluation, there is a problem. The following are printed in the REPL console (notice, not inline):

julia> time (231)
  Datatype:    Int32
  Dimensions:  time
  Attributes:
   long_name            = time
   units                = days since 2000-03-01 00:00:00
   delta_t              = 0000-00-01 00:00:00

while the inline printing gets:


231-element NCDatasets.CFVariable{Union{Missing, DateTime, AbstractCFDateTime, Number},1,NCDatasets.Variable{Int32,1},NCDatasets.Attributes}:
 2000-03-15T00:00:00
 2000-04-15T00:00:00
 2000-05-15T00:00:00
 2000-06-15T00:00:00
 2000-07-15T00:00:00
 2000-08-15T00:00:00
 2000-09-15T00:00:00
 2000-10-15T00:00:00
 2000-11-15T00:00:00
 2000-12-15T00:00:00

There has to be something wrong with passing the "correct" IO around, and at some point the stdout is passed instead... Alas, I've read the pretty printing source many times, but couldn't find the problem.


Environment
Latest Master, latest stable from Julia/Juno. Windows 10.

conflict of netcdf.dll

When using MATLAB, the netCDF library path points to the netcdf.dll in the MATLAB installation folder, and NCDatasets reports:

error compiling nc_open: could not load library "C:\Users\Jianghui\.julia\conda\3\Library\bin\netcdf.DLL"
The specified module could not be found.

How could I redirect the library?

Moving to an org / making a golden standard for NetCDF files in Julia

(continuing from an offtopic discussion in #47 (comment) )

Concerning moving NCDatasets to an organization: I prefer to keep it under my name, but I might reconsider should I have less time to work on it.

I've spent some time arguing why I think having packages in orgs instead of under individual owners is a good thing. Here is my summary: having more contributors is only a small part of why I am suggesting this. It's more about improving discoverability, improving trust by the community, inviting more collaborators (for some, having your name as an owner can be decisive in not contributing), having more sets of eyes trying to converge on the best design decisions, having more maintainers that can review and merge PRs as well as answer issues, and even fairness (because if one person absolutely takes all decisions according to their own personal agenda, that is not that fair for the rest of the contributors or the user community, I'd say). These are some of the benefits among others. Obviously there are downsides for you personally, e.g. sharing "power". Others have also argued why orgs can be "bad" here: https://discourse.julialang.org/t/newcomer-contributor-in-juliageo-and-co-help-me-get-started/32480 but I remain in favor of orgs.

Besides, such a massively central thing like "loading nc datasets", which I would imagine is used by so many people working in these fields, is just too important to be lacking those aforementioned features. I am a huge proponent of having that "one" package that does that "one" thing as well as possible. I will for sure be spending the next two years working with .nc files, and thus I want to spend time and effort making this golden standard possible, and even channeling some of the community into this effort. In my eyes, this seems easier to do with an org.

Ultimately it is obviously up to your own personal agenda what happens with this repo, but I just hope that you can see that my suggestions truly come from an "improve the community experience" angle and not an "I want to take power from you" one. I believe other people share the same view. I will of course continue contributing here weekly, but I am also convinced that such a central part of the ecosystem should be part of an organization. Of course, you don't have to decide immediately. As I contribute more and more here, the benefits of collaboration might skew your opinion towards the org.


The first reason for opening this discussion is simply laying down the facts from my perspective. The second reason for opening this issue, is that I'd like @Alexander-Barth to please give a bit more transparency about what should belong in this repo or not. This request is motivated by the following comments:

Concerning the v(:, :, 1:5): I am wondering if you can have this alternative syntax as an extension package, but it is a clever idea. (from)

(... discussion regarding sum on a dataset) but I am not sure if computing should be part of NCDatasets. (from)

The aforementioned comments hint at a lack of transparency about what would be accepted as a PR or not. Working on a feature that, when submitted as a PR, will be rejected as "unfitting" can be very demotivating... Can you please give some hints on what you think should be here and what not?

My journey in netCDF files started by porting Python code to Julia. The Python code used xarray for its .nc handling and, truth be told, xarray unfortunately has a very long list of features that do not exist in Julia (I was even considering using xarray for all my .nc related work). As I decided to go with the native approach and implement these features instead, doing it here is the only natural thing for me.
But that's only if they are accepted, obviously :P

Set attributes on dimensions?

Is it possible to set attributes on dimensions when creating a netCDF file? I haven't been able to spot it in the docs, but I may have missed it.

If this isn't possible, it should be, especially for readers to understand geographical and time dimensions correctly.
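For reference: in the netCDF data model, dimensions themselves carry no attributes; the CF convention is to define a coordinate variable with the same name as the dimension and attach the attributes there. A sketch (the values are made up):

```julia
using NCDatasets

fname = tempname() * ".nc"
NCDataset(fname, "c") do ds
    defDim(ds, "lon", 4)
    # a coordinate variable named after the dimension holds the metadata
    lon = defVar(ds, "lon", Float64, ("lon",))
    lon[:] = [0.0, 90.0, 180.0, 270.0]
    lon.attrib["units"] = "degrees_east"
    lon.attrib["standard_name"] = "longitude"
end
```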

cannot install after update because incompatibility with other packages?

After updating BinDeps to v1.0.0 (which is required by WinRPM, PackageCompiler and MKL) I cannot install NCDatasets and DIVAnd anymore. It appears this is because CondaBinDeps is not compatible with the latest BinDeps:

ERROR: Unsatisfiable requirements detected for package CondaBinDeps [a9693cdc]:
 CondaBinDeps [a9693cdc] log:
 ├─possible versions are: 0.1.0 or uninstalled
 ├─restricted by compatibility requirements with NCDatasets [85f8d34a] to versions: 0.1.0
 │ └─NCDatasets [85f8d34a] log:
 │   ├─possible versions are: [0.3.0-0.3.2, 0.4.0, 0.5.0-0.5.1, 0.6.0, 0.7.0, 0.8.0, 0.9.0-0.9.5] or uninstalled
 │   ├─restricted to versions * by an explicit requirement, leaving only versions [0.3.0-0.3.2, 0.4.0, 0.5.0-0.5.1, 0.6.0, 0.7.0, 0.8.0, 0.9.0-0.9.5]
 │   └─restricted by compatibility requirements with BinDeps [9e28174c] to versions: 0.9.2-0.9.4 or uninstalled, leaving only versions: 0.9.2-0.9.4
 │     └─BinDeps [9e28174c] log:
 │       ├─possible versions are: [0.7.0, 0.8.9-0.8.10, 1.0.0] or uninstalled
 │       └─restricted to versions 1.0.0 by an explicit requirement, leaving only versions 1.0.0
 └─restricted by compatibility requirements with BinDeps [9e28174c] to versions: uninstalled — no versions left
   └─BinDeps [9e28174c] log: see above

I checked that CondaBinDeps has not been updated since 2018. I wonder if it is still maintained.

Feature : monthday query for non-standard calendar

Hello again!

There are two other functions that I thought might be useful: dayofyear and daysinyear. I'm using them to do some statistical post-processing of climate models against observational datasets, and the transfer functions are defined on the Julian day of the year.

For example, for DateTimeNoLeap, the range should span 1:365 even in a leap year, not 1:366.

Here's Julia's native interface:
https://github.com/JuliaLang/julia/blob/cda41aebb35eb12a806c586ebad57a20f1cab648/stdlib/Dates/src/query.jl#L106
https://github.com/JuliaLang/julia/blob/cda41aebb35eb12a806c586ebad57a20f1cab648/stdlib/Dates/src/query.jl#L309
https://github.com/JuliaLang/julia/blob/cda41aebb35eb12a806c586ebad57a20f1cab648/stdlib/Dates/src/query.jl#L311

There are other query functions listed here, but I think that with dayofyear and daysinyear (and the others that you implemented!), you're covering most of the useful use cases.
https://docs.julialang.org/en/v1/stdlib/Dates/index.html#Query-Functions-1

Sorry for all my requests! :)

NCDatasets should fail when trying to access a nonexisting index.

Hi:

When trying to read beyond the possible indices of a multi-file dataset, NCDatasets doesn't fail; instead, it returns an array with some random data.

file2d = somelistoffiles
ds2d                = Dataset(file2d)
for (varname,var) in ds2d
    @show (varname,size(var))
end

returns: (varname, size(var)) = ("Prec", (512, 512, 2400))

If now I do

Prec = ds2d["Prec"].var[:,:,2500]

I get an array 512×512 Array{Float32,2} with some random information.

I would expect this to fail. Is that the expected behavior?

Data typed Union{Missing, Dates.DateTime, AbstractCFDateTime, Number} when calling discrete indices

I have a netCDF file called "timeSlab_2d.nc". When trying to obtain one index or a range, the element type of the returned array is Union{Missing, Float32}. However, when trying to get a 'discrete' set of indices, the element type of the returned array is Union{Missing, Dates.DateTime, AbstractCFDateTime, Number}.

First of all: excuse me for not providing a self-contained example (by not providing the file). How can I provide a self-contained example in the future?

Example:

julia> ds = Dataset("timeSlab_2d.nc");
julia> ds["x"]
x (128)
  Datatype:    Float32
  Dimensions:  x
  Attributes:
   units                = m

julia> ds["x"][1:5]
5-element Array{Union{Missing, Float32},1}:
      0.0f0
  40000.0f0
  80000.0f0
 120000.0f0
 160000.0f0

julia> ds["x"][[1,3,5]]
3-element Array{Union{Missing, Dates.DateTime, AbstractCFDateTime, Number},1}:
      0.0f0
  80000.0f0
 160000.0f0

Expected behavior

I would expect the returned array in all the cases to be of type:

Array{Union{missing,Float32},1}

Environment

Julia Version 1.3.0-rc4.1
Commit 8c4656b97a (2019-10-15 14:08 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin18.6.0)
CPU: Intel(R) Core(TM) i7-8559U CPU @ 2.70GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-6.0.1 (ORCJIT, skylake)
Environment:
JULIA_EDITOR = /Applications/Emacs.app/Contents/MacOS/Emacs "$@"

[85f8d34a] NCDatasets v0.9.5
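As a workaround until the element type is fixed, indexing the raw variable via `.var` bypasses the CF transformation and keeps the storage type; a self-contained sketch (the file and values are made up to mirror the example above):

```julia
using NCDatasets

fname = tempname() * ".nc"
NCDataset(fname, "c") do ds
    defVar(ds, "x", Float32[0, 40000, 80000, 120000, 160000], ("x",))
end

ds = NCDataset(fname)
raw = ds["x"].var[[1, 3, 5]]   # raw variable: plain Float32, no union type
close(ds)
```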

yearmonthday function

Hello again!

Similar to the daysinmonth issue, I think another useful function would be the yearmonthday.

To put this in context, I'm developing a function that does some daily calculations from sub-daily temporal data (daily means, sums, etc.). Hence, to construct the output array (daily data), I need to know how many unique days are contained in the input array (sub-daily data).

For example:

using Dates
d = DateTime(2000, 01, 01, 0, 0, 0):Hour(1):DateTime(2000,12, 31)
arraylen = unique(Dates.yearmonthday.(Dates.days.(d)))

dataout = Array{Float64}(undef, N, M, arraylen)

But the function is not defined for DateTimeNoLeap, etc...

For reference, the original Julia Dates yearmonthday function is here.
https://github.com/JuliaLang/julia/blob/c670f1acdba1971b8545f1c3f3b0cfe55ee0d3f5/stdlib/LibGit2/src/signature.jl#L40

Accessing a dataset with a range takes forever...?

I have

fld = ebaf41_toa["toa_sw_all_mon"]

julia> toa_sw_all_mon (360 × 180 × 228)
  Datatype:    Float32
  Dimensions:  lon × lat × time
  Attributes:
   long_name            = Top of The Atmosphere Shortwave Flux, All-Sky conditions, Monthly Means     
   standard_name        = TOA Shortwave Flux - All-Sky
   CF_name              = toa_outgoing_shortwave_flux
   comment              = none
   units                = W m-2
   valid_min            =       0.00000
   valid_max            =       600.000
   _FillValue           = -999.0

I do

fld[:, :, :]

which instantly returns the data as Array. But then I do:

fld[:, :, 1:228]

and this takes like five minutes to return...?

Convenience selection function for sub-parts of NCDataset

Hi there, I have a feature request which could be helpful, albeit it is a small convenience syntax.

Let's say I have loaded an NCDataset. In Python this is done with xarray, but here we have something like

ebaf41_toa = NCDataset(datadir("CERES", "CERES_EBAF-TOA_Ed4.1_Subset_200003-201902.nc"))

Now, let's say we have a field of this dataset, fld = ebaf41_toa["toa_sw_all_mon"]:

toa_sw_all_mon (360 × 180 × 228)
  Datatype:    Float32
  Dimensions:  lon × lat × time
  Attributes:
   long_name            = Top of The Atmosphere Shortwave Flux, All-Sky conditions, Monthly Means     
   standard_name        = TOA Shortwave Flux - All-Sky
   CF_name              = toa_outgoing_shortwave_flux
   comment              = none
   units                = W m-2
   valid_min            =       0.00000
   valid_max            =       600.000
   _FillValue           = -999.0

In Python, you would do something like

ebaf41_toa = xr.open_dataset(path)
fld = ebaf41_toa.toa_sw_all_mon

This xarray offers a convenience syntax like

fld.sel(time=slice('2000-03-01','2019-03-01'))

which means that you can select sub-parts of the field by specifying which ranges of the dependent variables to keep.

This could be implemented here as well, but one has to somehow map the given keywords to "which" dependent variables they represent. I am happy to make this contribution if you are willing to guide me.
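As a sketch of what such a helper could look like (`selindex` is a hypothetical name, not part of the package API), one can search the coordinate variable that shares the dimension's name:

```julia
using Dates, NCDatasets

# Hypothetical helper: indices of the named coordinate falling within [lo, hi].
function selindex(ds, coordname, lo, hi)
    coord = ds[coordname][:]  # coordinate variable with the same name as the dimension
    findall(c -> lo <= c <= hi, coord)
end

# Usage sketch with the dataset from the issue:
# idx = selindex(ebaf41_toa, "time", DateTime(2000, 3, 1), DateTime(2019, 3, 1))
# sub = fld[:, :, idx]
```

(Newer versions of NCDatasets provide the `NCDatasets.@select` macro for this kind of coordinate-based query.)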

Cannot read time variable: "Unsupported calendar: Gregorian"

Describe the bug

When I try to read the time variable, I get:

Unsupported calendar: Gregorian

Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] timetype(::String) at /home/ctroupin/.julia/packages/NCDatasets/P5zq1/src/CFTime.jl:564
 [3] #timedecode#8(::Bool, ::typeof(NCDatasets.CFTime.timedecode), ::Array{Float64,1}, ::String, ::String) at /home/ctroupin/.julia/packages/NCDatasets/P5zq1/src/CFTime.jl:617
 [4] timedecode(::Array{Float64,1}, ::String, ::String) at /home/ctroupin/.julia/packages/NCDatasets/P5zq1/src/CFTime.jl:617
 [5] getindex(::NCDatasets.CFVariable{Union{Missing, Dates.DateTime, AbstractCFDateTime, Number},1,NCDatasets.Variable{Float64,1},NCDatasets.Attributes}, ::Int64) at /home/ctroupin/.julia/packages/NCDatasets/P5zq1/src/NCDatasets.jl:1261
 [6] (::getfield(Main, Symbol("##19#20")))(::Dataset) at ./In[16]:3
 [7] #Dataset#20(::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Type{Dataset}, ::getfield(Main, Symbol("##19#20")), ::String) at /home/ctroupin/.julia/packages/NCDatasets/P5zq1/src/NCDatasets.jl:529
 [8] Dataset(::Function, ::String) at /home/ctroupin/.julia/packages/NCDatasets/P5zq1/src/NCDatasets.jl:527
 [9] top-level scope at In[16]:1

To Reproduce

ds = Dataset("test_time.nc","c")
# Dimensions

ds.dim["row"] = 10

nctime = defVar(ds,"time", Float64, ("row",))
nctime.attrib["_CoordinateAxisType"] = "Time"
nctime.attrib["calendar"] = "Gregorian"
nctime.attrib["long_name"] = "Valid Time GMT"
nctime.attrib["standard_name"] = "time"
nctime.attrib["time_origin"] = "01-JAN-1970 00:00:00"
nctime.attrib["units"] = "seconds since 1970-01-01T00:00:00Z"

# Define variables

nctime[:] = [1.54630441e+09, 1.54630808e+09, 1.54630808e+09, 1.54631161e+09,
       1.54631161e+09, 1.54631524e+09, 1.54631524e+09, 1.54632243e+09,
       1.54632243e+09, 1.54632963e+09]

close(ds)

Expected behavior

I can read the netCDF with Python ;)

Environment

  • operating system: Ubuntu 16.04
  • Julia version: 1.2.0
  • NCDatasets v0.9.2
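Until the calendar lookup is case-insensitive, a possible workaround (a sketch, assuming the file created in the reproduction above) is to read the raw values and decode them with a lowercased calendar name, using the same `timedecode` function that appears in the stacktrace:

```julia
using NCDatasets

ds    = NCDataset("test_time.nc")
t_raw = ds["time"].var[:]                          # raw Float64 values, no CF transformation
units = ds["time"].attrib["units"]
cal   = lowercase(ds["time"].attrib["calendar"])   # "Gregorian" -> "gregorian"
t     = NCDatasets.CFTime.timedecode(t_raw, units, cal)
close(ds)
```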

DateTime support

Now some "real" bug in my mind:

I cannot write time values as floats when the time attributes (units = "seconds since some_refdate") are set: the values are automatically converted to DateTime objects and I get an error if I try to set the "time" variable's value in the output:
"ERROR: LoadError: MethodError: no method matching ndims(::DateTime)"

It is not possible to set the value with either a DateTime object or a float.

If I do not set the time attributes in outputfile, then I can at least write float-typed values in the outputfile.

I have attached one test file and script.
test_002.zip
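For reference, the intended round trip looks like the following sketch (`out.nc` is a hypothetical file name): with the units attribute set, assigning DateTime values should encode them automatically, and the raw float representation stays reachable through the `.var` property.

```julia
using Dates, NCDatasets

ds = NCDataset("out.nc", "c")
defDim(ds, "time", 3)
nctime = defVar(ds, "time", Float64, ("time",))
nctime.attrib["units"] = "seconds since 2000-01-01 00:00:00"

# DateTime values are encoded to floats using the units attribute:
nctime[:] = [DateTime(2000, 1, 1), DateTime(2000, 1, 2), DateTime(2000, 1, 3)]

# The underlying floats can also be written directly, bypassing CF decoding:
nctime.var[:] = [0.0, 86400.0, 172800.0]

close(ds)
```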

Can one obtain the dimensions directly from a dataset?

at the moment we have:

julia> typeof(EBAF)
Dataset

julia> typeof(v)
NCDatasets.CFVariable{Union{Missing, Dates.DateTime, AbstractCFDateTime, Number},3,NCDatasets.Variable{Float32,3},NCDatasets.Attributes}

julia> dimnames(v)
("lon", "lat", "time")

julia> dimnames(EBAF)
ERROR: MethodError: no method matching dimnames(::Dataset)

which should otherwise be possible, since even the pretty printing of Dataset displays the dimensions.
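For what it's worth, the dimension information is already reachable through the `dim` property of the dataset, which behaves like a dictionary (a sketch using the dataset above):

```julia
# Names and sizes of all dimensions in the dataset:
keys(EBAF.dim)    # e.g. "lon", "lat", "time"
EBAF.dim["lon"]   # size of the "lon" dimension, e.g. 360
```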

Multifile datasets

Hello!

Is there a plan for loading a multifile dataset? For instance, climate model outputs are often split over multiple files (along the time dimension). Letting Dataset take an array of strings to load such a dataset might be useful. I know that xarray does it with the open_mfdataset() method.

I'm also thinking that using Dagger.jl for asynchronous loading would be a killer feature for such multifile datasets.

Cheers!
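For reference, newer NCDatasets versions accept a vector of file names and aggregate the variables along a given dimension (a sketch with hypothetical file names; check the documentation of your installed version):

```julia
using NCDatasets

fnames = ["tas_2000.nc", "tas_2001.nc", "tas_2002.nc"]  # hypothetical yearly files
ds  = NCDataset(fnames, aggdim = "time")  # virtually concatenated along "time"
tas = ds["tas"]
```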

API changes 0.9 -> 0.10

It would be nice to have a 1.0 release of NCDatasets. There might be some aspects of the API that need to be deprecated:

  • use NCDataset and deprecate Dataset (but we might keep Dataset as an alias around) for the reasons explained here #44 by @Datseris
  • require that attributes which affect the return type (such as scale_factor) be declared in defVar:
using DataStructures
ncdata = defVar(ds,"data",Int8, ("time",), 
     attrib = OrderedDict("scale_factor" => 10., "add_offset" => -1.))

instead of

ncdata = defVar(ds,"data",Int8, ("time",))
ncdata.attrib["scale_factor"]  = 10.
ncdata.attrib["add_offset"]  = -1.

Such attributes are scale_factor, add_offset, _FillValue, and, for time variables, units and calendar.
Otherwise it is not possible to know the element type of ncdata when it is created (as it can change). I believe this is necessary to address #40. You can also use Dict in the example above if the order of the attributes is not important.

  • Do not use Union{Float64,Missing} (or a similar type) as the element type if a "_FillValue" attribute is not defined. We would just use Float64 (or similarly Float32, Int8, ...).

  • The constants NC_FILL_BYTE, NC_FILL_CHAR, NC_FILL_SHORT, NC_FILL_INT, NC_FILL_FLOAT, NC_FILL_DOUBLE, NC_FILL_UBYTE, NC_FILL_USHORT, NC_FILL_UINT, NC_FILL_INT64, NC_FILL_UINT64 and NC_FILL_STRING have to be replaced by, e.g., fillvalue(Int8).

Let me know by replying to this issue if you think that additional breaking changes are necessary before 1.0, or if you disagree with what I proposed.

In any case, I think we will need a version 0.10 where we test all these changes.

Abstract date handling to another package?

Your DateTime types and handling seem to be generally useful for attaching to any spatial datasets with a temporal dimension.

Abstracting them to another package would mean those types could be used to specify date time standards in any geo array type, without needing the binary and conda deps associated with this package.

I'm happy to do the work if this makes sense to you.
