
PureHDF's Introduction

PureHDF


A pure C# library without native dependencies that makes reading and writing HDF5 files (groups, datasets, attributes, ...) very easy.

The minimum supported target framework is .NET Standard 2.0 which includes

  • .NET Framework 4.6.1+
  • .NET Core (all versions)
  • .NET 5+

This library runs on all platforms (ARM, x86, x64) and operating systems (Linux, Windows, macOS, Raspbian, etc.) that are supported by the .NET ecosystem, without special configuration.

The implementation follows the HDF5 File Format Specification (HDF5 1.10).

Please read the docs for samples and API documentation.

Version 2 changes

To keep the code base clean, version 2 of PureHDF supports only actively supported .NET versions, which are .NET 6 and .NET 8 as of June 2024.

Version 1 of PureHDF supports all .NET versions starting with .NET 4.7.2 and continues to receive bug fixes. Features will be backported upon request if feasible.

Installation

dotnet add package PureHDF

Quick Start

Reading

// root group
var file = H5File.OpenRead("path/to/file.h5");

// sub group
var group = file.Group("path/to/group");

// attribute
var attribute = group.Attribute("my-attribute");
var attributeData = attribute.Read<int>();

// dataset
var dataset = group.Dataset("my-dataset");
var datasetData = dataset.Read<double>();

See the docs to learn more about data types, multidimensional arrays, chunks, compression, slicing and more.

Writing

The first step is to create a new H5File instance:

var file = new H5File();

An H5File derives from the H5Group type because it represents the root group. H5Group implements the IDictionary interface, where the keys represent the links in an HDF5 file and the values determine the type of each link: either another H5Group or an H5Dataset.

You can create an empty group like this:

var group = new H5Group();

If the group should contain datasets, add them using the dictionary collection initializer, just like with a normal dictionary:

var group = new H5Group()
{
    ["numerical-dataset"] = new double[] { 2.0, 3.1, 4.2 },
    ["string-dataset"] = new string[] { "One", "Two", "Three" }
};

Datasets and attributes can both be created either by instantiating their specific class (H5Dataset, H5Attribute) or by just providing some kind of data. This data can be nearly anything: arrays, scalars, numerical values, strings, anonymous types, enums, complex objects, structs, bool values, etc. However, whenever you want to provide more details, like the dimensionality of the attribute or dataset, the chunk layout, or the filters to be applied to a dataset, you need to instantiate the appropriate class, as the sketch below shows.
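For example, to control the chunk layout you wrap the data in an H5Dataset instead of assigning it directly. The following is a minimal sketch of that pattern; the chunks parameter name is an assumption here and should be verified against the current API documentation:

// Minimal sketch: plain data vs. an explicit H5Dataset with a chunk layout.
// The `chunks` parameter name is an assumption, verify it against the docs.
var data = new double[100];

var group = new H5Group()
{
    // plain data: PureHDF derives the dataspace and layout automatically
    ["auto-dataset"] = data,

    // explicit H5Dataset: use this when details like chunking matter
    ["chunked-dataset"] = new H5Dataset(data, chunks: new[] { 10U })
};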

Now let's see how to add attributes. Attributes cannot be added directly using the dictionary collection initializer because that is reserved for datasets. However, every H5Group has an Attributes property which accepts our attributes:

var group = new H5Group()
{
    Attributes = new()
    {
        ["numerical-attribute"] = new double[] { 2.0, 3.1, 4.2 },
        ["string-attribute"] = new string[] { "One", "Two", "Three" }
    }
};

The full example with the root group, a subgroup, two datasets and two attributes looks like this:

var file = new H5File()
{
    ["my-group"] = new H5Group()
    {
        ["numerical-dataset"] = new double[] { 2.0, 3.1, 4.2 },
        ["string-dataset"] = new string[] { "One", "Two", "Three" },
        Attributes = new()
        {
            ["numerical-attribute"] = new double[] { 2.0, 3.1, 4.2 },
            ["string-attribute"] = new string[] { "One", "Two", "Three" }
        }
    }
};

The last step is to write the defined file to the drive:

file.Write("path/to/file.h5");

See the docs to learn more about data types, multidimensional arrays, chunks, compression, slicing and more.

Development

The tests of PureHDF are executed against .NET 6 and .NET 7, so these two runtimes are required. Please note that, for a currently unknown reason, the writing tests cannot run in parallel with other tests: some unrelated temp files remain in use although they should not be, and so they cannot be accessed by the unit tests.

If you are using Visual Studio Code as your IDE, you can simply execute one of the predefined test tasks by selecting Run Tasks from the global menu (Ctrl+Shift+P). The following test tasks are predefined:

  • tests: common
  • tests: writing
  • tests: filters
  • tests: HSDS

The HSDS tests require a Python installation with the venv package available on the system.

Comparison Table

Overwhelmed by the number of different HDF5 libraries? Here is a comparison table:

Note: the following table considers only projects listed on NuGet.org.

| Name | Arch | Platform | Kind | Mode | Version | License | Maintainer | Comment |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| **v1.10** | | | | | | | | |
| PureHDF | all | all | managed | rw | 1.10.* | MIT | Apollo3zehn | |
| HDF5-CSharp | x86, x64 | Win, Lin, Mac | HL | rw | 1.10.6 | MIT | LiorBanai | |
| SciSharp.Keras.HDF5 | x86, x64 | Win, Lin, Mac | HL | rw | 1.10.5 | MIT | SciSharp | fork of HDF-CSharp |
| ILNumerics.IO.HDF5 | x64 | Win, Lin | HL | rw | ? | proprietary | IL_Numerics_GmbH | probably 1.10 |
| LiteHDF | x86, x64 | Win, Lin, Mac | HL | ro | 1.10.5 | MIT | silkfire | |
| hdflib | x86, x64 | Windows | HL | wo | 1.10.6 | MIT | bdebree | |
| Mbc.Hdf5Utils | x86, x64 | Win, Lin, Mac | HL | rw | 1.10.6 | Apache-2.0 | bqstony | |
| HDF.PInvoke | x86, x64 | Windows | bindings | rw | 1.8, 1.10.6 | HDF5 | hdf, gheber | |
| HDF.PInvoke.1.10 | x86, x64 | Win, Lin, Mac | bindings | rw | 1.10.6 | HDF5 | hdf, Apollo3zehn | |
| HDF.PInvoke.NETStandard | x86, x64 | Win, Lin, Mac | bindings | rw | 1.10.5 | HDF5 | surban | |
| **v1.8** | | | | | | | | |
| HDF5DotNet.x64 | x64 | Windows | HL | rw | 1.8 | HDF5 | thieum | |
| HDF5DotNet.x86 | x86 | Windows | HL | rw | 1.8 | HDF5 | thieum | |
| sharpHDF | x64 | Windows | HL | rw | 1.8 | MIT | bengecko | |
| HDF.PInvoke | x86, x64 | Windows | bindings | rw | 1.8, 1.10.6 | HDF5 | hdf, gheber | |
| hdf5-v120-complete | x86, x64 | Windows | native | rw | 1.8 | HDF5 | daniel.gracia | |
| hdf5-v120 | x86, x64 | Windows | native | rw | 1.8 | HDF5 | keen | |

Abbreviations:

| Term | .NET API | Native dependencies |
| --- | --- | --- |
| managed | high-level | none |
| HL | high-level | C-library |
| bindings | low-level | C-library |
| native | none | C-library |

PureHDF's People

Contributors

apollo3zehn, blackclaws, marklam


PureHDF's Issues

Add IQueryable interface to build hyperslabs?

Edit (2023-01-03)

An experimental IQueryable support has been implemented as dataset.AsQueryable(). Stream support could be implemented similarly, in the form of dataset.AsStream().

However, this is still an option:

dataset.Read().Execute()
dataset.Read().Skip(1).Take(2).Execute()
dataset.Read().AsStream()

The advantage is that dataset.Read().Execute() is also a query, so there are only queries and streams. But it is not quite clean to create an IQueryable first just to finally get a stream.

Stream and query mostly make sense for 1-dimensional data. Stream cannot be implemented for multidimensional data since the data are not written linearly into memory.
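For illustration, here is a rough sketch of how a query provider could translate Skip/Take on a 1-dimensional dataset into a hyperslab (a hypothetical helper, not part of the library):

// Hypothetical translation of LINQ-style Skip/Take into a 1D hyperslab.
// 'skip' and 'take' would be extracted from the LINQ expression tree.
static HyperslabSelection SkipTakeToHyperslab(ulong skip, ulong take)
{
    return new HyperslabSelection(
        rank: 1,
        starts: new[] { skip },     // Skip(n) -> start
        strides: new ulong[] { 1 }, // no striding
        counts: new[] { take },     // Take(n) -> count
        blocks: new ulong[] { 1 }   // single-element blocks
    );
}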

Original Issue

LINQ to HDF?

// this could build a netCDF hyperslab (start, stop, stride)
dataset
   .Skip([]) // = start
   .Take(ulong[]) // = stop - start
   .Where((x, n) => n % nth == 0) // stride
   .Read<int>();
// this could build an HDF5 hyperslab (start, stride, count, block)
dataset
   .Skip([]) // = start
   .Where((x, n) => n % nth == 0) // stride
   .Repeat(y) // count (https://fuqua.io/Rx.NET/ix-docs/html/M_System_Linq_QueryableEx_Repeat__1_3.htm)
   .Take(ulong[]) // block
   .Read<int>();

https://jacopretorius.net/2010/01/implementing-a-custom-linq-provider.html

LINQ Part 3: An Introduction to IQueryable - CodeProject
https://www.codeproject.com/Articles/1240553/LINQ-Part-An-Introduction-to-IQueryable

Returning IEnumerable vs. IQueryable - Stack Overflow
https://stackoverflow.com/questions/2876616/returning-ienumerablet-vs-iqueryablet

Unify reading and writing API

  • caching
  • remove endianness support?
  • restore commented out code parts
  • restore fill value
  • repair all tests
  • At least throw an exception that async is not supported in native code paths, except for datasets
  • repair multi threading (see benchmark)

Hyperslab visualizer

Like this, but extended (actual_rs = "actual_resized"):

C#

var aa = actual.ToArray();
var bb = expected.ToArray();

var sb1 = new StringBuilder();
var sb2 = new StringBuilder();

for (int i = 0; i < expected.Length; i++)
{
    sb1.Append($"{aa[i]},");
    sb2.Append($"{bb[i]},");
}

var sb1f = sb1.ToString();
var sb2f = sb2.ToString();

Matlab

close all

% reshape into C-Order
intermediate_rs = permute(reshape(intermediate, 4, 25, 25), [3 2 1]);
actual_rs       = reshape(actual, 25, 75).';
expected_rs     = reshape(expected, 25, 75).';

sourceDim1      = size(intermediate_rs, 1);
sourceDim2      = size(intermediate_rs, 2);
sourceDim3      = size(intermediate_rs, 3);
targetDim1      = size(actual_rs, 1);
targetDim2      = size(actual_rs, 2);

% source selection
figure
title('source selection (rank = 3)')

for i = 1 : sourceDim1
    for j = 1 : sourceDim2
        for k = 1 : sourceDim3          
            text(...
                (j - 1) / sourceDim2, ...
                1 - ((i - 1) / sourceDim1), ...
                    -(k - 1) / sourceDim3, ...
                num2str(intermediate_rs(i, j, k)), ...
                'FontSize', 8 ...
             )
        end
    end
end

% target selection (actual)
figure
title('target selection (actual, rank = 2)')

for i = 1 : targetDim1
    for j = 1 : targetDim2
        text(...
                 (j - 1) / targetDim2, ...
            1 - ((i - 1) / targetDim1),...
            num2str(actual_rs(i, j)), ...
            'FontSize', 8 ...
        )
    end
end

% target selection (expected)
figure
title('target selection (expected, rank = 2)')

for i = 1 : targetDim1
    for j = 1 : targetDim2
        if (actual_rs(i, j) ~= expected_rs(i, j))
            color = 'r';
        else
            color = 'k';
        end
        
        text(...
                 (j - 1) / targetDim2, ...
            1 - ((i - 1) / targetDim1), ...
            num2str(expected_rs(i, j)), ...
            'FontSize', 8, 'Color', color)
    end
end

Chunk cache problem

When a chunk is larger than the chunk cache's maximum size, the chunk never becomes part of the cache ... and so it is not written to the file.
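A sketch of the behavior one would expect instead (all identifiers are hypothetical, this is not the library's internal code):

// Hypothetical write-through sketch: a chunk larger than the cache limit
// must not be dropped silently; it bypasses the cache and goes straight
// to the file.
void StoreChunk(ulong chunkIndex, Memory<byte> chunk)
{
    if ((ulong)chunk.Length <= _maxCachedChunkBytes)
        _chunkCache.Store(chunkIndex, chunk);  // fits: cache as usual
    else
        WriteChunkDirectly(chunkIndex, chunk); // too large: write through
}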

Error in reading HDF5 file

I am trying to read a file but I am getting the following error:

1. Solution exception:H5F.open
File "recorder.h5"  failed to open with status -1

I have been able to open the file with HDFView, so I am pretty sure the file is not corrupted.
The file is created by FEA software that uses "HDF5 library version: 1.10.1".

I have attached the file in case someone wants to try to help

https://drive.google.com/file/d/1SAKkZf0VGHRfbdPKabyiEPzpEXie4VzC/view?usp=sharing

Code that I have used

import HDF5DotNet

from HDF5DotNet import *

import System
from System import Array, Double, Int64

print('\nInitializing HDF5 library\n')
status = H5.Open()
print('HDF5 ', H5.Version.Major, '.', H5.Version.Minor, '.', H5.Version.Release)

h5file = H5F.open('recorder.hdf5', H5F.OpenMode.ACC_RDONLY)
H5F.close(h5file)
print('\nShutting down HDF5 library\n')
status = H5.Close()

Can you help?

hyperslab: merge subset definitions or pass a "list" of offsets?

Hi all,

Does anyone know how to merge different hyperslab definitions? I am working with 2D data (1028 channels × 3×10^6 points sampled along time) which are chunked (n = 200). I can read 1 channel at a time and, thanks to effort from Apollo3zehn and the use of threads, we can read this within a reasonable time (4 s instead of 30-40 s). However, I'd like to read a user-defined subset of these channels. Is it possible with the current implementation of hyperslabs within HDF5.NET?

Right now, the only way I see to use hyperslabs is to define blocks of contiguous channels (changing "count" and/or "block"), or channels which are regularly spaced (changing "stride"). As a result, it means reading 1 channel at a time. Is it possible to define a "list" of channels?

Sorry if I missed something... And many thanks for any help (and your patience).
Fred

Here is a figure of what I would like to achieve. Right now, I see how to read one row (yellow). I'd like to be able to read at once all green rows as well.
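Until point selections are available (they appear on the roadmap further down this page), one workaround is to issue one hyperslab read per requested channel. A sketch that reuses the API from the code in the threads below:

// Sketch: read an arbitrary list of channels, one hyperslab read each.
// 'dataset' is 2D (channels x points); the names mirror the threads below.
ushort[][] ReadChannels(H5Dataset dataset, int[] channels, ulong nbdatapoints)
{
    var result = new ushort[channels.Length][];

    for (int c = 0; c < channels.Length; c++)
    {
        var selection = new HyperslabSelection(
            rank: 2,
            starts: new[] { (ulong)channels[c], 0UL }, // row = channel
            strides: new ulong[] { 1, 1 },
            counts: new ulong[] { 1, nbdatapoints },   // one full row
            blocks: new ulong[] { 1, 1 }
        );

        result[c] = dataset.Read<ushort>(fileSelection: selection);
    }

    return result;
}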

Replace SpanExtensions with this approach?

// https://docs.microsoft.com/en-us/dotnet/api/system.array?view=netcore-3.1
// max array length is 0X7FEFFFFF = int.MaxValue - 1024^2 bytes
// max multi dim array length seems to be 0X7FEFFFFF x 2, but no confirmation found
private unsafe T ReadCompactMultiDim<T>()
{
    // Maybe just add another Read<T> method (e.g. ReadMultiDim) which
    // unfortunately has no generic constraint, but where T is first checked
    // with IsArray. Both methods then define a lambda to create a buffer of
    // the appropriate size; this buffer is filled and can be returned by the
    // respective method with the correct type.
    //
    // Or `T[,] = Read2D<T>()`, `T[,,] = Read3D<T>()`, etc., then the generic
    // constraint would be possible again.
    // Or: use an implicit cast operator for multi dim arrays?
    // http://dontcodetired.com/blog/post/Writing-Implicit-and-Explicit-C-Conversion-Operators

    //var a = ReadCompactMultiDim<T[,,]>();
    var type = typeof(T);

    var lengths = new int[] { 100, 200, 10 };
    var elementCount = lengths.Aggregate(1L, (x, y) => x * y);
    var elementSize = Marshal.SizeOf(type.GetElementType()!);
    object[] args = lengths.Cast<object>().ToArray();

    var buffer = (T)Activator.CreateInstance(type, args);

    var handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
    try
    {
        // the span length is in bytes, so account for the element size
        var span = new Span<byte>(handle.AddrOfPinnedObject().ToPointer(), (int)(elementCount * elementSize));
        span.Fill(0x25);
        return buffer;
    }
    finally
    {
        handle.Free();
    }
}

hyperslab and threads: what could go wrong?

Hi all,

I am trying to read electrophysiology data from an H5 file, whereby data are stored as 200 (data along time) × 1028 (channels) ushort chunks which are compressed. There is an enormous number of these chunks (about 3,000,000) (the chunk size is certainly not optimal, but this is what I get) and reading one channel of such data takes ages (28-30 s from a regular disk).

While I can read data which are stored as chunks and compressed using a direct approach or by reading each chunk in turn (which then takes 50 s), I thought that I could use multithreading: since the computer I work with has 12 cores, I hoped to gain on the time necessary to decompress the data.

However, when doing so, I get errors such as groups not found, after a variable number of loops (typically 5 to 20). Any guess why this might occur?

Thank you for any help or clue,
Fred

Here is the code that fails (it works, however, if I replace the Parallel.For loop with a regular for loop):

public ushort[] ReadAll_OneElectrodeAsIntParallel(ElectrodeProperties electrodeProperties)
{
    H5Group group = Root.Group("/");
    H5Dataset dataset = group.Dataset("sig");
    var nbdatapoints = dataset.Space.Dimensions[1]; // any size
    const ulong chunkSizePerChannel = 200;
    var result = new ushort[nbdatapoints];
    var nchunks = (long)(nbdatapoints / chunkSizePerChannel);

    int ndimensions = dataset.Space.Rank;
    if (ndimensions != 2)
        return null;

    Parallel.For(0, nchunks, i =>
    {
        var istart = (ulong)i * chunkSizePerChannel;
        var iend = istart + chunkSizePerChannel - 1;
        if (iend > nbdatapoints)
            iend = nbdatapoints - 1;
        var chunkresult = Read_OneElectrodeDataAsInt(group, dataset, electrodeProperties.Channel, istart, iend);
        Array.Copy(chunkresult, 0, result, (int)istart, (int)(iend - istart + 1));
    });

    return result;
}

Here is the code that works:
public ushort[] ReadAll_OneElectrodeAsInt(ElectrodeProperties electrodeProperties)
{
    H5Group group = Root.Group("/");
    H5Dataset dataset = group.Dataset("sig");
    int ndimensions = dataset.Space.Rank;

    if (ndimensions != 2)
        return null;

    var nbdatapoints = dataset.Space.Dimensions[1]; // any size
    return Read_OneElectrodeDataAsInt(group, dataset, electrodeProperties.Channel, 0, nbdatapoints - 1);
}

Here is the function called by both routines:
public ushort[] Read_OneElectrodeDataAsInt(H5Group group, H5Dataset dataset, int channel, ulong startsAt, ulong endsAt)
{
    var nbPointsRequested = endsAt - startsAt + 1;

    //Trace.WriteLine($"startsAt: {startsAt} endsAt: {endsAt} nbPointsRequested={nbPointsRequested}");

    var datasetSelection = new HyperslabSelection(
        rank: 2,
        starts: new[] { (ulong)channel, startsAt },         // start at row ElectrodeNumber, column 0
        strides: new ulong[] { 1, 1 },                      // don't skip anything
        counts: new ulong[] { 1, nbPointsRequested },       // read 1 row, ndatapoints columns
        blocks: new ulong[] { 1, 1 }                        // blocks are single elements
    );

    var memorySelection = new HyperslabSelection(
        rank: 1,
        starts: new ulong[] { 0 },
        strides: new ulong[] { 1 },
        counts: new[] { nbPointsRequested },
        blocks: new ulong[] { 1 }
    );

    var memoryDims = new[] { nbPointsRequested };

    var result = dataset
        .Read<ushort>(
            fileSelection: datasetSelection,
            memorySelection: memorySelection,
            memoryDims: memoryDims
        );

    return result;
}
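One likely culprit (an assumption, not confirmed here): the shared H5File instance and its chunk cache are not thread-safe, which matches the SimpleChunkCache and multithreading notes elsewhere on this page. A sketch of a workaround is to give each worker thread its own file handle:

// Sketch: one H5File per worker thread, so no reader state is shared.
// 'filePath' is assumed; the per-chunk work mirrors the failing code above.
Parallel.For(0L, nchunks,
    localInit: () => H5File.OpenRead(filePath),
    body: (i, state, file) =>
    {
        var group = file.Group("/");
        var dataset = group.Dataset("sig");
        // ... compute istart/iend and call Read_OneElectrodeDataAsInt ...
        return file;
    },
    localFinally: file => file.Dispose());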

File path can only be const - Bindings

Discussed in #40

Originally posted by FranciscoG001 August 29, 2023
Hello again, just want to know if its possible to change the type of filePath on: [H5SourceGenerator(filePath: HDF5Read.FILE_PATH)] internal partial class MyGeneratedH5Bindings { }; , because only can be const, or if its possible to pass another path to the .h5 file, because I also repair that just accept the hardcode path like "C:\user\..." and I like to use the path from a local folder on my project without the hardcode path.

The length of the limits parameter must match this hyperslab's rank.

System.RankException
HResult=0x80131517
Message=The length of the limits parameter must match this hyperslab's rank.
Source=HDF5.NET
StackTrace:
at HDF5.NET.HyperslabSelection.d__18.MoveNext()
at HDF5.NET.SelectionUtils.d__0.MoveNext()
at HDF5.NET.SelectionUtils.d__31.MoveNext()
at HDF5.NET.H5Dataset.<ReadAsync>d__502.MoveNext()
at HDF5.NET.H5Dataset.d__23.MoveNext()
at HDF5.ConsoleTest.Program.d__14.MoveNext() in C:\Users\jiede\source\repos\SQLiteLib\src\SQLiteLib\Tests\HDF5.ConsoleTest\Program.cs: line 412

This exception was originally thrown at this call stack:
[External Code]
HDF5.ConsoleTest.Program.QueryData() (in Program.cs)

using var h5file = H5File.OpenRead(file);
var h5group = h5file.Group(tableName);
var h5dataset = h5group.Dataset($"PARA_{j}_OBJECT");

var datasetSelection = new HyperslabSelection(
    rank: 3,
    starts: new ulong[] { 20, 33, 0 },
    strides: new ulong[] { 1, 1, 1 },
    counts: new ulong[] { 1, 1, 1 },
    blocks: new ulong[] { 1, 1, 1 }
);

var dataValues = await h5dataset.ReadStringAsync(datasetSelection);

I want to read the data at multiple start positions through HyperslabSelection in one call instead of one position at a time. How can I do that?

Hdf5_file_7Z.zip
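This API version has no multi-start selection (point selections are listed on the roadmap below), so a workaround sketch is to loop over the start positions and collect the results:

// Sketch: one HyperslabSelection per start position (example origins).
var starts = new List<ulong[]>
{
    new ulong[] { 20, 33, 0 },
    new ulong[] { 40, 12, 0 } // second origin is just an example
};

var dataValues = new List<string>();

foreach (var start in starts)
{
    var selection = new HyperslabSelection(
        rank: 3,
        starts: start,
        strides: new ulong[] { 1, 1, 1 },
        counts: new ulong[] { 1, 1, 1 },
        blocks: new ulong[] { 1, 1, 1 }
    );

    dataValues.AddRange(await h5dataset.ReadStringAsync(selection));
}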

Super issue

before alpha release

  • Repair Little/Big Endian conversion
  • Check if generic reading requires changes to hyperslab selections
  • Ensure that buffer size is checked before copy
  • Improve File.Open signature
  • Add missing API (many properties etc)
  • Parallel testing support or run tests sequentially
  • Package Icon

before beta release

  • Step as readonly record struct
  • new reading API: Read for everything but that means double array = Read<double[]>()
  • Variable length data type (non-string)
  • region references, attribute references
  • Use records whenever possible. They have many advantages. Use "in" parameter (not ref) and ref return in combination with records. See article "Write safe and efficient C# code".
  • Point selections, virtual dataset support
  • Async works but it is not yet fully thread-safe. See AsyncBenchmark for more details. Problematic are the task-based and the multi-threaded benchmarks.
  • doc string on public members + update github pages
  • ReadOnlySpan2D Microsoft.Toolkit.HighPerformance
  • Add HDF5 comparison table to README and contact LiorBanai (see table below)
  • add netstandard2.0 backward compatibility
  • reenable Blosc test, make Blosc2 PInvoke package run on .NET Framework
  • rename Exists to LinkExists? Otherwise something like this could happen: file.Exists(..), which is very similar to File.Exists().
  • Add support for shared messages (i.e. make OneDAS HDF5 test work). See H5Oshared.c.
  • Why struct constraint instead of unmanaged? Struct allows string properties while unmanaged does not. Answer: Trying to MemoryMarshal such types results in the runtime exception Only value types without pointers or references are supported. So it is better to directly use the unmanaged constraint.
  • Complete missing API (many properties etc)
  • IQueryable Support (see #2)

before release

  • reading API: also read completely unknown types (most of the code is already implemented), only the API is missing
  • H5Z_FLAG_OPTIONAL
  • SourceGenerator could also generate source for data types, so that users only need to call .Read() to get a strongly typed result
  • Strongly typed fill value should not only be used in virtual datasets but everywhere (e.g. chunk). So in the end all dataset layouts are becoming generic?
  • CommitedDataType should make type information available?
  • Make SimpleChunkCache thread-safe so it can be reused by different threads. Update README.md with the new SimpleChunkCache and remove the SimpleChunkCache warning. Create pretty sample using Pipelines.
  • ulong to uint in Read method?
  • HyperslabSelection with unlimited dims
  • Direct chunk read? Make some code parts replaceable? Interfaces?
  • Bypassing filters
  • https://devblogs.microsoft.com/dotnet/file-io-improvements-in-dotnet-6/
  • Attribute value preview for debugger?
  • Read unknown Enum + docs (return value is string array)
  • Reduce number of H5BinaryReader variables, use BinaryReader instead, especially for local byte arrays. Did not work in the first attempt because some APIs expect main file stream but also local data streams
  • Space & type tostring(), h5py, or debuggerdisplay
  • new C# features: Inlinearrays + Collection Expressions: dotnet/docs#36356
  • Improve allocation of target array: https://learn.microsoft.com/en-us/dotnet/api/system.gc.allocateuninitializedarray?view=net-7.0
  • HSDS: reenable HSDS tests
  • HSDS: h5path queries: https://github.com/HDFGroup/hsds/blob/18be7801091608ecb119a7bbe29100a7c12a1313/docs/design/query/md_query.md?plain=1#L35 Example http://hsdshdflab.hdfgroup.org/?domain=/shared/tall.h5&h5path=/g1/g1.1/dset1.1.1
  • Checksum support: see jHDF/ChecksumUtils.java
  • Superblock01 DriverInfoBlock: NCSAmulti vs NCSAfami
  • IH5DataProvider ((File)-Driver)
  • hook into filter pipeline
  • ObjectHeader Cache
  • h5coro: Azure vs AWS

Performance optimizations

allocation alternative? ReadOnlySequence to reduce allocations: https://docs.microsoft.com/en-us/dotnet/standard/io/buffers

microsoft/Microsoft.IO.RecyclableMemoryStream: A library to provide pooling for .NET MemoryStream objects to improve application performance.: https://github.com/Microsoft/Microsoft.IO.RecyclableMemoryStream

missing tests

  • read single chunk (compressed / filtered)
  • skip filter
  • Test DelegateSelection + Documentation
  • do not filter edge chunks
  • add tests with max dims != dims
  • Automatically test against publicly available H5 files
  • test thread-safety of Intel filter Helper

backlog


System.OverflowException

Hi,

I encounter this error :
System.OverflowException at (wrapper managed-to-native) System.Object.__icall_wrapper_ves_icall_array_new_specific(intptr,int)
at PureHDF.VFD.H5StreamDriver.ReadBytes (System.Int32 count) [0x00000] in /home/runner/work/PureHDF/PureHDF/src/PureHDF/VFD/H5StreamDriver.cs:87
at PureHDF.VOL.Native.HeaderMessage..ctor (PureHDF.NativeContext context, System.Byte version, PureHDF.VOL.Native.ObjectHeader objectHeader, System.Boolean withCreationOrder) [0x003a2] in /home/runner/work/PureHDF/PureHDF/src/PureHDF/VOL/Native/FileFormat/Level2/Level2A1/HeaderMessage.cs:74
at PureHDF.VOL.Native.ObjectHeader.ReadHeaderMessages (PureHDF.NativeContext context, System.UInt64 objectHeaderSize, System.Byte version, System.Boolean withCreationOrder) [0x0003d] in /home/runner/work/PureHDF/PureHDF/src/PureHDF/VOL/Native/FileFormat/Level2/Level2A1/ObjectHeader.cs:116
at PureHDF.VOL.Native.ObjectHeader1..ctor (PureHDF.NativeContext context, System.Byte version) [0x00068] in /home/runner/work/PureHDF/PureHDF/src/PureHDF/VOL/Native/FileFormat/Level2/Level2A1/ObjectHeader1.cs:36
at PureHDF.VOL.Native.ObjectHeader.Construct (PureHDF.NativeContext context) [0x00055] in /home/runner/work/PureHDF/PureHDF/src/PureHDF/VOL/Native/FileFormat/Level2/Level2A1/ObjectHeader.cs:81
at PureHDF.NativeNamedReference.Dereference () [0x00058] in /home/runner/work/PureHDF/PureHDF/src/PureHDF/VOL/Native/Core/NativeNamedReference.cs:57
at PureHDF.VOL.Native.NativeGroup.Get (System.String path, PureHDF.VOL.Native.H5LinkAccess linkAccess) [0x00000] in /home/runner/work/PureHDF/PureHDF/src/PureHDF/VOL/Native/Core/NativeGroup.cs:90
at PureHDF.VOL.Native.NativeGroup.Get (System.String path) [0x00000] in /home/runner/work/PureHDF/PureHDF/src/PureHDF/VOL/Native/Core/NativeGroup.cs:80
at PureHDF.IH5GroupExtensions.Group (PureHDF.IH5Group group, System.String path) [0x00000] in /home/runner/work/PureHDF/PureHDF/src/PureHDF/API/IH5GroupExtensions.cs:82
at HDF5Reader.Group (System.String groupPath) [0x00001] in C:\Users\CRJE160\Documents\Git\DragonflyPlayer-ADS\Assets\Scripts\HDF5.NET\HDF5Reader.cs:82

The H5 group I'm reading has a size that exceeds the maximum value of an Int32 (10936 × 5174 × 64 = 3,621,303,296). I think the problem is in H5StreamDriver's ReadBytes method: it takes an Int32 count, not a long/Int64.

Adding writing support?

Would this be feasible in the near future? I love this package, but now my project requires creating nwb and hdf5 files too, instead of just reading them.

Too slow reading multiple datasets/groups

Hello, I'm trying to read a nested group structure in my HDF5 file, e.g. Group1/Group2/Group3/Datasets, with sizes, for example: Group1 = 8, Group2 = 6, Group3 = 400, dataset = 2 double values. Since there is no function that reads everything below a group directly, I have to write code that walks through all the groups, and this usually takes about 20-30 seconds to load everything, because I read 2 structs of that type, which makes a total of 76,800 values. Is there any way/function in the library I can use to reduce the time this reading takes? My code is attached as a screenshot.
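One way to cut the per-path lookups is to walk the tree once and read every dataset encountered. A sketch; the Children() enumeration call is an assumption about the current API (older versions expose a Children property) and should be verified:

// Sketch: walk the group tree once instead of resolving each full path.
void ReadAllDatasets(IH5Group group, List<double[]> results)
{
    foreach (var child in group.Children()) // assumption: Children() exists
    {
        if (child is IH5Dataset dataset)
            results.Add(dataset.Read<double[]>());
        else if (child is IH5Group subGroup)
            ReadAllDatasets(subGroup, results);
    }
}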

H5Group.Read<T>() fails with "Filter pipeline failed" in 1.0.0-alpha.21

My apologies if the following report is a little vague. I'm not sure exactly what info is needed to replicate the issue, but from my perspective, all existing files that have been read without issue for some time are now consistently failing, and it isn't immediately clear to me why. It feels like perhaps the new async capability is leading to corrupt reading of byte streams, but I could be way off. I'm logging this info early, before I roll up my sleeves and attempt to work out what is going on myself, in case it is obvious to anybody else and/or a fix can be more rapidly forthcoming.

Let me know what other info I can provide to help diagnose the problem. Thanks.

--

In 1.0.0-alpha.20 the following works fine. In 1.0.0-alpha.21 it fails.

var hdf = H5File.OpenRead(path);
var result = hdf.Dataset("key").Read<double>();

Exception:

System.Exception: 'Filter pipeline failed.'
Inner Exception: InvalidDataException: The archive entry was compressed using an unsupported compression method.

Stacktrace:

   at HDF5.NET.H5Filter.ExecutePipeline(List`1 pipeline, UInt32 filterMask, H5FilterFlags flags, Memory`1 filterBuffer, Memory`1 resultBuffer)
   at HDF5.NET.H5D_Chunk.<ReadChunkAsync>d__60`1.MoveNext()
   at HDF5.NET.H5D_Chunk.<ReadChunkAsync>d__59`1.MoveNext()
   at HDF5.NET.SimpleChunkCache.<GetChunkAsync>d__15.MoveNext()
   at HDF5.NET.SelectionUtils.<CopyMemoryAsync>d__2`1.MoveNext()
   at HDF5.NET.H5Dataset.<ReadAsync>d__50`2.MoveNext()
   at HDF5.NET.H5Dataset.Read[T](Selection fileSelection, Selection memorySelection, UInt64[] memoryDims, H5DatasetAccess datasetAccess)
   at ...

Unable to read variable-length attribute

Hello,
reading a variable-length attribute causes the following exception: Variable-length sequence data can only be decoded as array (incompatible type: System.String).
The exception says that the type System.String is incompatible even though the type string[] has been used.
The code below successfully reads the variable-length dataset, but reading the variable-length attribute fails.

var dataset_NUTS_keys = nativeFile.Dataset("/NUTS_keys");

foreach (var attribute in dataset_NUTS_keys.Attributes())
{
    var typeClass = attribute.Type.Class; // attribute.Type.Class is VariableLength

    // This attribute has the name DIMENSION_LIST and should contain the text 'NUTS'
    string[] attributeArray = attribute.Read<string[]>(); // Exception
}

var typeClass_NUTS_keys = dataset_NUTS_keys.Type.Class; // Type.Class is VariableLength

string[] NUTS_keys = dataset_NUTS_keys.Read<string[]>(); // Successful

A copy of the HDF5 file used is here

If attribute.Read<string[][]>() is used the exception is: Bitfield data can only be decoded as NativeObjectReference1 (incompatible type: System.String)
If NativeObjectReference1 is used the exception is: Unable to decode a reference type as value type.

Any help would be greatly appreciated.

Can't read file from pandas library

When I try to read the file from pandas in Python, it returns nothing. Could this be related to the schema version of HDF5?

Thank you

import numpy as np
import pandas as pd
#%pip install tables -U
import warnings
import os
import time
from tables import NaturalNameWarning
warnings.filterwarnings('ignore', category=NaturalNameWarning)
filePath =r"file.h5"
store = pd.HDFStore(filePath)
store.open()
group  = store.groups()
group

This is the test in C#:

[Test]
public void TestSaveHdf()
{
    var file = new H5File()
    {
        ["my-group"] = new H5Group()
        {
            ["numerical-dataset"] = new double[] { 2.0, 3.1, 4.2 },
            ["string-dataset"] = new string[] { "One", "Two", "Three" },
            Attributes = new()
            {
                ["numerical-attribute"] = new double[] { 2.0, 3.1, 4.2 },
                ["string-attribute"] = new string[] { "One", "Two", "Three" }
            }
        }
    };

    file.Write("file.h5");
}

[Test]
public void TestReadHdf()
{
    // root group
    var file = H5File.OpenRead("file.h5");

    // sub group
    var group = file.Group("my-group");

    // dataset
    var dataset = group.Dataset("numerical-dataset");
    var datasetData = dataset.Read<double[]>();

    foreach (var item in datasetData)
    {
        Console.WriteLine(item);
    }
}

Field "_dataType" present in the type "HDF5.NET.H5DataType" can be exposed?

Hello,

I have been trying to use HDF5.NET library for reading HDF5 files. I would say that HDF5.NET library is very .NET developer friendly.

We use compound types in our applications: one application generates HDF5 files and another application needs to read them. As far as I could tell from the documentation, the HDF5.NET library requires explicit types that describe the schema for compound types. But in our case, it is not possible to have a common API for all our applications just to keep the schema types in sync.

In the HDF.PInvoke library, there is a possibility to get member names and datatypes and to read the binary data of a compound type, letting the application handle the reading of the compound type itself, which works for our scenario.

If there is a similar option in the HDF5.NET library, it would be helpful for us. The only thing stopping us is that the field "_dataType" present in the type "HDF5.NET.H5DataType" is not exposed. Is it possible to expose the field "_dataType"?

Missing features

  • IH5DataProvider ((File)-Driver)
  • Multithreading (Cache, H5FileReader)
  • Automatically test against publicly available H5 files
  • ObjectHeader Cache
  • ExpandoObject

Filter pipeline improvements v2

Linked to: #33

The pipeline has been improved, but memory cannot be rented yet. The problem is that it is impossible to say whether the returned memory is a sliced version of the previously rented memory or an independent one. So we cannot simply free the rented memory and all other memories when the method returns; we have to wait until the pipeline has finished, and that might cause large and useless memory consumption.

Unable to read attributes in a generic way

Hi,
I've got a request to read all attributes of an h5 file during iteration of it in my own library (LiorBanai/HDF5-CSharp#163).

I was thinking about leveraging your library, since it is the better implementation, but when I try to read the attributes I get the following exception:
"The fill value message is missing."

Other attributes fail with errors such as 'Non-negative number required. (Parameter 'count')'. (Screenshots attached.)

the file is
hdf5_test.zip

Is there a better way to read all attributes of a file?

[Question] reading dynamic compound dataset

Hi,
I was wondering how to read back a compound dataset from an existing H5 file without knowing its structure (even reading it as a dictionary of <string, object> is OK), or how to read the columns separately.

Do you have an example in your library by any chance?

Filter pipeline improvements

1.) Filters like deflate do not know the uncompressed size. But if they are the last filter in the pipeline, the size is known from the chunk size.

Also, if it is the second-to-last filter, the following filter might be shuffle, which does not change the buffer size. So for deflate it is always better to use the chunk size as a guess than to use the input buffer size.

2.) All filters should use MemoryPool instead of new byte[].

3.) The last filter can directly write to the resulting array, saving one copy operation.

To enable this, there should be a FilterInfo structure with the currently present fields plus a MemoryProvider.

This structure would have a method GetResultMemory() which normally provides MemoryPool memory. But if the filter is the last one in the pipeline, it returns the sliced result buffer.
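A minimal sketch of that idea (only FilterInfo, the MemoryProvider field and GetResultMemory() are from the proposal above; everything else is illustrative):

using System.Buffers;

// GetResultMemory() hands out pooled memory for intermediate filter stages,
// but a slice of the final result buffer for the last stage, saving a copy.
// Note: the rented IMemoryOwner must be tracked for disposal; that ownership
// question is exactly the open problem in "Filter pipeline improvements v2".
readonly struct FilterInfo
{
    public bool IsLastFilter { get; init; }
    public Memory<byte> ResultBuffer { get; init; }
    public MemoryPool<byte> MemoryProvider { get; init; }

    public Memory<byte> GetResultMemory(int length)
    {
        return IsLastFilter
            ? ResultBuffer.Slice(0, length)                        // direct write
            : MemoryProvider.Rent(length).Memory.Slice(0, length); // pooled
    }
}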

ToArray1D: useful?

Hi,
I am new to C# (but practice Java and C++ for image analysis and electrophysiology) and I am trying to read H5 files storing recordings from 1024 micro-electrodes. These data are stored within an array of int with 1028 channels × many points (for example 300,000) corresponding to the duration of the observation (data are sampled at 20 kHz).
I would like to extract 1 channel at a time from this array using the hyperslab approach.

It looks to me that it would be useful to add a ToArray1D as follows:

public static unsafe T[] ToArray1D<T>(this T[] data, long dim0)
    where T : unmanaged
{
    var dims = new long[] { dim0 };
    ArrayExtensions.ValidateInputData(data, dims);
    var output = new T[dims[0]];

    fixed (void* ptr = output)
    {
        ArrayExtensions.CopyData(data, ptr);
    }

    return output;
}

Is there any other way to do this? Right now, I have resorted to using ToArray2D, but it seems awkward.

Thank you for any help,

Frederic

PS: By the way, for a beginner like me, HDF5.NET is the most understandable library, and it made it easy to read all the other fields in the H5 file I am working on. Thumbs up for your work!

Unable to resolve dependency

Hi there, I made an LSTM with Keras.NET.

But since I couldn't set the seed, I decided to use TensorFlow.NET. The problem is that when I tried to install TensorFlow.Keras, it showed me the following error:

Unable to resolve dependency 'PureHDF'. Source(s) used: 'nuget.org', 'Microsoft Visual Studio Offline Packages'.

Does anybody have an idea how to solve this issue?

NBit filter?

Hi,

I am trying to read data that was compressed with Gzip, but I am getting this error:

Exception: The filter 'Nbit' is not yet supported by HDF5.NET.
HDF5.NET.H5Filter.NbitFilterFunc (HDF5.NET.H5FilterFlags flags, System.UInt32[] parameters, System.Memory`1[T] buffer) (at <fb37093370234856bd3792f3c203654a>:0)

Could you help me/guide me towards getting this solved?

Dataset.Read<T>() performance issue

Hello,

I'm creating this "issue" to try and find out how to improve read performance for a complete dataset.

At the moment, I sometimes have to load fairly large files (around 800 MB), and it can take up to twenty minutes to read a complete dataset, even when trying to tweak buffer and chunk sizes in PureHDF.

Do you have any other ideas on how I can improve performance? I can't use multi-threading or asynchrony (my project uses Unity3D and therefore .NET Standard 2.1).

Thanks in advance!
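If the dataset is chunked and compressed, one lever is to enlarge the chunk cache so every chunk is decompressed only once. A sketch; ChunkCacheFactory and the SimpleChunkCache parameter names reflect the v1/HDF5.NET API surface as I understand it and should be checked against your PureHDF version:

// Sketch: read with a larger chunk cache (parameter names are assumptions).
var datasetAccess = new H5DatasetAccess()
{
    ChunkCacheFactory = () => new SimpleChunkCache(
        chunkSlotCount: 4096,
        byteCount: 256UL * 1024 * 1024) // 256 MiB
};

var data = dataset.Read<float>(datasetAccess: datasetAccess);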

❓ File Extension

Is it possible to extend a file, such that writer.Dispose() can be called and the data written to the file, and then continue writing to the same file using chunking?

Reading hdf5 HELP

Hi @Apollo3zehn

I would like to use your library to read my .hdf5 file, but I am having issues understanding the "group", "dataset" and "attribute" concepts.

Can you provide a quick example that references my file?

Thanks,
Marco

recorder.zip

Index out of range when trying to read dataset of VariableLength

Hi, I am trying to read an HDF5 file and came across an issue when trying to read a dataset of the VariableLength type.
To read the dataset, I just use the code:
var data = dataset.ReadString();

but it fails with the following exception:

 System.ArgumentOutOfRangeException: Index was out of range. Must be non-negative and less than the size of the collection. 
 (Parameter 'index')
    at PureHDF.H5ReadUtils.ReadString(H5Context context, DatatypeMessage datatype, Span`1 data, String[] result) in 
 /home/runner/work/PureHDF/PureHDF/src/PureHDF/Utils/H5ReadUtils.cs:line 293
    at PureHDF.H5Dataset.ReadString(Selection fileSelection, Selection memorySelection, UInt64[] memoryDims, 
 H5DatasetAccess datasetAccess) in /home/runner/work/PureHDF/PureHDF/src/PureHDF/API/H5Dataset.cs:line 270

It works fine if I read a dataset of the String type.
May I know what else I need to specify to be able to read the VariableLength data?
The attached screenshot shows an example of the dataset properties.

Thank you.

[Question] Reading arrays in compound dataset?

Hi! I'm wondering if there are any examples or support for reading arrays in a compound dataset?

Something like this:

internal struct QUAD_CN
{
    public int ID;
    public string TERM; // length: 8
    public int[] GRID; // length: 5
    public float[] FD1; // length: 5
}

Write support

  • null value handling
  • proper variable length support:
    • GetTypeInfoForVariableLengthSequence: baseEncode(ref memory, item); make use of the return value! This may be important for unmanaged data
    • reference type test (DataspaceMessage) is not yet working
  • IsReferenceOrContainsReferences -> default to true? Otherwise this will cause problems on .NET Standard 2.0
  • check if T[,] should also be supported
  • what about T[,], T[][] and T[,,x], T[][][x]?
  • what is the difference between char[,] and string[]? see String Bit Field Description. It is possible to use UTF-8 strings here since long strings will be truncated and small strings will be padded. Then support for char[,] is not needed? Or does the user decide if it is a fixed-length or variable-length string?
  • Support for new Half datatype and others
  • serializer options for top level string array: set length = fixed size array
  • MemoryMarshal.GetArrayDataElement for 2D extensions
  • Use static attribute message sizes and properly implement the free space manager
  • Support for datasets
    • Filter pipelines
    • Chunking
    • hyperslabs
    • Deferred writing
    • Fill Value (also for attributes??)
    • Add support for Stream -> will also be converted to memory and then written to file.
    • Add support for ReadOnlyMemory (can be casted to memory easily: MemoryMarshal.AsMemory(writeRequest.Data))

backlog
