
Comments (8)

joaander avatar joaander commented on June 13, 2024

Original comment by Dominik Mierzejewski (Bitbucket: rathann, GitHub: rathann).


__gsd_read_header() returning the same error code for different errors definitely doesn't help. I haven't run it under gdb yet, but I'm guessing it's failing at https://bitbucket.org/glotzer/gsd/src/919a5b1bc3f36cb8e417d823491a3c0b5a94ea16/gsd/gsd.c?at=master&fileviewer=file-view-default#gsd.c-279 . Assuming the files are stored in little-endian order on disk, the read code should swap byte order when reading on big-endian platforms.

from gsd.

joaander avatar joaander commented on June 13, 2024

Original comment by Joshua Anderson (Bitbucket: joaander, GitHub: joaander).


Thanks for reporting this issue. I do not have access to any big-endian machines, nor am I aware of any users who use them. I did not write gsd to handle cases where users write a file in one byte order and attempt to read it on a machine with another byte order. It would be straightforward to detect this case and return a more specific error code, if that is what you are looking for.
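A minimal sketch of what such a check could look like, written in Python for brevity rather than in gsd's C. The 8-byte magic constant, the `HeaderStatus` names, and the `check_header` helper are assumptions made for illustration and should be checked against gsd's own header definition; this is not gsd's actual code.

```python
import enum
import struct

# Assumption: the file starts with an 8-byte magic value. The constant is illustrative.
MAGIC = 0x65DF65DF65DF65DF
# The same eight bytes interpreted in the opposite byte order.
MAGIC_SWAPPED = int.from_bytes(MAGIC.to_bytes(8, "little"), "big")


class HeaderStatus(enum.Enum):
    """Hypothetical status codes, one per distinguishable failure."""
    OK = 0
    TRUNCATED = 1
    NOT_A_GSD_FILE = 2
    FOREIGN_BYTE_ORDER = 3


def check_header(raw: bytes) -> HeaderStatus:
    """Classify the first bytes of a file as a native-byte-order reader sees them."""
    if len(raw) < 8:
        return HeaderStatus.TRUNCATED
    # "=" selects native byte order with standard sizes, like reading into a C struct.
    (magic,) = struct.unpack("=Q", raw[:8])
    if magic == MAGIC:
        return HeaderStatus.OK
    if magic == MAGIC_SWAPPED:
        return HeaderStatus.FOREIGN_BYTE_ORDER
    return HeaderStatus.NOT_A_GSD_FILE
```

With a status like `FOREIGN_BYTE_ORDER`, a caller could report "file written on a machine with a different byte order" instead of a generic header failure.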

gsd was designed as a minimal binary format that supports named data arrays and that issues a minimal number of file I/O calls to attain good performance when accessing frame data, even in a random order. To achieve these goals, gsd uses write() and read() calls on in-memory C structs, and a single write() (or read()) call for an entire data array. When possible, gsd also memory maps portions of the file to allow the kernel filesystem to efficiently cache random access patterns.

Adding endian conversion support when reading (and/or writing) gsd files would be a non-trivial task. If such support is critical to your work, I would be happy to accept a pull request, as long as it does not introduce any additional overhead or memory allocations for the case where both writers and readers are little-endian. Since there are so few big-endian machines out there, I would not mind adding overhead to that code path. One could use the magic value check that you linked to in order to detect whether the file is in reverse byte order and enable the alternate code path.
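The two-path idea can be sketched as follows, again in Python rather than gsd's C. The `decode_floats` helper and its `file_order` argument are hypothetical; the point is only that the swap runs when the file's byte order differs from the host's, so the matching case pays for nothing beyond the bulk copy.

```python
import struct
import sys
from array import array


def decode_floats(raw: bytes, file_order: str) -> array:
    """Decode 32-bit floats stored in file_order ("little" or "big")."""
    values = array("f")
    values.frombytes(raw)  # one bulk copy, no per-element work
    if file_order != sys.byteorder:
        values.byteswap()  # alternate path: only taken for foreign-order files
    return values


# The same three values written in each byte order decode identically.
little = struct.pack("<3f", 1.0, 2.5, -4.0)
big = struct.pack(">3f", 1.0, 2.5, -4.0)
assert decode_floats(little, "little") == decode_floats(big, "big")
```

In a C implementation the same branch would presumably be decided once per file, after the magic value check, rather than once per array.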

from gsd.

joaander avatar joaander commented on June 13, 2024

Original comment by Dominik Mierzejewski (Bitbucket: rathann, GitHub: rathann).


Thanks for the quick response. A different error code for this case would certainly help a little.

To give some background for this report, I'm the Fedora package maintainer for MDAnalysis, and gsd is a new dependency in the 0.17.0 release, so I had to package it too. Since Fedora builds packages for many architectures (apart from x86, we build for Power64, both big- and little-endian; ARM, which is little-endian; and s390x, which is also big-endian), I need the code to work correctly on all of these.

To whoever wants to implement endian-agnostic reading code, I'd suggest looking at how the FFmpeg project did that at https://github.com/FFmpeg/FFmpeg/blob/master/libavutil/intreadwrite.h . YoLinux has a nice article on this topic, too: http://www.yolinux.com/TUTORIALS/Endian-Byte-Order.html . I might take a stab at implementing this using htole32() and friends (see man 3 endian), but I'm short on time, so no promises. Also, it would be Linux-specific. Do you care about supporting other operating systems?
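For readers unfamiliar with the htole32() family, here is a rough Python analogue of the idea. Python has no htole32(); `int.to_bytes` and `int.from_bytes` are swapped in as the closest standard-library equivalents, and the helper names `to_le32` and `from_le32` are hypothetical. The result is the same on little- and big-endian hosts because the byte order is stated explicitly instead of being inherited from the machine.

```python
def to_le32(value: int) -> bytes:
    """Encode an unsigned 32-bit integer as 4 little-endian bytes on any host."""
    return value.to_bytes(4, "little")  # raises OverflowError if value does not fit


def from_le32(raw: bytes) -> int:
    """Decode the first 4 bytes of raw as a little-endian unsigned integer."""
    return int.from_bytes(raw[:4], "little")


encoded = to_le32(0x01020304)
assert encoded == b"\x04\x03\x02\x01"   # least significant byte first
assert from_le32(encoded) == 0x01020304  # round trip is host-independent
```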

from gsd.

joaander avatar joaander commented on June 13, 2024

Original comment by Dominik Mierzejewski (Bitbucket: rathann, GitHub: rathann).


A shorter reproducer for easier testing:

wget https://github.com/MDAnalysis/mdanalysis/raw/develop/testsuite/MDAnalysisTests/data/example.gsd
python2 -c "import gsd.hoomd; gsd.hoomd.open('example.gsd')"

from gsd.

joaander avatar joaander commented on June 13, 2024

Original comment by Joshua Anderson (Bitbucket: joaander, GitHub: joaander).


I am fine with supporting big-endian only on Linux. However, gsd still needs to compile on Windows, as there are some Windows tools that support the file format (OVITO).

Thanks for explaining your role in this; that helps me understand the use-case. I don't know why MDAnalysis didn't just make gsd an optional dependency. It is not a core feature, and it is a file format used by only a small fraction of MDAnalysis users.

from gsd.

joaander avatar joaander commented on June 13, 2024

Original comment by Dominik Mierzejewski (Bitbucket: rathann, GitHub: rathann).


I pointed out this issue to MDAnalysis developers, too: MDAnalysis/mdanalysis#1829 .

from gsd.

joaander avatar joaander commented on June 13, 2024

Original comment by Joshua Anderson (Bitbucket: joaander, GitHub: joaander).


I looked into this and it will take a fair amount of work to implement. To simplify development, gsd reads and writes entire structs from/to files. Until there is a research use-case for this (e.g. a national supercomputer center that is big-endian), it is not worth the effort to update gsd.

from gsd.

joaander avatar joaander commented on June 13, 2024

Original comment by Dominik Mierzejewski (Bitbucket: rathann, GitHub: rathann).


I understand. For what it's worth, I know of at least two supercomputing centres (University of Warsaw's ICM and LLNL in the US) which have IBM BlueGene L/Q clusters and these are Power64 big-endian.

from gsd.
