Comments (8)
Original comment by Dominik Mierzejewski (Bitbucket: rathann, GitHub: rathann).
__gsd_read_header() returning the same error code for different errors definitely doesn't help. I haven't run it under gdb yet, but I'm guessing it's failing at https://bitbucket.org/glotzer/gsd/src/919a5b1bc3f36cb8e417d823491a3c0b5a94ea16/gsd/gsd.c?at=master&fileviewer=file-view-default#gsd.c-279 . Assuming the files are stored in little-endian order on disk, the read code should swap byte order when reading on big-endian platforms.
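Here is a minimal sketch of the swap-on-read idea. It assumes the on-disk integers are little-endian, detects the host byte order at run time, and swaps bytes only on big-endian hosts. The helper names (`host_is_big_endian`, `bswap32`, `le32_to_host`) are hypothetical and not part of gsd.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers, not part of gsd. */

/* Returns 1 when the host stores the most significant byte first. */
static int host_is_big_endian(void) {
    const uint32_t probe = 0x01020304u;
    unsigned char first;
    memcpy(&first, &probe, 1); /* inspect the lowest-addressed byte */
    return first == 0x01;
}

/* Reverses the byte order of a 32-bit value. */
static uint32_t bswap32(uint32_t x) {
    return ((x & 0x000000FFu) << 24) |
           ((x & 0x0000FF00u) << 8)  |
           ((x & 0x00FF0000u) >> 8)  |
           ((x & 0xFF000000u) >> 24);
}

/* Converts a value read verbatim from a little-endian file into host order.
   This is a no-op on little-endian hosts and a byte swap on big-endian hosts. */
static uint32_t le32_to_host(uint32_t raw) {
    return host_is_big_endian() ? bswap32(raw) : raw;
}
```

With this helper, little-endian readers take the unchanged path, and only big-endian readers pay for the swap.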
from gsd.
Original comment by Joshua Anderson (Bitbucket: joaander, GitHub: joaander).
Thanks for reporting this issue. I do not have access to, nor am I aware of any users that utilize big-endian machines. I did not write gsd to handle cases where users write a file in one byte order and attempt to read it on a machine with another byte order. It would be straightforward to detect this case and return a more specific error code, if that is what you are looking for.
gsd was designed as a minimal binary format that supports named data arrays and issues a minimal number of file I/O calls to attain good performance when accessing frame data, even in random order. To achieve these goals, gsd uses write() and read() calls on in-memory C structs, and a single write() (or read()) call for an entire data array. When possible, gsd also memory-maps portions of the file so that the kernel's filesystem cache can handle random access patterns efficiently.
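A single write() of a C struct copies the struct's in-memory bytes verbatim, so the file inherits the writer's byte order. The sketch below uses memcpy() into a byte buffer as a stand-in for the file contents. `example_header` is a hypothetical layout, not gsd's actual header struct.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical header layout, for illustration only (not gsd's real struct). */
struct example_header {
    uint64_t magic;
    uint32_t version;
    uint32_t n_items;
};

/* Copies the struct's raw bytes as write(fd, h, sizeof *h) would send them
   to the file, and returns the first byte of that "file". */
static unsigned char first_byte_on_disk(const struct example_header *h) {
    unsigned char bytes[sizeof *h];
    memcpy(bytes, h, sizeof *h);
    return bytes[0];
}
```

Take a header whose magic is 0x0102030405060708. A little-endian writer stores 0x08 first. A big-endian reader that read()s those same bytes back into the struct would see the magic as 0x0807060504030201, so a plain equality check on the magic fails.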
Adding endian conversion support when reading (and/or writing) gsd files would be a non-trivial task. If such support is critical to your work, I would be happy to accept a pull request, as long as it does not introduce any additional overhead or memory allocations when both writers and readers are little-endian. Since there are so few big-endian machines out there, I would not mind adding overhead to that code path. The magic value check you linked to could be used to detect whether the file is in reverse byte order and to enable the alternate code path.
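A minimal sketch of that magic-value check follows. It compares the value as read against both the expected constant and its byte-swapped form. That also yields the more specific error code discussed above. `EXAMPLE_MAGIC`, the enum, and the function names are hypothetical stand-ins, not gsd's real constant or error codes.

```c
#include <stdint.h>

/* Hypothetical magic constant for illustration; gsd defines its own value. */
#define EXAMPLE_MAGIC UINT64_C(0x0123456789ABCDEF)

/* Hypothetical result codes: "wrong byte order" gets its own code instead of
   reusing the generic "not a valid file" error. */
enum example_magic_result {
    EXAMPLE_MAGIC_NATIVE = 0,  /* same byte order: existing fast path */
    EXAMPLE_MAGIC_SWAPPED = 1, /* written on a host with the other byte order */
    EXAMPLE_MAGIC_INVALID = 2  /* not a file of this format at all */
};

/* Reverses the byte order of a 64-bit value. */
static uint64_t bswap64(uint64_t x) {
    return ((x & UINT64_C(0x00000000000000FF)) << 56) |
           ((x & UINT64_C(0x000000000000FF00)) << 40) |
           ((x & UINT64_C(0x0000000000FF0000)) << 24) |
           ((x & UINT64_C(0x00000000FF000000)) << 8)  |
           ((x & UINT64_C(0x000000FF00000000)) >> 8)  |
           ((x & UINT64_C(0x0000FF0000000000)) >> 24) |
           ((x & UINT64_C(0x00FF000000000000)) >> 40) |
           ((x & UINT64_C(0xFF00000000000000)) >> 56);
}

/* Classifies the magic value exactly as read from the start of the file. */
static enum example_magic_result check_magic(uint64_t magic_as_read) {
    if (magic_as_read == EXAMPLE_MAGIC) {
        return EXAMPLE_MAGIC_NATIVE;
    }
    if (magic_as_read == bswap64(EXAMPLE_MAGIC)) {
        return EXAMPLE_MAGIC_SWAPPED;
    }
    return EXAMPLE_MAGIC_INVALID;
}
```

Same-order files return after the first comparison, which the reader already performs. The little-endian path therefore gains no extra work. The trick only works if the magic value is not a byte palindrome, i.e. it must differ from its own byte-swapped form.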
Original comment by Dominik Mierzejewski (Bitbucket: rathann, GitHub: rathann).
Thanks for the quick response. A different error code for this case would certainly help a little.
To give some background for this report, I'm the Fedora package maintainer for MDAnalysis, and gsd is a new dependency in the 0.17.0 release, so I had to package it, too. Since Fedora builds packages on many architectures (apart from x86, we build for Power64 big- and little-endian; ARM, which is little-endian; and s390x, which is also big-endian), I need the code to work correctly on all of them.
To whoever wants to implement endian-agnostic reading code, I'd suggest looking at how the FFmpeg project did it at https://github.com/FFmpeg/FFmpeg/blob/master/libavutil/intreadwrite.h . YoLinux has a nice article on this topic, too: http://www.yolinux.com/TUTORIALS/Endian-Byte-Order.html . I might take a stab at implementing this using htole32() and friends (see man 3 endian), but I'm short on time, so no promises. Also, it would be Linux-specific. Do you care about supporting other operating systems?
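Of the two approaches mentioned above, the FFmpeg-style one is not Linux-specific. It assembles each integer from individual bytes, so the result is the same on any host and needs no <endian.h>. The sketch below works in that spirit. The names `rl32`, `rl64` and `wl32` are hypothetical. This is not FFmpeg's code, whose macros are named AV_RL32() and similar.

```c
#include <stdint.h>

/* Reads a little-endian uint32 from p, independent of host byte order. */
static uint32_t rl32(const unsigned char *p) {
    return (uint32_t)p[0] |
           ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) |
           ((uint32_t)p[3] << 24);
}

/* Reads a little-endian uint64: low 32 bits first, then high 32 bits. */
static uint64_t rl64(const unsigned char *p) {
    return (uint64_t)rl32(p) | ((uint64_t)rl32(p + 4) << 32);
}

/* Write direction: stores v as little-endian bytes. */
static void wl32(unsigned char *p, uint32_t v) {
    p[0] = (unsigned char)(v & 0xFFu);
    p[1] = (unsigned char)((v >> 8) & 0xFFu);
    p[2] = (unsigned char)((v >> 16) & 0xFFu);
    p[3] = (unsigned char)((v >> 24) & 0xFFu);
}
```

The trade-off is one load per byte instead of one load per integer. Modern GCC and Clang can often combine such byte loads into a single load on little-endian targets, but whether they do depends on the compiler and optimization level.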
Original comment by Dominik Mierzejewski (Bitbucket: rathann, GitHub: rathann).
A shorter reproducer for easier testing:
wget https://github.com/MDAnalysis/mdanalysis/raw/develop/testsuite/MDAnalysisTests/data/example.gsd
python2 -c "import gsd.hoomd; gsd.hoomd.open('example.gsd')"
Original comment by Joshua Anderson (Bitbucket: joaander, GitHub: joaander).
I am fine with supporting big-endian only on Linux. However, gsd still needs to compile on Windows, as there are some Windows tools that support the file format (OVITO).
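One way to meet both constraints is a small compile-time shim. It uses <endian.h> on Linux and identity macros on Windows and other targets that the compiler reports as little-endian. This is a minimal sketch under the assumption that all Windows targets in use (x86, x64, ARM64) are little-endian. The `example_le32toh`/`example_le64toh` macros and the `load_le*` helpers are hypothetical names.

```c
#if defined(__linux__) && !defined(_DEFAULT_SOURCE)
#define _DEFAULT_SOURCE /* asks glibc's <endian.h> to expose le32toh() and friends */
#endif
#include <stdint.h>
#include <string.h>

#if defined(__linux__)
/* Linux, including big-endian targets such as ppc64 and s390x (man 3 endian). */
#include <endian.h>
#define example_le32toh(x) le32toh(x)
#define example_le64toh(x) le64toh(x)
#elif defined(_WIN32) || \
    (defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__)
/* Assumption: Windows targets and compiler-reported little-endian targets
   need no conversion, so these are identity macros. */
#define example_le32toh(x) ((uint32_t)(x))
#define example_le64toh(x) ((uint64_t)(x))
#else
#error "no little-endian conversion configured for this platform"
#endif

/* Loads a little-endian integer from a buffer filled by read() or mmap(). */
static uint32_t load_le32(const unsigned char *p) {
    uint32_t v;
    memcpy(&v, p, sizeof v); /* memcpy sidesteps unaligned-access problems */
    return example_le32toh(v);
}

static uint64_t load_le64(const unsigned char *p) {
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return example_le64toh(v);
}
```

On little-endian builds the conversion compiles away, so the Windows and x86 Linux code paths are unchanged.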
Thanks for explaining your role in this; that helps me understand the use case. I don't know why MDAnalysis didn't just make gsd an optional dependency: it is not a core feature, and it is a file format used by only a small fraction of MDAnalysis users.
Original comment by Dominik Mierzejewski (Bitbucket: rathann, GitHub: rathann).
I pointed out this issue to MDAnalysis developers, too: MDAnalysis/mdanalysis#1829 .
Original comment by Joshua Anderson (Bitbucket: joaander, GitHub: joaander).
I looked into this, and it will take a fair amount of work to implement. To simplify development, gsd reads and writes entire structs from/to files. Until there is a research use case for this (i.e., a national supercomputer center that is big-endian), it is not worth the effort to update gsd.
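Part of the work comes from reading entire structs. After a whole-struct read() from a reverse-order file, every multi-byte field must be swapped individually, and that list has to stay in sync with the struct definition. Here is a minimal sketch with a hypothetical header struct (not gsd's actual layout) and hypothetical helper names.

```c
#include <stdint.h>

/* Hypothetical on-disk header, for illustration only (not gsd's real struct). */
struct example_header {
    uint64_t magic;
    uint64_t index_location;
    uint32_t version;
    uint32_t namelist_entries;
};

/* Reverses the byte order of a 32-bit value. */
static uint32_t swap32(uint32_t x) {
    return ((x & 0x000000FFu) << 24) | ((x & 0x0000FF00u) << 8) |
           ((x & 0x00FF0000u) >> 8)  | ((x & 0xFF000000u) >> 24);
}

/* Reverses a 64-bit value by swapping each half and exchanging the halves. */
static uint64_t swap64(uint64_t x) {
    return ((uint64_t)swap32((uint32_t)(x & 0xFFFFFFFFu)) << 32) |
           (uint64_t)swap32((uint32_t)(x >> 32));
}

/* Called only after the magic check has found a reverse-byte-order file, so
   same-order files pay nothing. Every field must be listed here: adding a
   field to the struct without updating this function silently breaks it. */
static void example_header_swap(struct example_header *h) {
    h->magic = swap64(h->magic);
    h->index_location = swap64(h->index_location);
    h->version = swap32(h->version);
    h->namelist_entries = swap32(h->namelist_entries);
}
```

The same would likely apply to index entries and to each data array read with a single read(). Each array would need an element-wise swap based on its stored type. Memory-mapped regions could not be swapped in place without first copying them, which conflicts with the no-extra-allocation goal only on the big-endian path.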
Original comment by Dominik Mierzejewski (Bitbucket: rathann, GitHub: rathann).
I understand. For what it's worth, I know of at least two supercomputing centres (University of Warsaw's ICM and LLNL in the US) which have IBM BlueGene L/Q clusters and these are Power64 big-endian.
Related Issues (20)
- Typos in gsd.hoomd documentation
- gsd fl module docs are out of date
- Deprecate `read_frame`
- Release v2.5.0
- Release v2.5.1
- GSD C API cannot fully handle file paths containing non-English characters on Windows platform
- Release v2.5.2
- gsd.hoomd calls nonexistent method to validate snapshot when writing
- Release v2.5.3
- gsd allows multiple types of the same name
- Release v2.6.0
- Tests fail: Defining 'pytest_plugins' in a non-top-level conftest is no longer supported
- typeid saved incorrectly
- Release v2.6.1
- Release v2.7.0
- Read scalar `hoomd` log data into a *pandas* `DataFrame`.
- Release v2.8.0
- Release v2.8.1
- Provide high bandwidth performance for bulk frame writes.
- Release gsd 2.9.0