Comments (4)
That is indeed intentional. Strides are a property of the organization of the data in memory as the data is compressed and written, with the compressed output being organized differently. Consumers do not necessarily want to maintain those strides when later reading and decompressing the data. Requiring that could, for instance, blow up memory requirements for the consumer if the original data is not stored contiguously. As an example, the original layout could be in array-of-struct form, perhaps with dozens of different fields being written one at a time by a simulation code. If later data analysis is to be done on a single field, you don't want to have to recreate the original layout that wastes storage on all but one field.
If the same data layout is desired during decompression, then that can be accomplished by setting strides during decompression, though the strides would have to be maintained separately.
from zfp.
Thanks for the elaborate answer. The scenario I am interested in is compression of a 2D vector field as two strided 2D scalar fields. I believe that in such a scenario I am bound either to manually (de)interleave the vector components into separate planes before compression and after decompression, or to rely on zfp's internal accounting for strides. As you suggested, storing the strides externally and providing them during decompression seems to be a solution, although I would argue that deriving all the metadata required to set up decompression directly from the header would be more convenient.
A bit tangential: what is the rationale for not storing all fields of `zfp_field` except the data pointer directly in the header? The cost of such a header seems negligible relative to the compressed stream itself.
As mentioned above, one rationale is that the consumer may not want to organize the data the same way the producer does. In fact, I cannot think of a case where the consumer, which processes the data, does not know how it wants the data to be organized. Can you think of a scenario where it would be beneficial to have the producer dictate the data layout for the consumer? In the case of a code processing 2D vector fields, the consumer needs to know if the data layout is `float field[ny][nx][2]` or `float field[2][ny][nx]` (or some other permutation) so it can index the multidimensional field properly. If the code is written with one of these conventions, it will fail if the data producer mandates the other convention. While one can write such a code using strides (e.g., `field[stride_x * x + stride_y * y + stride_c * c]` to access vector component `c` at `(x, y)`), oftentimes you want to use some container class like a NumPy array whose strides are given by the container, not the data producer.
Another rationale is that we have gone to great lengths to make the storage of metadata and compression parameters as compact as possible; in most cases, we encode array dimensions, scalar type, and compression mode and parameters in only 64 bits. This compact encoding is motivated by zfp's unique approach to representing large arrays as a collection of very small blocks (consisting of 4^d values in d dimensions) that can be (de)compressed independently. We early on anticipated the potential to vary compression parameters spatially, perhaps even from one block to the next, and in that case the overhead of storing compression parameters becomes large. Similarly, in certain applications (like AMR), one may form a larger grid as a collection of smaller ones, with each subgrid composed of a small collection of zfp blocks. In this case, it is again important to keep the per-subgrid array metadata small. One may even vary precision spatially (e.g., float vs. double), where again you need an efficient way of encoding the scalar type. Whereas individual array dimensions are often small (say, 16 bits or less), strides are not only signed but may span the product of all dimensions or even more (when multiple fields are interleaved), making them far costlier to encode. In practice, you often need more than 32 bits per stride, or more than 96 bits for the 2D vector field example above.
Now, I can envision a case where the consumer (perhaps an I/O module) is tasked only with reconstructing the original data bit for bit. Using the current zfp API, it would be possible to add a new `ZFP_HEADER` tag for strides to also store this information. The consumer could then override the strides set in `zfp_read_header()` before calling `zfp_decompress()`. The main challenge would be to do this in a backwards-compatible manner, as one would presumably have to redefine `ZFP_HEADER_FULL` to also include strides, and that would break existing code. But it may be reasonable to consider such a feature for future versions of the zfp codec. There are other changes to the compressed format we would want to incorporate, but a change to the codec will not happen anytime soon.
> We early on anticipated the potential to vary compression parameters spatially, perhaps even from one block to the next, and in that case the overhead of storing compression parameters becomes large.

The need to design for such a use case answers my question, thanks.

> Can you think of a scenario where it would be beneficial to have the producer dictate the data layout for the consumer?

> Now, I can envision a case where the consumer (perhaps an I/O module) is tasked only with reconstructing the original data bit for bit.

This is pretty much the case for my usage scenario.