
Comments (6)

lindstro commented on July 19, 2024

@martinradev This is expected behavior and is discussed in FAQ 17.

You're essentially asking zfp to represent the single-precision data at an accuracy higher than what IEEE single precision supports. That is, the requested error tolerance is smaller than machine epsilon relative to the value being represented. Although the error of 26 is much larger than the tolerance of 1e-4, the true value is around 2.4e7, so the relative error is around 2.6e1 / 2.4e7 ~ 1e-6, which is close to single precision machine epsilon.

To meet the requested level of accuracy, you need truly lossless compression. The closest floating-point value to 24161450 other than itself is 24161448, which already represents an error several orders of magnitude larger than the 1e-4 error tolerance. This example is a bit like asking for a floating-point approximation to pi accurate to 100 digits--it's simply not possible to do so in IEEE floating point.

As of zfp 0.5.5, lossless compression is supported. zfp likely will not do well at all on this particular data set, because it does not represent a continuous scalar function.
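
For illustration, invoking reversible (lossless) mode through the C API looks roughly like this (a minimal sketch; the function name and memory handling are placeholders, and error checking is omitted):

    #include <stdlib.h>
    #include "zfp.h"

    /* Losslessly compress n floats; returns the compressed size
     * (0 on failure) and hands back the malloc'd buffer via *out. */
    size_t compress_lossless(float* data, size_t n, void** out)
    {
      zfp_field* field = zfp_field_1d(data, zfp_type_float, n);
      zfp_stream* zfp = zfp_stream_open(NULL);
      zfp_stream_set_reversible(zfp);            /* lossless mode (0.5.5+) */
      size_t bufsize = zfp_stream_maximum_size(zfp, field);
      void* buffer = malloc(bufsize);
      bitstream* stream = stream_open(buffer, bufsize);
      zfp_stream_set_bit_stream(zfp, stream);
      zfp_stream_rewind(zfp);
      size_t size = zfp_compress(zfp, field);
      zfp_field_free(field);
      zfp_stream_close(zfp);
      stream_close(stream);
      *out = buffer;
      return size;
    }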


lindstro commented on July 19, 2024

Let me add to this: The largest value (in magnitude) in this data set is around 1.1e10. You would need about 14 digits of accuracy to represent such values to an absolute error tolerance of 1e-4. Single precision is accurate to about 6 digits, while double precision is accurate to about 15 digits. Thus, any error tolerance smaller than 1e10 / 1e6 = 1e4 likely cannot be respected for this single-precision data set.

As a rule of thumb, the error tolerance should not be smaller than 1e-6 * maxval for single precision and 1e-15 * maxval for double precision, where maxval is the largest in magnitude value in the data set. Note that the original values were likely rounded and are not more accurate than this anyway.
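
As a sketch of this rule of thumb (the helper name and the 1e-6 factor come from the discussion above, not from the zfp API):

    #include <math.h>
    #include <stddef.h>

    /* Clamp a requested absolute error tolerance to the smallest value
     * that single-precision inputs can meaningfully support. */
    double safe_tolerance(const float* data, size_t n, double requested)
    {
      double maxval = 0;
      for (size_t i = 0; i < n; i++)
        maxval = fmax(maxval, fabs(data[i]));
      /* use 1e-15 * maxval instead for double-precision data */
      return fmax(requested, 1e-6 * maxval);
    }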


martinradev commented on July 19, 2024

I would have expected the compressor to determine the range [-L, L] within which it is safe to discard data. zfp could then handle the situation by not discarding information for values outside that range.

If I were to specify a huge absolute error tolerance so that large values are handled correctly by zfp, what prevents zfp from representing values close to 0 as 0?

The SZ compressor does not introduce the same errors, which suggests it handles this situation correctly.


lindstro commented on July 19, 2024

zfp really should be thought of as a different numerical representation of the reals rather than as a compressor of IEEE-754 (single- or double-precision) floating-point values. For finite and fixed precision (or storage), the sets of reals representable by IEEE and zfp differ. There are IEEE values that zfp cannot represent, just as there are reals that zfp can represent but IEEE cannot. In this sense, zfp will meet a prescribed error tolerance when compressing IEEE floating-point values as long as there is a zfp representation of a given IEEE floating-point value within that tolerance. As you have correctly observed, there are cases where there is no zfp representation within the prescribed error tolerance when the tolerance is extremely small in relation to the values in a given IEEE data set (e.g., even smaller than satisfiable by IEEE).

As an analogy, take 32-bit two's complement signed integers as another discrete representation of the reals. Clearly, there are lots of 32-bit floats that fall between consecutive integers, but there are also lots of 32-bit integers that fall between consecutive floats. For instance, there are 255 integers {2^31 + 1, 2^31 + 2, ..., 2^31 + 255} that fall between the two consecutive floats 2^31 and 2^31 + 256 (i.e., IEEE single precision does not support any values between these two reals). Converting 2^31 + 128 to IEEE float will incur an error of 128. One could ask IEEE float to represent values within an error tolerance of 0.5 using the rint() function. This tolerance will be respected for values in the range [-2^24, 2^24], but will be silently ignored for large enough integers (such as 2^31 + 128). Conversion from IEEE-754 to zfp is similar in that a tolerance small enough simply cannot be satisfied. This is true of all number formats (not just zfp).
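
A quick way to see this (a self-contained demo, not zfp code):

    #include <inttypes.h>
    #include <stdio.h>

    int main(void)
    {
      /* 2^31 + 128 lies halfway between the consecutive floats
       * 2^31 and 2^31 + 256; rounding to float loses 128. */
      int64_t i = ((int64_t)1 << 31) + 128;
      float f = (float)i;
      printf("%" PRId64 " -> %.1f (error %.1f)\n",
             i, f, (double)i - (double)f);
      return 0;
    }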

Posits are yet another representation of the reals that supports a subset of the reals different from IEEE. Converting numbers in either direction (between IEEE and Posits) will in many cases introduce errors. Blaming one representation over the other for such conversion errors does not make much sense--the number formats merely support two different subsets of the reals. For instance, one could claim that IEEE is a "poor" representation for not being able to accurately represent all reals representable by zfp, but that would be just as unfair as blaming zfp for not representing all IEEE values to within some arbitrarily small tolerance.

zfp could in principle represent all floats exactly, but that would require more than 32 bits of precision/storage. Toward this end, one could first cast the floats to doubles and then use zfp's double-precision CODEC, which uses 64 bits of precision. That is still not a guarantee for all inputs, but it would satisfy the error tolerance of 1e-4 for the particular data set you're using (i.e., maxe < 1e-4):

type=double nx=33298679 ny=1 nz=1 nw=1 raw=266389432 zfp=76897600 ratio=3.46 rate=18.47 rmse=9.942e-06 nrmse=8.988e-16 maxe=3.731e-05 psnr=294.91
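
The promotion itself is exact, since every float is representable as a double. A minimal sketch of this workaround (buffer handling as in the earlier sketch; names are placeholders):

    #include <stdlib.h>
    #include "zfp.h"

    size_t compress_promoted(const float* data, size_t n, double tol, void** out)
    {
      double* promoted = malloc(n * sizeof *promoted);
      for (size_t i = 0; i < n; i++)
        promoted[i] = data[i];                  /* exact float-to-double */
      zfp_field* field = zfp_field_1d(promoted, zfp_type_double, n);
      zfp_stream* zfp = zfp_stream_open(NULL);
      zfp_stream_set_accuracy(zfp, tol);        /* e.g., tol = 1e-4 */
      size_t bufsize = zfp_stream_maximum_size(zfp, field);
      void* buffer = malloc(bufsize);
      bitstream* stream = stream_open(buffer, bufsize);
      zfp_stream_set_bit_stream(zfp, stream);
      zfp_stream_rewind(zfp);
      size_t size = zfp_compress(zfp, field);
      zfp_field_free(field);
      zfp_stream_close(zfp);
      stream_close(stream);
      free(promoted);
      *out = buffer;
      return size;
    }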

In some cases, even higher precision is needed, just like IEEE would require a very high or even infinite precision to represent certain reals like pi, e, sqrt(2), etc. That's a direct consequence of any number format representing at most 2^p distinct reals at p bits of precision.

Unlike zfp, the SZ compressor was designed specifically to compress IEEE floating-point values to within a user-specified tolerance. If you set the tolerance to something small enough, SZ will respect it, but it will then often expand (not compress) the data. Similarly, zfp's reversible mode will exactly represent any IEEE data set, but there is no guarantee that it will not expand the data set (due to the pigeonhole principle).

zfp was designed to support very fast conversion between formats and inline rather than offline compression. Hence, zfp's tolerance-based compression mode is an afterthought rather than a design criterion. If representing IEEE single-precision data to within a tolerance smaller than supported by the single-precision format itself is a requirement (which arguably does not make much sense, given that such tolerances were already exceeded when rounding to float in the first place), then I would suggest a lossless floating-point compressor (zfp being one option; other candidates include fpzip, FPC, SPDP, and BLOSC).

Future versions of zfp will accept any representation of the reals as input and respect error tolerances as long as they can be satisfied (in the above sense). In other words, there won't be separate zfp representations of int32, int64, float32, float64, etc.

Finally, zfp does not represent individual reals like ints, floats, and posits do. Rather, zfp represents vectors of 4^d reals, where d is the underlying data dimensionality (1 <= d <= 4). Nevertheless, the arguments above still apply, but with reals replaced by short vectors of reals.


martinradev commented on July 19, 2024

Thank you for your detailed answer, Peter.

So the cause is the conversion to the Q3.60 representation and the block transform in zfp?
The absolute error is still preserved, but in the frequency domain rather than the original domain. With infinite precision, the error after converting back to the original domain would have stayed within the user-specified threshold. However, since representable floating-point values become sparse at large magnitudes, the decompressed value may end up being not the original but the nearest representable float, which can be a great distance away (26.0 in the case above).

I think I can close the issue. Thank you for your time. I learned something!


lindstro commented on July 19, 2024

You're essentially right. Perhaps a more accurate characterization is that the conversion between IEEE and zfp is not one-to-one. Going back to the integer vs. float analogy, multiple floats round to the same nearest integer (hence the float-to-int conversion is not injective). For example, all floats in the interval [1.5, 2.5] round to the integer 2. When the integer is then converted back to float, only one of the original floats is recovered (2.0 in this case). If the error tolerance is set to a value smaller than 0.5, then it will be violated when converting 1.5 -> 2 -> 2.0.

The same happens in the opposite direction, e.g., integers 2^31 + 129 through 2^31 + 383 all map to the same floating-point value 2^31 + 256. If the error tolerance is smaller than 127, then it will be violated for some integers.
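
This is easy to verify (a standalone demo):

    #include <inttypes.h>
    #include <stdio.h>

    int main(void)
    {
      /* 2^31 + 129, 2^31 + 256, and 2^31 + 383 all round to the
       * same float, 2^31 + 256 = 2147483904. */
      int64_t base = (int64_t)1 << 31;
      for (int64_t k = 129; k <= 383; k += 127)
        printf("%" PRId64 " -> %.1f\n", base + k, (double)(float)(base + k));
      return 0;
    }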

Clearly, it is not possible to exactly--or to within an arbitrarily small tolerance--recover all values when the transform is not injective. To make the transform injective, more precision is needed for the range (i.e., the zfp representation) of the transform. Using adversarial choices, one can construct cases where the precision needs to be not only higher than 32 bits (for zfp compression of floats), but even higher than 64 bits, which would be impractical.

zfp was originally designed with the assumption that errors much larger than double-precision epsilon (but far smaller than single-precision epsilon) can be tolerated, i.e., to be more accurate than floats but using less storage. Therefore, there is no support for absolute error tolerances smaller than double-precision epsilon.

As argued previously, the floating-point values are already contaminated with round-off error at least as large as double-precision epsilon, and typically much larger due to other sources of error (truncation error, iteration error, measurement error, model error, ...).

Regarding potential sources of error in the zfp compressor, there are three. The first is the final bit-plane truncation step. When the error tolerance is small enough, zfp will not discard any bits in this step, so it introduces no error.

The transformation to zfp's block-floating-point representation is then the largest source of error. This representation has significantly higher precision than IEEE (30/62 bits of mantissa vs. 24/53 for IEEE) but a lower dynamic range within a block due to the use of a single exponent for the whole block. This initial conversion step is the limiting factor in terms of what absolute error tolerance can be satisfied.
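
A simplified sketch of such a conversion for a 1D block of four values (illustrative only, not zfp's actual code):

    #include <math.h>
    #include <stdint.h>

    /* Convert a block of floats to ~30-bit fixed point relative to the
     * block's largest exponent. Values whose exponents lie far below
     * emax lose their low-order bits in the truncating cast. */
    void block_to_fixed(const float f[4], int32_t i[4], int* emax)
    {
      *emax = -127;
      for (int k = 0; k < 4; k++)
        if (f[k] != 0) {
          int e;
          frexpf(f[k], &e);
          if (e > *emax)
            *emax = e;
        }
      for (int k = 0; k < 4; k++)
        i[k] = (int32_t)ldexpf(f[k], 30 - *emax);
    }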

The final source of error is in the decorrelating block transform, which makes use of integer right shifts by one bit. For blocks with very high dynamic range, it is possible for a least significant one-bit to be shifted out and lost.
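
For a 1D block, the forward transform is a short lifting scheme along these lines (paraphrased from zfp's sources; the `>>= 1` steps are where a low-order one-bit can be dropped):

    #include <stdint.h>

    /* Forward decorrelating transform on a block of four integers. */
    static void fwd_lift(int32_t* p)
    {
      int32_t x = p[0], y = p[1], z = p[2], w = p[3];
      x += w; x >>= 1; w -= x;
      z += y; z >>= 1; y -= z;
      x += z; x >>= 1; z -= x;
      w += y; w >>= 1; y -= w;
      w += y >> 1; y -= w >> 1;
      p[0] = x; p[1] = y; p[2] = z; p[3] = w;
    }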

One may compose the block-floating-point transform with the decorrelating transform and then analyze the error resulting from this composite transform. See our SIAM SISC paper for further details.

