Giter VIP home page Giter VIP logo

dragonbox's Introduction

Dragonbox

This library is a reference implementation of Dragonbox in C++.

Dragonbox is a float-to-string conversion algorithm based on a beautiful algorithm Schubfach, developed by Raffaello Giulietti in 2017-2018. Dragonbox is further inspired by Grisu and Grisu-Exact.

Introduction

Dragonbox generates a pair of integers from a floating-point number: the decimal significand and the decimal exponent of the input floating-point number. These integers can then be used for string generation of decimal representation of the input floating-point number, the procedure commonly called ftoa or dtoa.

The algorithm guarantees three things:

  1. It has the roundtrip guarantee; that is, a correct parser interprets the generated output string as the original input floating-point number. (See here for some explanation on this.)

  2. The output is of the shortest length; that is, no other output strings that are interpreted as the input number can contain less number of significand digits than the output of Dragonbox.

  3. The output is correctly rounded: the number generated by Dragonbox is the closest to the actual value of the input number among possible outputs of minimum number of digits.

About the Name "Dragonbox"

The core idea of Schubfach, which Dragonbox is based on, is a continuous analogue of discrete pigeonhole principle. The name Schubfach is coming from the German name of the pigeonhole principle, Schubfachprinzip, meaning "drawer principle". Since another name of the pigeonhole principle is Dirichlet's box principle, I decided to call my algorithm "Dragonbox" to honor its origins: Schubfach (box) and Grisu (dragon).

How to Use

Although Dragonbox is intended for float-to-string conversion routines, the actual string generation is not officially a part of the algorithm. Dragonbox just outputs two integers (the decimal significand/exponent) that can be consumed by a string generation procedure. The header file include/dragonbox/dragonbox.h includes everything needed for this (it is header-only). Nevertheless, a string generation procedure is included in the library. There are two additional files needed for that: include/dragonbox/dragonbox_to_chars.h and source/dragonbox_to_chars.cpp. Since there are only three files, it should be not difficult to set up this library manually if you want, but you can also use it via CMake as explained below. If you are not familiar with CMake, I recommend you to have a look at this wonderful introduction.

Installing Dragonbox

The following will create platform-specific build files on your directory:

git clone https://github.com/jk-jeon/dragonbox
cd dragonbox
mkdir build
cd build
cmake ..

If you only want dragonbox.h but not dragonbox_to_chars.h/.cpp, you can do the following to install dragonbox.h into your system:

cmake .. -DDRAGONBOX_INSTALL_TO_CHARS=OFF
cmake --install .

If you want the string generation part as well, build the generated files using platform-specific build tools (make or Visual Studio for example) and then perform

cmake --install .

on the build directory.

Including Dragonbox into CMake project

The easiest way to include Dragonbox in a CMake project is to do the following:

include(FetchContent)
FetchContent_Declare(
        dragonbox
        GIT_REPOSITORY https://github.com/jk-jeon/dragonbox
)
FetchContent_MakeAvailable(dragonbox)
target_link_libraries(my_target dragonbox::dragonbox) # or dragonbox::dragonbox_to_chars

Or, if you already have installed Dragonbox in your system, you can include it with:

find_package(dragonbox)
target_link_libraries(my_target dragonbox::dragonbox) # or dragonbox::dragonbox_to_chars

Language Standard

The library is targeting C++17 and actively using its features (e.g., if constexpr).

Usage Examples

(Simple string generation from float/double)

#include "dragonbox/dragonbox_to_chars.h"
constexpr int buffer_length = 1 + // for '\0'
  jkj::dragonbox::max_output_string_length<jkj::dragonbox::ieee754_binary64>;
double x = 1.234;  // Also works for float
char buffer[buffer_length];

// Null-terminate the buffer and return the pointer to the null character
// Hence, the length of the string is (end_ptr - buffer)
// buffer is now { '1', '.', '2', '3', '4', 'E', '0', '\0', (garbages) }
char* end_ptr = jkj::dragonbox::to_chars(x, buffer);

// Does not null-terminate the buffer; returns the next-to-end pointer
// buffer is now { '1', '.', '2', '3', '4', 'E', '0', (garbages) }
// you can wrap the buffer with things like std::string_view
end_ptr = jkj::dragonbox::to_chars_n(x, buffer);

(Direct use of jkj::dragonbox::to_decimal)

#include "dragonbox/dragonbox.h"
double x = 1.234;   // Also works for float

// Here, x should be a nonzero finite number!
// The return value v is a struct with three members:
// significand : decimal significand (1234 in this case);
//               it is of type std::uint64_t for double, std::uint32_t for float
//    exponent : decimal exponent (-3 in this case); it is of type int
// is_negative : as the name suggests; it is of type bool
auto v = jkj::dragonbox::to_decimal(x);

By default, jkj::dragonbox::to_decimal returns a struct with three members (significand, exponent, and is_negative). But the return type and the return value can change if you specify policy parameters. See below.

Important. jkj::dragonbox::to_decimal is designed to work only with finite nonzero inputs. The behavior of it when given with infinities/NaN's/+0/-0 is undefined. jkj::dragonbox::to_chars and jkj::dragonbox::to_chars_n work fine for any inputs.

Policies

Dragonbox provides several policies that the user can select. Most of the time the default policies will be sufficient, but for some situation this customizability might be useful. There are currently five different kinds of policies that you can specify: sign policy, trailing zero policy, decimal-to-binary (parsing) rounding policy, binary-to-decimal (formatting) rounding policy, and cache policy. Those policies live in the namespace jkj::dragonbox::policy. You can provide the policies as additional parameters to jkj::dragonbox::to_decimal or jkj::dragonbox::to_chars or jkj::dragonbox::to_chars_n. Here is an example usage:

#include "dragonbox/dragonbox.h"
auto v = jkj::dragonbox::to_decimal(x,
    jkj::dragonbox::policy::sign::ignore,
    jkj::dragonbox::policy::cache::compact);

In this example, the ignore sign policy and the compact cache policy are specified. The return value will not include the member is_negative, and jkj::dragonbox::to_decimal will internally use the compressed cache for the computation, rather than the full cache. There is no particular order for policy parameters; you can give them in any order. Default policies will be chosen if you do not explicitly specify any. In the above example, for instance, nearest_to_even decimal-to-binary rounding mode policy is chosen, which is the default decimal-to-binary rounding mode policy. If you provide two or more policies of the same kind, or if you provide an invalid policy parameter, then the compliation will fail.

Policy parameters (e.g., jkj::dragonbox::policy::sign::ignore in the above example) are of different types, so different combinations of policies generally result in separate template instantiations, which might cause binary bloat. (However, it is only the combination that matters; giving the same parameter combination in a different order will usually not generate a separate binary.)

Sign policy

Determines whether or not jkj::dragonbox::to_decimal will extract and return the sign of the input parameter.

  • jkj::dragonbox::policy::sign::ignore: There is no is_negative member in the returned struct and the sign of the input is not returned. A string generation routine might anyway need to deal with the sign by itself, so often this member will not be needed. In that case, omitting is_negative member can reduce some overhead. jkj::dragonbox::to_chars and jkj::dragonbox::to_chars_n use this policy internally. In the implementation of jkj::dragonbox::to_decimal, the sign of the input is relevant only for deciding the rounding interval under certain rounding mode policies. Under the default rounding mode policies, the sign is completely ignored.
  • jkj::dragonbox::policy::sign::return_sign: This is the default policy. The sign of the input will be written in the is_negative member of the returned struct.

You cannot specify sign policy to jkj::dragonbox::to_chars/jkj::dragonbox::to_chars_n.

Trailing zero policy

Determines what jkj::dragonbox::to_decimal will do with possible trailing decimal zeros.

  • jkj::dragonbox::policy::trailing_zero::ignore: Do not care about trailing zeros; the output significand may contain trailing zeros. Since trailing zero removal is a relatively heavy operation involving lots of divisions, and a string generation routine will need to perform divisions anyway, it would be possible to get a better overall performance by omitting trailing zero removal from jkj::dragonbox::to_decimal and taking care of that in other places.
  • jkj::dragonbox::policy::trailing_zero::remove: This is the default policy. Remove all trailing zeros in the output. jkj::dragonbox::to_chars and jkj::dragonbox::to_chars_n use this policy internally for IEEE-754 binary32 format (aka float).
  • jkj::dragonbox::policy::trailing_zero::report: The output significand may contain trailing zeros, but such possibility will be reported in the additional member may_have_trailing_zeros of the returned struct. This member will be set to true if there might be trailing zeros, and it will be set to false if there should be no trailing zero. By how the algorithm works, it is guaranteed that whenever there might be trailing zeros, the maximum number of trailing zeros is 7 for binary32 and 15 for binary64.

You cannot specify trailing zero policy to jkj::dragonbox::to_chars/jkj::dragonbox::to_chars_n.

Decimal-to-binary rounding policy

Dragonbox provides a roundtrip guarantee. This means that if we convert the output of Dragonbox back to IEEE-754 binary floating-point format, the result should be equal to the original input to Dragonbox. However, converting the decimal output of Dragonbox back into binary floating-point number requires a rounding, so in order to ensure the roundtrip guarantee, Dragonbox must assume which kind of rounding will be performed for the inverse, decimal-to-binary conversion.

  • jkj::dragonbox::policy::decimal_to_binary_rounding::nearest_to_even: This is the default policy. Use round-to-nearest, tie-to-even rounding mode.

  • jkj::dragonbox::policy::decimal_to_binary_rounding::nearest_to_odd: Use round-to-nearest, tie-to-odd rounding mode.

  • jkj::dragonbox::policy::decimal_to_binary_rounding::nearest_toward_plus_infinity: Use round-to-nearest, tie-toward-plus-infinity rounding mode.

  • jkj::dragonbox::policy::decimal_to_binary_rounding::nearest_toward_minus_infinity: Use round-to-nearest, tie-toward-minus-infinity rounding mode.

  • jkj::dragonbox::policy::decimal_to_binary_rounding::nearest_toward_zero: Use round-to-nearest, tie-toward-zero rounding mode. This will produce the fastest code among all round-to-nearest rounding modes.

  • jkj::dragonbox::policy::decimal_to_binary_rounding::nearest_away_from_zero: Use round-to-nearest, tie-away-from-zero rounding mode.

  • jkj::dragonbox::policy::decimal_to_binary_rounding::nearest_to_even_static_boundary: Use round-to-nearest, tie-to-even rounding mode, but there will be completely independent code paths for even inputs and odd inputs. This will produce a bigger binary, but might run faster than jkj::dragonbox::policy::decimal_to_binary_rounding::nearest_to_even for some situation.

  • jkj::dragonbox::policy::decimal_to_binary_rounding::nearest_to_odd_static_boundary: Use round-to-nearest, tie-to-odd rounding mode, but there will be completely independent code paths for even inputs and odd inputs. This will produce a bigger binary, but might run faster than jkj::dragonbox::policy::decimal_to_binary_rounding::nearest_to_odd for some situation.

  • jkj::dragonbox::policy::decimal_to_binary_rounding::nearest_toward_plus_infinity_static_boundary: Use round-to-nearest, tie-toward-plus-infinity rounding mode, but there will be completely independent code paths for positive inputs and negative inputs. This will produce a bigger binary, but might run faster than jkj::dragonbox::policy::decimal_to_binary_rounding::nearest_toward_plus_infinity for some situation.

  • jkj::dragonbox::policy::decimal_to_binary_rounding::nearest_toward_minus_infinity_static_boundary: Use round-to-nearest, tie-toward-plus-infinity rounding mode, but there will be completely independent code paths for positive inputs and negative inputs. This will produce a bigger binary, but might run faster than jkj::dragonbox::policy::decimal_to_binary_rounding::nearest_toward_minus_infinity for some situation.

  • jkj::dragonbox::policy::decimal_to_binary_rounding::toward_plus_infinity: Use round-toward-plus-infinity rounding mode.

  • jkj::dragonbox::policy::decimal_to_binary_rounding::toward_minus_infinity: Use round-toward-minus-infinity rounding mode.

  • jkj::dragonbox::policy::decimal_to_binary_rounding::toward_zero: Use round-toward-zero rounding mode.

  • jkj::dragonbox::policy::decimal_to_binary_rounding::away_from_zero: Use away-from-zero rounding mode.

All of these policies can be specified also to jkj::dragonbox::to_chars/jkj::dragonbox::to_chars_n.

Binary-to-decimal rounding policy

Determines what jkj::dragonbox::to_decimal will do when rounding tie occurs while obtaining the decimal significand. This policy will be completely ignored if the specified binary-to-decimal rounding policy is not one of the round-to-nearest policies (because for other policies rounding tie simply doesn't exist).

  • jkj::dragonbox::policy::binary_to_decimal_rounding::do_not_care: Do not care about correct rounding at all and just find any shortest output with the correct roundtrip. It will produce a faster code, but the performance difference will not be big.
  • jkj::dragonbox::policy::binary_to_decimal_rounding::to_even: This is the default policy. Choose the even number when rounding tie occurs.
  • jkj::dragonbox::policy::binary_to_decimal_rounding::to_odd: Choose the odd number when rounding tie occurs.
  • jkj::dragonbox::policy::binary_to_decimal_rounding::away_from_zero: Choose the number with the bigger absolute value when rounding tie occurs.
  • jkj::dragonbox::policy::binary_to_decimal_rounding::toward_zero: Choose the number with the smaller absolute value when rounding tie occurs.

All of these policies can be specified also to jkj::dragonbox::to_chars/jkj::dragonbox::to_chars_n.

Cache policy

Choose between the full cache table and the compressed one. Using the compressed cache will result in about 20% slower code, but it can significantly reduce the amount of required static data. It currently has no effect for binary32 (float) inputs. For binary64 (double) inputs, jkj::dragonbox::cache_policy::full will cause jkj::dragonbox::to_decimal to use 619*16 = 9904 bytes of static data table, while the corresponding amount for jkj::dragonbox::cache_policy::compact is 23*16 + 27*8 = 584 bytes.

  • jkj::dragonbox::policy::cache::full: This is the default policy. Use the full table.
  • jkj::dragonbox::policy::cache::compact: Use the compressed table.

All of these policies can be specified also to jkj::dragonbox::to_chars/jkj::dragonbox::to_chars_n.

Performance

In my machine (Intel Core i7-7700HQ 2.80GHz, Windows 10), it defeats or is on par with other contemporary algorithms including Grisu-Exact, Ryu, and Schubfach.

The following benchmark result (performed on 03/30/2024) is obtained using Milo's dtoa benchmark framework (https://github.com/miloyip/dtoa-benchmark). The complete source code for the benchmark below is available here.

corei7_7700hq@2.80_win64_vc2019_randomdigit_time corei7_7700hq@2.80_win64_vc2019_randomdigit_time

Note 1: dragonbox is the performance of Dragonbox with the full cache table, and dragonbox_comp is the performance of Dragonbox with the compact cache table.

Note 2: fmt internally uses Dragonbox with an implementation almost identical to that in this repository.

There is also a benchmark done by myself (also performed on 03/30/2024):

(top: benchmark for float data, bottom: benchmark for double data; solid lines are the averages, dashed lines are the medians, and the shaded regions show 30%, 50%, and 70% percentiles):

(Clang) digits_benchmark_binary32 digits_benchmark_binary64

(MSVC) digits_benchmark_binary32 digits_benchmark_binary64

Here is another performance plot with uniformly randomly generated float(top) or double(bottom) data:

(Clang) uniform_benchmark_binary32 uniform_benchmark_binary64

(MSVC) uniform_benchmark_binary32 uniform_benchmark_binary64

(Note: the comparison with Schubfach is not completely fair, since the implementation I benchmarked against uses a digit generation procedure with a different set of constraints. More fair comparison is available in this repository.)

Comprehensive Explanation of the Algorithm

Please see this paper.

How to Run Tests, Benchmark, and Others

There are four subprojects contained in this repository:

  1. common: The subproject that other subprojects depend on.
  2. benchmark: Runs benchmark.
  3. test: Runs tests.
  4. meta: Generates static data that the main library uses.

Build each subproject independently

All subprojects including tests and benchmark are standalone, which means that you can build and run each of them independently. For example, you can do the following to run tests:

git clone https://github.com/jk-jeon/dragonbox
cd dragonbox
mkdir -p build/subproject/test
cd build/subproject/test
cmake ../../../subproject/test
cmake --build .
ctest .

(You might need to pass the configuration option to cmake and ctest if you use multi-configuration generators like Visual Studio.)

Build all subprojects from the root directory

It is also possible to build all subprojects from the root directory by passing the option -DDRAGONBOX_ENABLE_SUBPROJECT=On to cmake:

git clone https://github.com/jk-jeon/dragonbox
cd dragonbox
mkdir build
cd build
cmake .. -DDRAGONBOX_ENABLE_SUBPROJECT=On
cmake --build .

Notes on working directory

Some executable files require the correct working directory to be set. For example, the executable for benchmark runs some MATLAB scripts provided in subproject/benchmark/matlab directory, which will fail to execute if the working directory is not set to subproject/benchmark. If you use the provided CMakeLists.txt files to generate a Visual Studio solution, the debugger's working directory is automatically set to the corresponding source directory. For example, the working directory is set to subproject/benchmark for the benchmark subproject. However, other generators of cmake are not able to set the debugger's working directory, so in that case you need to manually set the correct working directory when running the executables in order to make them work correctly.

Notes

Correctness of the algorithm

The paper provides a mathematical proof of the correctness of the algorithm, with the aid of verification programs in test and meta directories. In addition to that, I did a fair amount of uniformly random tests against Ryu (which is extremely heavily tested in its own), and I also ran a joint test of Dragonbox with a binary-to-decimal floating-point conversion routine I developed, and confirmed correct roundtrip for all possible IEEE-754 binary32-encoded floating-point numbers (aka float) with the round-to-nearest, tie-to-even rounding mode. Therefore, I am pretty confident about the correctness of both of the algorithms.

Precise meaning of roundtrip guarantee

The precise meaning of roundtrip guarantee might be tricky, as it depends on the notion of "correct parsers". For example, given that significand and exponent are the outputs of Dragonbox with respect to an input floating-point number x of, say, type double, then things like x == significand * pow(10.0, exponent) might or might not be the case, because each of the floating-point operations in the expression significand * pow(10.0, exponent) can introduce rounding errors that can accumulate to a bigger error. What a correct parser should do is to precisely compute the floating-point number from the given expression according to the assumed rounding rule, and the result must be "correctly rounded" in the sense that only the minimum possible rounding error is allowed. Implementing a correct parser is indeed a very nontrivial job, so you may need additional libraries (like Ryu or double-conversion) if you want to check this roundtrip guarantee by yourself.

License

All code, except for those belong to third-party libraries (code in subproject/3rdparty), is licensed under either of

dragonbox's People

Contributors

alexhuszagh avatar friendlyanon avatar jk-jeon avatar johnmcfarlane avatar jwakely avatar krnowak avatar leni536 avatar rodericktaylor avatar seanmiddleditch avatar striezel avatar tobybell avatar yotann avatar yumeyao avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

dragonbox's Issues

Uninitialized result structure can return incorrect value in may_have_trailing_zeros

It seems that the result structure is never initialized, which means that the may_have_trailing_zeros can return whatever was on the stack if none of on_trailing_zeros or no_trailing_zeros functions are called.

I have this issue in a test program I made to compare my C port of the reference code. It happens with the following values, which returns true (in fact it's not 'true' but '5') for may_have_trailing_zeros in my test code (which means that you might not get the same result with a simple call with those values). That value comes from the compute_nearest_normal function on line 1799 where ret_value is declared but not initialized. The field is never written to after that.

using namespace jkj::dragonbox;

uint32_t i = 1283457033;
float f = *( float* )( &i );
    
auto v = to_decimal( f, policy::decimal_to_binary_rounding::nearest_to_even, policy::binary_to_decimal_rounding::do_not_care, policy::trailing_zero::report, policy::sign::return_sign, policy::cache::full );

Looking for advice - how to convert float to double, keeping to_string(double) same as to_string(float)

Hi,
Can you give some advice, how to convert float (4 bytes) to double (8 bytes).

I am looking to convert float -> double in a way that to_string(double) to produce same string as original to_string(float).

Any advice would be really appreciated.

My current idea is to you dragonbox to convert float to chars, and then use some fast chars->double converter with round-trip guarantee.

Would be very nice if you will tell me that my idea is stupid, and describe a way that is much better.

Thanks in advance!

`to_decimal` regressed since last November

double x = 1.234;
auto v = jkj::dragonbox::to_decimal(x);
std::cout << v.significand << std::endl;

outputs 1234000000000000 instead of 1234. I am not sure if this is intended or not, but it is an example from readme, so either readme should be fixed or something is wrong with the code.

I noticed this when tried to update Dragonbox from 33a9e02 to the current master in the stress of SerenityOS's Ryu implementation.

pkg-config file

Hello,

Would it be possible to offer a .pc (pkg-config) file that'd be installed with the build system, for use with non-CMake projects?

Thank you!

Add Workflows to Ensure Portability?

Currently, there seems to be no cross-compilation to ensure support for ARM, as well as OSes like Windows and macOS. It may be wise to add tests for ARM 32-bit and 64-bit, as well as a big-endian target (PowerPC64 could be a good example). Windows and macOS support are native to workflows, so it's a simple matter of adding configurations. There's a few available options to ensure quick cross-compilation for different:

  • Dockcross Supports many different ARM targets, but lacks support for PowerPC64 and requires Docker.
  • zig cc A drop-in replacement for GCC/Clang, but supports extensive cross-compilation, including all our desired targets.

Use Dockcross would be a drop-in replacement, while zig cc would only require a toolchain file for the configurations.

These would make sure the intrinsics, such as those in #11, are supported properly. If desired, I'd gladly submit a PR with the additional workflows. Would any other targets, such as MIPS64, or RISC-V be desired?

MSVC: warning C4702: unreachable code

The following error arises when compiling with MSVC (17.5.0 Preview 2.0) with /W4:

...\dragonbox-src\include\dragonbox\dragonbox.h(662): warning C4702: unreachable code

The line in for the warning is:

constexpr auto divisor = compute_power<N>(UInt(10));

However, that isn't the real problem. The warning is actually triggered by the whole second if constexpr block at:

if constexpr (std::is_same_v<UInt, std::uint64_t> && N == 3 &&

Changing that second if constexpr to else if constexpr resolves the warning.

x86(Win32) on MSVC 2019 reporting roundtrip errors

I used https://github.com/jk-jeon/dtoa-benchmark to build a win32 exe straight forward from the vs 2019 sln file, and find out dragonbox is corrupted on win32 at the moment. I also tried overwriting dragonbox.h in benchmark project with the latest file from this repository, still no help.

To reproduce:

  • clone dtoa-benchmark, and open it in vs 2019
  • fix win32 include path (a pull request is there in the benchmark repo)
  • comment out milo_only_grisu2.cpp REGISTER_TEST(...) line (it's very slow or maybe dead-loop?)
  • build & run

Improve paper

I recently implemented the main algorithm in Factor with the paper as my main source.

My two main hurdles were these:

  • There is a typo Algorithm 5.6, steps 6โ€“8: where there is 10^(k_0), it should be 10^(-k_0). I could only find this by looking at the reference implementation.
  • In Algorithm 5.4, step 5, or in the text, very little is said about how to break the tie. The code is a little obfuscated, the significand is 10sฬƒ + t, not f_c, as I first thought. Also helpful would be the information that you choose y^(rd) if the significand is odd, for the nearest-to-even case.

Another, smaller grievance is in section 5.2.3:

  • The information that d_1, d_2 are always zero is hidden away very well.
  • The upper bound for e in the integer check of x^(i) and z^(i) is a constant. It evaluates to 3 in the case of binary32 and binary64.

P.S.: In section 5.1.7 for completeness, you could mention that you compute r as r = z^(i) - 10^(ฮบ+1)*s. This might seem like baby steps, but I had to extract the obvious from another implementation.

Do you have an algorithm to deal with ratio and percentage for integers?

I find situations like percentages very common, but using floating pointers to print them out would be very costly.

Do you have an algorithm to deal with that?

One without precision:

For example, giving input (3,10)
Output:
30%
or
0.3
For example, giving input (3,9)
Output:
33.(3)%
or
0.(3)
For example, giving input (2,14)
Output:
14.(285714)%
or
0.(142857)

Also, with precision version (which is very common in statistics, like keeping two digits decimals):

For example, giving input (2,14), precision 2
Output:
14.26%
or
0.14

Change the way policy parameters are categorized

Currently a policy parameter belongs to a policy kind if its type is deriving from the specified base type.
At this point it seems checking for the existence of a specific member type sounds like a better idea in general.

How to get a fixed string value format

For example,

constexpr int buffer_length = 1 + // for '\0'
                  jkj::dragonbox::max_output_string_length<jkj::dragonbox::ieee754_binary64>;
char buffer[buffer_length];
double d{980528.1478392933};
char* end_ptr = jkj::dragonbox::to_chars(price, buffer);
std::cout << buffer << std::endl;

I got "9.805281478392933E5", but I want "980528.1478392933";

When the mantissa of double is 16 digits, seems to_decimal() not 100% correct

__int64_t intpart = getInteger();
__int64_t fractpart = getInteger();
double dbl = fractpart + 0.000000000000000000000005;
dbl =  dbl / std::pow(10, countDigit(fractpart));  // countDigit return 5 when fractpart is 12345
dbl += intpart;

I want to make sure my double binary representation like 123.12345000000000000000000000005. So when I run jkj::dragonbox::to_decimal(dbl), I want to make sure I get 123.12345 (of course by (significand, exponent, and is_negative)), so I can verify. I used above code to generate 10000 values. However when significand/mantissa is 16 digits, quite a lot of the decimal are not accurate although I tried most policies.

if significand/mantissa < 16, so far my test looks ok.

Paper/implementation mismatch?

In practice, computing the parity of x(i) is still a heavy
operation compared to others. Hence, it can be beneficial to
delay it until it is absolutely necessary. In Algorithm 5.2, we
need to compute the parity of x(i) when r and ฮด(i) turn out
to be the same. However, in fact if either the interval I does
not include the left endpoint or x is not an integer, we always
conclude I โˆฉ 10^(โˆ’k0+1)Z is empty regardless of the parity of
x(i). Therefore, if the given rounding mode indicates that the
left endpoint is not in I, we can skip computing the parity of
x(i).

(section 5.1.6, page 21)

If I understand this correctly, the condition of the left endpoint being excluded or x not being an integer is enough to say that the intersection in question is empty, and the parity doesn't need to be checked.

However, the algorithm implementation seems to disagree:

                    if (!interval_type.include_left_endpoint() ||
                        exponent < case_fc_pm_half_lower_threshold ||
                        exponent > divisibility_check_by_5_threshold) {
                        // If the left endpoint is not included, the condition for
                        // success is z^(f) < delta^(f) (odd parity).
                        // Otherwise, the inequalities on exponent ensure that
                        // x is not an integer, so if z^(f) >= delta^(f) (even parity), we in fact
                        // have strict inequality.
                        if (!compute_mul_parity(two_fl, cache, beta).parity) {
                            goto small_divisor_case_label;
                        }
                    }
                    else {
                        auto const [xi_parity, x_is_integer] =
                            compute_mul_parity(two_fl, cache, beta);
                        if (!xi_parity && !x_is_integer) {
                            goto small_divisor_case_label;
                        }
                    }

In fact, the comments explicitly state that checking parity is required. From quick testing it seems that the code is right here.

Fastest way of trimming trailing zeros

Currently we use the method proposed in the paper by Granlund-Montgomery: using modular inverse and std::rotr.

However, there are at least two competitors in this league. Assuming $q$-bit integers, and let $d$ be the divisor, then:

  • Lemire's method: multiply $\left\lceil\frac{2^{q}}{d}\right\rceil$, and compare the lower $q$-bits with $\left\lceil\frac{2^{q}}{d}\right\rceil$. Regardless of the comparison result, the upper $q$-bits is the quotient of the division. Thus it needs $2q$-bits widening multiplication, upper $q$-bits for the quotient and lower $q$-bits for the divisibility test, but there is no shift, and only one constant needs to be loaded into the register.
  • A different generalization of Granlund-Montgomery for non-odd divisors: Applying Theorem 4.2 from our paper, it is possible to generalize Granlund-Montgomery style modular inverse method into non-odd divisors. In this new method, bit-rotation is replaced by regular bit-shift, and the shift needs to be done only when the comparison succeeds so the input is determined to be a multiple of $d$, while Granlund-Montgomery unconditionally does the bit-rotation. One should note that this method does not work for all $q$-bit unsigned integers, but the range of valid inputs is enough for our application.

For future reference for myself, I write the code for the last method here:

int remove_trailing_zeros(std::uint32_t& n) noexcept {
    int s = 0;
    while (true) {
        auto const nm_mod = std::uint32_t(n * UINT32_C(42949673));
        if (nm_mod < UINT32_C(42949673)) {
            s += 2;
            n = std::uint32_t(nm_mod >> 2);
        }
        else {
            break;
        }
    }
    auto const nm_mod = std::uint32_t(n * UINT32_C(1288490189));
    if (nm_mod < UINT32_C(429496731)) {
        s += 1;
        n = std::uint32_t(nm_mod >> 1);
    }
    return s;
}

int remove_trailing_zeros(std::uint64_t& n) noexcept {
    int s = 0;
    while (true) {
        auto const nm_mod = std::uint64_t(n * UINT64_C(14941862699704736809));
        if (nm_mod < UINT64_C(184467440737095517)) {
            s += 2;
            n = std::uint64_t(nm_mod >> 2);
        }
        else {
            break;
        }
    }
    auto const nm_mod = std::uint64_t(n * UINT64_C(5534023222112865485));
    if (nm_mod < UINT64_C(1844674407370955163)) {
        s += 1;
        n = std::uint64_t(nm_mod >> 1);
    }
    return s;
}

According to a computation, the $32$-bit version is valid up to n = 1'073'741'899, and the $64$-bit version is valid up to n = 4'611'686'018'427'387'999. In Dragonbox, we need these routines for n's up to 16'777'215 for the $32$-bit version, and n's up to 9'007'199'254'740'991 for the $64$-bit version, so the input range is more than enough.

Invalid template parameters when using toward_minus_infinity and toward_plus_infinity

Trying to compile the following code, I get an error on line 2058, where the second parameter passed to divide_by_pow10 is significand_bits + kappa + 2 instead of carrier_uint or something similar. I also assume the 3rd parameter isn't the correct value.

#include "dragonbox.h"

int main( int argc, char** argv ) {
    
    float x = 1.234f;   
    auto v = jkj::dragonbox::to_decimal( x, jkj::dragonbox::policy::decimal_to_binary_rounding::toward_plus_infinity );
    return 0;
}

Compiling with MSVC: cl -nologo main.cpp -Fe:dragonbox.exe -Zi -Od -std:c++17

Better support for ARM and other architectures

Thanks to Tim Kientzle (@tbkka), it turned out that the current code is not working correctly with ARM processors. I believe this is due to errors in fallback implementations of intrinsics. Also the current #define switches are not able to correctly detect availabilities of intrinsics for ARM processors, resulting in suboptimal code generation.

See: swiftlang/swift#35299.

Wrong constant?

In line 832 of dragonbox.h there is a right shift of 7 bits, whereas in the paper it is a right shift of 71 bits (section 5.1.7). It seems that 71 is indeed correct.

constexpr

I recently contributed to https://github.com/fastfloat/fast_float to make the from_chars implementation there constexpr from C++20.

I think making to_chars constexpr is equally possible. Would efforts to do this be welcome here?

I see that to_chars is currently defined in a .cpp file, which could make this challenging without making the library header-only.

enhancement proposal: pimp up the readme, a little, for dummies ...

[ edit ] solved, the below discussion - besides lengthy - helped me
to get the door open, thus if you don't find better solution take
the time and read. [ /edit ]

I followed the readme, downloading and building had meaningful
responses, but I'm stuck to get the door open.
Would like to compile and run the test programs provided in the
readme, but don't know where to put and how to compile.
I'm not an experienced C++ coder, neither can tell the difference
between Cmake and make.

TIA for any help :-)

nvcc (Version 201402L) throws "error: namespace "std" has no member "is_same_v"

Hi,
I am trying to test dragonbox with the nvcc compiler (Version 201402L).
But it throws;

  1. dragonbox/dragonbox_to_chars.h(98): error: namespace "std" has no member "is_same_v" std::is_same_v<FloatFormat, ieee754_binary32>
  2. dragonbox/dragonbox_to_chars.h(98): error: type name is not allowed std::is_same_v<FloatFormat, ieee754_binary32>
  3. dragonbox/dragonbox_to_chars.h(98): error: expected a ";" std::is_same_v<FloatFormat, ieee754_binary32>

I only include dragonbox_to_chars.h in my test code (I have to write millions of float numbers to a csv file with high performance. std::to_string and fprintf are far to slow.).

In my understanding and looking into the dragonbox code it should work with this nvcc version.

Where is the knot in the head?

Thank you!

Simpler "reference implementation"

The current implementation has grown too much to a point where it is nearly impossible to understand for people with not so much exposure to template black magics and other C++-specific nonsenses. The complication of the implementation can act as a significant barrier for people wanting to port the algorithm into languages other than C++, or even into C++, even though the algorithm itself is not that overcomplicated.

Those templates and other nonsenses are done for reasons, to make this library general and fast enough, but at the same time since this is the reference implementation, it should serve the role as such, i.e., to provide guidance to other people wanting to write their own implementations. Hence, I think it makes sense to provide the core algorithm written in simpler, non-esoteric C++ (or maybe even C99?) that everybody can understand.

To be clear, I do not intend to replace the current implementation with a simpler one, which I consider impossible without losing many utilities and performance. As pointed out earlier, those template hackeries are done for reasons. Instead, what I want to have is a standalone implementation of the algorithm that only serves the expository role, which does not need to be optimized-to-hell, rather just enough to demonstrate the algorithm's potential. There already is an implementation which can serve such a role, done by Alexander Bolz, but (1) there is no implementation for IEEE-754 binary32, (2) his implementation is quite outdated at this moment as the algorithm has been improved quite a lot since then, and (3) I think it's nicer to have one in this official repository for the reference implementation, kept in sync with whatever further improvements we bring in the future.

I may work on this someday if no one helps me, but PR is very welcome if anyone is interested!

The output is of the shortest length is a question mark.

DragonDecimal.zip

I generated decimal normal number by concatenate to integers together to this effect:

dbl = operand_fractpart / std::pow(10, n); // operand_fractpart is an integer
dbl += operand_intpart; // operand_intpart is also an integer

Then passes a double to DragonDecimal.cpp as attached. I passed 10000 doubles and toStringRaw() output is also attached.
fp_by_dragonbox_decimal_raw8.txt

My code makes sure operand_fractpart is only 8 digits. However we can see -7.018380128700001E2. By right it should be -7.0183801287E2?

However if I tried to verify whether -701.83801287 is a normal number by:

double d = -701.83801287;
memset(buffer, 0x0, sizeof(buffer));
jkj::dragonbox::to_chars(d, buffer);
std::cout << buffer << std::endl;

Then it's ok. I feel a literal double may be different from a dynamic memory double?

Every my 10000 double, there are always a couple of non-shortest values.

C++11?

I need a port of this to C++11

Finish moving the project to CMake

Related: #3 (comment)

  • Describe the purpose of benchmark
  • Make test .cpp sources independent to register each for its own test
  • Remove VS specific files (VS supports CMake)
  • Group various non-code assets in one folder and/or remove from version control (would be nice, root is messy)

The multipllier $10^{\kappa}$ does not need to be a power of 10

The only place where it is a power of 10 is used is that powers of 10 are stored in the precomputed cache. By allowing more general integers other than $10^{\kappa}$, we may further reduce the chance of having to investigate the fractional part, thus boosting performance. It may also opens up a possibility of eliminating the needs for treating the shorter interval case specially.

Is there a decimal_fp<Float> -> Float conversion?

Unless I am missing something there appears to be no reverse conversion in this library.

I am trying to convert Ryu based code that operates on moral equivalent of decimal_fp (doing more than just printing) to Dragonbox. Lack of reverse conversion is a big stumbling block because I would need to essentially carry around big chunks of Ryu data and code to do it.

As an aside it also looks like to_chars only does exponential notation. Lack of fixed format would be another stumbling block.

Would be great if decimal_fp were exposed as a first class thing with conversion and formatting functions. ๐Ÿ˜ƒ

Optimize decimal_length_minus_1

As discussed in dtolnay/dragonbox#1, the algorithm for calculating the decimal length is rather inefficient, since the number of digits can be computed with minimal branching using a shr, add, and a table lookup instruction as described here.

The current implementation is:

JKJ_FORCEINLINE static constexpr std::uint32_t decimal_length_minus_1(std::uint32_t const v) {
    assert(v < 1000000000);
    if (v >= 100000000) { return 8; }
    if (v >= 10000000) { return 7; }
    if (v >= 1000000) { return 6; }
    if (v >= 100000) { return 5; }
    if (v >= 10000) { return 4; }
    if (v >= 1000) { return 3; }
    if (v >= 100) { return 2; }
    if (v >= 10) { return 1; }
    return 0;
}

Which optimizes quite poorly (for example, on x86):

decimal_length_minus_1(unsigned int):
        mov     eax, 8
        cmp     edi, 99999999
        ja      .L1
        mov     eax, 7
        cmp     edi, 9999999
        ja      .L1
        mov     eax, 6
        cmp     edi, 999999
        ja      .L1
        mov     eax, 5
        cmp     edi, 99999
        ja      .L1
        mov     eax, 4
        cmp     edi, 9999
        ja      .L1
        mov     eax, 3
        cmp     edi, 999
        ja      .L1
        mov     eax, 2
        cmp     edi, 99
        ja      .L1
        xor     eax, eax
        cmp     edi, 9
        seta    al
.L1:
        ret

Using a built-in ctlz function, we can make this much more efficient:

#ifdef _MSC_VER
#include <intrin.h>
#endif

JKJ_FORCEINLINE static std::uint32_t ctlz(std::uint32_t v) {
    // Optimize for GCC/Clang when possible.
#ifdef __has_builtin
  #if __has_builtin(__builtin_clz)
    return __builtin_clz(v);
  #endif
#endif

    // Optimize for MSVC
#ifdef _MSC_VER
  #if defined(_M_X64) || defined(_M_ARM64)
    unsigned long leading_zero = 0;
    _BitScanReverse(&leading_zero, v);
    return static_cast<std::uint32_t>(31 - leading_zero);
  #endif
#endif

    // Fallback algorithm
    int count = 0;
    if (v & std::uint32_t(0xffff0000)) {
        v >>= 16;
        count += 16;
    }
    if (v & std::uint32_t(0xff00)) {
        v >>= 8;
        count += 8;
    }
    if (v & std::uint32_t(0xf0)) {
        v >>= 4;
        count += 4;
    }
    if (v & std::uint32_t(0xc)) {
        v >>= 2;
        count += 2;
    }
    if (v & std::uint32_t(0x2)) {
        v >>= 1;
        count += 1;
    }
    return 31 - count;
}

JKJ_FORCEINLINE static std::uint32_t fast_log2(std::uint32_t const v) {
    return 31 - ctlz(v | 1);
}

JKJ_FORCEINLINE static std::uint32_t decimal_length_minus_1(std::uint32_t const v) {
    static constexpr std::uint32_t table[] = {9, 99, 999, 9999, 99999, 999999, 9999999, 99999999, 999999999};
    std::uint32_t y = (9 * fast_log2(v)) >> 5;
    y += v > table[y];
    return y;
}

Most of the bloat is just to ensure we have a portable way to count leading zeros. And this optimizes quite nicely with the built-in optimizations:

decimal_length_minus_1(unsigned int):
        mov     eax, edi
        mov     edx, 31
        or      eax, 1
        bsr     eax, eax
        xor     eax, 31
        sub     edx, eax
        lea     eax, [rdx+rdx*8]
        shr     eax, 5
        mov     edx, eax
        cmp     DWORD PTR decimal_length_minus_1(unsigned int)::table[0+rdx*4], edi
        adc     eax, 0
        ret

For MSVC on x86_64, we have:

unsigned int decimal_length_minus_1(unsigned int) PROC                    ; decimal_length_minus_1, COMDAT
        xor     eax, eax
        mov     edx, ecx
        or      edx, 1
        bsr     r8d, edx
        lea     r9d, DWORD PTR [r8+r8*8]
        shr     r9d, 5
        lea     r8, OFFSET FLAT:unsigned int const * const `unsigned int decimal_length_minus_1(unsigned int)'::`2'::table
        cmp     ecx, DWORD PTR [r8+r9*4]
        seta    al
        add     eax, r9d
        ret     0
unsigned int decimal_length_minus_1(unsigned int) ENDP  

And for MSVC on ARM64, we have:

|unsigned int decimal_length_minus_1(unsigned int)| PROC              ; decimal_length_minus_1
        orr         w8,w0,#1
        clz         w9,w8
        mov         w10,#0x1F
        sub         w10,w10,w9
        add         w8,w10,w10,lsl #3
        adrp        x9,|unsigned int const * const `unsigned int decimal_length_minus_1(unsigned int)'::`2'::table|
        add         x9,x9,|unsigned int const * const `unsigned int decimal_length_minus_1(unsigned int)'::`2'::table|
        lsr         w11,w8,#5
        ldr         w8,[x9,w11 uxtw #2]
        cmp         w0,w8
        csethi      w10
        add         w0,w10,w11
        ret

        ENDP  ; |unsigned int decimal_length_minus_1(unsigned int)|, decimal_length_minus_1

Since the fallback ctlz algorithm isn't very good, we could do a simple optimization for MSVC and GCC/Clang, and for another other compiler keep the old algorithm.

Typos in the paper

It seems that there should be no "= \varnothing" at the beginning of the second column on page 10.
typo1
And no "-s" in "ingredients", likewise no "-o" in "computationo".
typo2

As we are already talking about the article, how is this fact at the very end of the first column on page 9 being used later in the proof?
image

Consider adding a section on README elaborating on what roundtrip precisely means

Hi @jk-jeon, and thank you for your amazing work on this!
This isn't exactly an issue, but rather a question or a request for clarification of the documentation.
The documentation says the algorithm is round-trip.
To check this my understand of this, I have written a simple loop whose index "i" iterates all uint64_t values and:

  • obtains a double "orig" by bit_cast-ing the index
  • creates decimal "dec" from "orig" via dragonbox;
  • obtains another double called "back" via decimal.significand * pow(10.0, decimal.exponent), and then inverting it if negative;
  • compares the two doubles, and prints an error message if they're different;
  • the error message also includes a bit_cast-ed uint64_t from the "back" double, called "back_i" in the output below.

Here's a sample of its output; formatting is done with an old version of fmtlib (5.3.0):

i=4716101740380815687 orig=1.98801e+07 dec=(1988012400000122, -8, +) back=1.98801e+07 back_i=4716101740380815688
i=4716101740380815688 orig=1.98801e+07 dec=(19880124000001222, -9, +) back=1.98801e+07 back_i=4716101740380815689
i=4716101740380815699 orig=1.98801e+07 dec=(19880124000001263, -9, +) back=1.98801e+07 back_i=4716101740380815700
i=4716101740380815700 orig=1.98801e+07 dec=(19880124000001267, -9, +) back=1.98801e+07 back_i=4716101740380815701
i=4716101740380815703 orig=1.98801e+07 dec=(19880124000001278, -9, +) back=1.98801e+07 back_i=4716101740380815704
i=4716101740380815713 orig=1.98801e+07 dec=(19880124000001315, -9, +) back=1.98801e+07 back_i=4716101740380815714

I haven't verified this programatically but I notice that the back integer representation is one off the original integer from which the double was created. Note that the exact code computing "back" from "dec" is:

double back = static_cast<double>(dec.significand) * pow(10.0, static_cast<double>(decimal.exponent));

My questions are:

  • what am I doing wrong, if anything, in converting from the decimal to a double? I have a feeling that the answer is "everything"...
  • would the rounding policy have anything to do with off-by-one observation above?

Thank you kindly in advance for your time.

JSON compliant to_chars?

I'd like to use dragonbox for serializing floats in Glaze. What I'm particularly looking for is a JSON compliant to_chars implementation that has the shortest string result. It seems like dragonbox's to_chars will print out ...E0 when it is not necessary.

The other requirement for JSON compliance is that infinity and nan should be printed as "null".

Could options to support these features be added to dragonbox or should I fork the code and edit it myself?

I'd also like the to_chars to be header only, but I don't mind doing this conversion myself and keeping it up to date with dragonbox.

Thanks for all the hard work you've put into this great library!

More elegant removal of trailing zeros

When spitting out decimal digits via jeaiii-style integer formatting, it is probably possible to do integer checks to figure out that all the remaining digits are zero. Then maybe we can spare yet more multiplications therefore get a better performance.

dragonbox_to_chars.h should have a `static const int` with minimum buffer size guaranteed to be enough

essentially, the same as DtoaMinBufferLength from https://github.com/abolz/Drachennest/blob/master/src/dragonbox.h (although the value given there seems a bit large but i haven't looked into it)

this would be much more reliable and usable than relying on what the README says:

char buffer[31];   // Should be long enough

that way, users can write:

char buffer[dragonboxBufferLen];

and it'll guarantee no buffer overflow, etc

Update the paper reflecting the recent change in integer checks

Perhaps it would be possible to apply the rounding trick that Schubfach uses to eliminate some of those ugly integer checks, which will greatly improve the overall performance.

However, it requires a major overhaul of the entire algorithm so it will take time.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.