Giter VIP home page Giter VIP logo

Comments (7)

petrmanek avatar petrmanek commented on May 18, 2024

Ehm, anyone here?

from scnlib.

amyspark avatar amyspark commented on May 18, 2024

This is indeed the case. Due to the usage of strtoX, scn implicitly assumes the locale of the program is set to C.
A minimum working example follows for scan_localized, which is supposed to depend only on the passed locale.

#include <locale>
#include <iostream>
#include <scn/scn.h>

int main(int argc, char **argv)
{
    int a, b;
    float c, d;
    auto result = scn::scan("test 1 2 3.4 4,5", "test {} {} {} {}", a, b, c, d);
    std::cout << (result.error().code()) << " " << a << " " << b << " " << c << " " << d << std::endl;

    auto loc = std::locale("es_ES.UTF-8");
    float aa, bb;
    result = scn::scan_localized(loc, scn::string_view("1.1 2,2"), "{} {l}", aa, bb);
    std::cout << (result.error().code()) << std::use_facet<std::numpunct<char>>(loc).decimal_point() << " " << aa << " " << bb;

    exit(0);
}

The output is:

❯ env -i LANG="es_ES.UTF-8" LC_ALL="es_ES.UTF-8" ./build/test
0 1 2 3.4 4
2, 1.1 0  

It should be:

0 1 2 3 4.5
2, 1 0

As an aside remark, the initialization of scan_localized is broken.

{std::addressof(loc)});

The initialization needs to pass the pointer to locale_ref first, locale_ref(std::addressof(loc)).

from scnlib.

petrmanek avatar petrmanek commented on May 18, 2024

Agreed.

Clearly the current implementation would benefit from an extension. One possible way to go may be to make use of an existing parser that is both well-tested and benchmarked, e.g. fast_double_parser or similar. However, afaik this particular project is not locale-agnostic as it focuses only on IEEE 754. Would you have any suggestions/alternatives?

Also the scan_localized initialization bug would merit a separate issue.

from scnlib.

petrmanek avatar petrmanek commented on May 18, 2024

Pinging this issue after a couple of months. Any word from the maintainers? For instance, @eliaskosunen @xvitaly

from scnlib.

eliaskosunen avatar eliaskosunen commented on May 18, 2024

Somewhat fixed in b6db6ba by working around the issue by setting the global C locale to C before strtoX, and after the call resetting it back to what it was before.

Sorry for my inattentiveness to this library and issue as of late.

Clearly the current implementation would benefit from an extension. One possible way to go may be to make use of an existing parser that is both well-tested and benchmarked, e.g. fast_double_parser or similar. However, afaik this particular project is not locale-agnostic as it focuses only on IEEE 754. Would you have any suggestions/alternatives?

Parsing floats correctly is really hard. I'm not really keen on the idea of adding a hard external dependency, though. The optimal solution would be to use std::from_chars, but support for that is rather lacking: per https://en.cppreference.com/w/cpp/compiler_support > C++17 library features > "Elementary string conversions" (P0067), only MSVC 16.4 and libstdc++ 11 has implemented <charconv> for floating point.

If we were to add an external dependency, I think it should be optional, and usage of it should be enabled through a compiler/CMake flag. https://github.com/lemire/fast_float could be a viable option, as could https://github.com/jk-jeon/fp once it matures. fast_double_parser linked above rejects valid inputs (nan and inf), so it may not be suitable.

from scnlib.

petrmanek avatar petrmanek commented on May 18, 2024

Somewhat fixed in b6db6ba by working around the issue by setting the global C locale to C before strtoX, and after the call resetting it back to what it was before.

I can confirm this works in my environment (with a non-standard locale). Enforcing locale externally is no longer required to parse conventional C-formatted floating-points. Cheers! 🎉

If we were to add an external dependency, I think it should be optional, and usage of it should be enabled through a compiler/CMake flag. https://github.com/lemire/fast_float could be a viable option, as could https://github.com/jk-jeon/fp once it matures. fast_double_parser linked above rejects valid inputs (nan and inf), so it may not be suitable.

That is entirely up to you as a maintainer. Reading the documentation of std::from_chars leads me to believe that this would be my long-term preference as well if I were in your shoes. Given the current state of its STL support though (or more precisely, the lack of it), you may still want to consider what can be done if it is unavailable. In that regard, both projects that you referenced seem superior to fast_double_parser in more ways than one. I particularly enjoy their design decision to be used as drop-in alternatives to the standard, minimizing any possible switching costs in the future.

Concerning the level of integration, I can only agree with you again. Hard dependencies feel annoying and restrictive, so adding a flag may give a little bit more freedom to users. Ideally, it would be nice if the default behavior of such a flag could be determined by the compiler's <charconv> support (perhaps by a CMake test). This way, users will always be free to override the flag manually (acknowledging the consequences), and standard floating-point parsing behavior will be guaranteed at all times.

from scnlib.

eliaskosunen avatar eliaskosunen commented on May 18, 2024

Closing, as the issue is now fixed.

from scnlib.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.