
Comments (12)

Flamefire commented on July 26, 2024

Adding a check for E2BIG is not a solution because there can be more symbols after the Chinese one.

I meant adding a check here for whether any progress was made, i.e. in_left changed or output_count > 0, and continuing only then. Otherwise either abort with an error saying that the implementation is broken, or continue with the next input character like here

What I don't understand is why the loop is not breaking after in_left = 0 and res = 0: https://github.com/boostorg/locale/blob/c5314a857c5af029ced242820ef62deeec065b1d/src/boost/locale/encoding/iconv_converter.hpp#L78C17-L79C27

On the first call it consumes 3 bytes and outputs one char. The next call returns res = 0, which basically means we need to exit since we are in the unshifting state.

@artyom-beilis See #206 (comment) The first 2 were something different.

But yes, a simple C program that demonstrates the issue in iconv and can be sent to Apple support might be a good idea. I wrote a simplified example: https://godbolt.org/z/oY6Th5Gh5 @daniboybye Can you compile and run that code on your system and see if it fails? If it does (which it should), please submit it to Apple support if you can.

from locale.

Flamefire commented on July 26, 2024

This looks similar to #196 as it happens with IConv on macOS 14. As I don't have access to a Mac I can't test it there to see what the issue might be. It looks like Apple has changed the iconv library on macOS 14 to be incompatible with the GNU version, see e.g. https://developer.apple.com/forums/thread/739533

I'm unable to find documentation for libiconv on macOS 14 that specifies its behavior, so that it could be compared with other iconv implementations such as GNU's.

So I'd need your help here:

  • Can you find documentation for that library on your system?
  • Can you debug why it hangs? I.e. check
    const size_t res = (!is_unshifting) ? conv(&begin, &in_left, &out_ptr, &out_left) :
                                          conv(nullptr, nullptr, &out_ptr, &out_left);
    if(res != 0 && res != (size_t)(-1)) {
        if(how_ == stop)
            throw conversion_error();
    }
    const size_t output_count = (out_ptr - out_start) / sizeof(OutChar);
    sresult.append(tmp_buf, output_count);

On GNU libiconv for your input we have in_left=3 before, in_left=0 after the conv call and res=0 & output_count=7. What are the values on your system?
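For anyone who wants to print these values outside of Boost.Locale, here is a minimal sketch. It is not the Boost code: `ConvReport` and `convert_once` are made-up names, the UTF-8 to UTF-16LE pair in the usage note is only an assumed example (the thread does not name the target encoding), and it assumes the glibc-style `iconv` signature taking `char**` for the input. On some systems it may need to be linked with `-liconv`.

```cpp
#include <cerrno>
#include <cstddef>
#include <iconv.h>
#include <string>

// Hypothetical helper type: the values asked for in the comment above.
struct ConvReport {
    std::size_t res;      // return value of the iconv() call
    int err;              // errno if the call failed, otherwise 0
    std::size_t in_left;  // input bytes not consumed
    std::size_t out_left; // free bytes left in the 64-byte output buffer
    std::string output;   // raw converted bytes
};

// Run one iconv() call on `input`, then flush the shift state with a
// null input pointer (the "unshifting" step discussed in this thread).
inline ConvReport convert_once(const char* to, const char* from, std::string input)
{
    ConvReport report{static_cast<std::size_t>(-1), 0, input.size(), 64, {}};
    iconv_t cd = iconv_open(to, from);
    if(cd == (iconv_t)(-1)) {
        report.err = errno; // the encoding pair is not supported
        return report;
    }
    char buf[64];
    char* in_ptr = &input[0];
    char* out_ptr = buf;
    report.res = iconv(cd, &in_ptr, &report.in_left, &out_ptr, &report.out_left);
    if(report.res == static_cast<std::size_t>(-1))
        report.err = errno;
    else
        iconv(cd, nullptr, nullptr, &out_ptr, &report.out_left); // unshift
    report.output.assign(buf, static_cast<std::size_t>(out_ptr - buf));
    iconv_close(cd);
    return report;
}
```

Calling `convert_once("UTF-16LE", "UTF-8", "\xE5\xAE\x9E")` on a working implementation should report res=0, in_left=0 and a shrunken out_left. A result of res=-1 with errno=E2BIG and in_left still 3 would match the behavior discussed here.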

A workaround would be to use ICU instead of iconv by disabling iconv, which requires a change to the B2 build file: #207

daniboybye commented on July 26, 2024

On my system:
on first pass: in_left=1 before, in_left=0 after, res=0, output_count=1
on second pass: in_left=0 before, in_left=0 after, res=0, output_count=0
on third pass: in_left=3 before, in_left=3 after, res=-1, output_count=0
and we can't read more after that position

Flamefire commented on July 26, 2024

Thanks for testing! I just noticed that I sent you the code position for the develop branch, not 1.82. I assume you tested 1.82? I'll use that below, but it should be trivial to adapt if not.

Let me think out loud:

on first pass: in_left=1 before, in_left=0 after, res=0, output_count=1

That looks wrong. If "实" were in UTF-8 it would be 3 bytes, not one. So I assume you saved the file in some locale-specific encoding in which that character can be represented in 1 byte. In that case the input is not UTF-8, and calling from_utf will not produce the expected result anyway. However, it also shouldn't hang, so let's continue:

on second pass: in_left=0 before, in_left=0 after, res=0, output_count=0

in_left=0 should set state = unshifting, and since res != -1 after the next call, we'll set state = done, which will exit the loop.

on third pass: in_left=3 before, in_left=3 after, res=-1, output_count=0

First: How could a third pass happen if state=done? And why is in_left=3 now, which is what we should have had at the start? Maybe your "first pass" is on something else and the "third pass" is actually the first pass on your string? Can you verify this?

However, res=-1 should be checked here and, depending on err, the loop should continue after incrementing begin for EILSEQ or EINVAL, just continue for E2BIG, or exit for an unknown errno.
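That dispatch on errno can be sketched roughly as follows. This is only an illustration of the decision described above, not the Boost.Locale code: `LoopAction` and `next_action` are hypothetical names, and the function merely classifies the outcome of a single conversion call.

```cpp
#include <cerrno>
#include <cstddef>

// Hypothetical names for illustration; not part of Boost.Locale.
enum class LoopAction { no_error, skip_and_continue, retry_with_fresh_buffer, abort };

// Decide what a conversion loop should do after one iconv-style call.
// res: return value of the call, err: errno captured right after it.
inline LoopAction next_action(std::size_t res, int err)
{
    if(res != static_cast<std::size_t>(-1))
        return LoopAction::no_error;                // the call did not fail
    switch(err) {
        case EILSEQ:                                // invalid input sequence
        case EINVAL:                                // incomplete input sequence
            return LoopAction::skip_and_continue;   // caller increments `begin` first
        case E2BIG:                                 // output buffer is full
            return LoopAction::retry_with_fresh_buffer;
        default:
            return LoopAction::abort;               // unknown errno: leave the loop
    }
}
```

Note that this classification alone cannot catch a converter that reports E2BIG without having written anything. The loop would also have to check that the call made progress.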

So what exactly did you mean by "causes a freeze" in the initial report, and by this last part?

and we can't read more after that position

daniboybye commented on July 26, 2024

My mistake, you're right. The first pass is in_left=3 before, in_left=3 after, res=-1, output_count=0.
Err is 7 (E2BIG) and state is normal.

Flamefire commented on July 26, 2024

Ah that explains the infinite loop: It continues the loop without consuming or producing anything so it will do the exact same thing over and over again. I can add a check for that (so it returns an empty string instead) but the system iconv implementation still looks broken: E2BIG should be raised if "There is not sufficient room at *outbuf."
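A progress check of that kind might look like the sketch below. It is written against a hypothetical iconv-like function pointer (`ConvFn`) so that the broken behavior can be simulated with a mock. None of these names come from Boost.Locale, and this sketch throws where the comment above proposes returning an empty string.

```cpp
#include <cerrno>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for an iconv-like call: returns (size_t)-1 and sets
// errno on failure, otherwise advances the input and output cursors.
using ConvFn = std::size_t (*)(const char** in, std::size_t* in_left,
                               char** out, std::size_t* out_left);

// Convert `input` through a 64-byte buffer. A call that neither consumes
// input nor produces output raises an error instead of being retried forever.
inline std::string convert_with_progress_check(ConvFn conv, const std::string& input)
{
    std::string result;
    const char* begin = input.data();
    std::size_t in_left = input.size();
    while(in_left > 0) {
        char buf[64];
        char* out_ptr = buf;
        std::size_t out_left = sizeof(buf);
        const std::size_t in_before = in_left;

        const std::size_t res = conv(&begin, &in_left, &out_ptr, &out_left);
        const int err = (res == static_cast<std::size_t>(-1)) ? errno : 0;
        const std::size_t produced = static_cast<std::size_t>(out_ptr - buf);
        result.append(buf, produced);

        if(err != 0 && err != E2BIG)
            throw std::runtime_error("conversion failed");
        if(in_left == in_before && produced == 0)
            throw std::runtime_error("converter made no progress");
    }
    return result;
}

// Mock of the reported behavior: E2BIG with nothing consumed or written.
inline std::size_t stuck_converter(const char**, std::size_t*, char**, std::size_t*)
{
    errno = E2BIG;
    return static_cast<std::size_t>(-1);
}

// Mock that copies bytes unchanged and reports E2BIG when the buffer is full.
inline std::size_t copying_converter(const char** in, std::size_t* in_left,
                                     char** out, std::size_t* out_left)
{
    while(*in_left > 0) {
        if(*out_left == 0) {
            errno = E2BIG;
            return static_cast<std::size_t>(-1);
        }
        *(*out)++ = *(*in)++;
        --*in_left;
        --*out_left;
    }
    return 0;
}
```

With `copying_converter`, E2BIG is legitimate because every call fills the buffer, so the loop keeps going. With `stuck_converter`, the first call already trips the no-progress check.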

But we pass a valid pointer as outbuf and a size of 64 as out_left. How could that not be enough room to consume even a single byte? Can you verify that at https://github.com/boostorg/locale/blob/boost-1.82.0/src/boost/locale/util/iconv.hpp#L77 out points to result and outsize to "64"?

Maybe also compare the iconv man page on your system (run "man 3 iconv" in a shell) with the one linked above to see if there are any differences that might provide hints.

If not, I can only document that "IConv on macOS 14+ is broken and should be disabled", which isn't a great solution, especially as we cannot easily detect this at build time because the interfaces are all there.

CC @artyom-beilis if he has any ideas left.

daniboybye commented on July 26, 2024

Adding a check for E2BIG is not a solution because there can be more symbols after the Chinese one.

artyom-beilis commented on July 26, 2024

I suggest we make a trivial C example that reproduces the problem.

On my system:
on first pass: in_left=1 before, in_left=0 after, res=0, output_count=1
on second pass: in_left=0 before, in_left=0 after, res=0, output_count=0
on third pass: in_left=3 before, in_left=3 after, res=-1, output_count=0
and we can't read more after that position

What I don't understand is why the loop is not breaking after in_left = 0 and res = 0: https://github.com/boostorg/locale/blob/c5314a857c5af029ced242820ef62deeec065b1d/src/boost/locale/encoding/iconv_converter.hpp#L78C17-L79C27

On the first call it consumes 3 bytes and outputs one char. The next call returns res = 0, which basically means we need to exit since we are in the unshifting state.

I really don't understand how we get there

daniboybye commented on July 26, 2024

I confirm that this example demonstrates the problem and will add it to my report to Apple.

Flamefire commented on July 26, 2024

Thank you. Please let us know if you get any new information.

I confirm that this example demonstrates the problem and will add it to my report to Apple.

Can you post the output, or explain how you are sure it is exactly this problem? I.e. no input consumed (in_left == 3 && out_left == 64), res=-1 with errno=E2BIG?

It occurred to me that macOS might be using the FreeBSD version. Could you run https://godbolt.org/z/98aah1n51 (and optionally https://godbolt.org/z/14eaPefsW for the related issue) on your system and post the output please? It also prints more details.

daniboybye commented on July 26, 2024

My output for https://godbolt.org/z/98aah1n51

E2BIG=7
EILSEQ=92
EOPNOTSUPP=102
EINVAL=22

Original: \E5\AE\9E
in_left: 3
res: 4294967295
errno: 7
in_left: 3
out_left: 64

line:33 Test FAILED: res == 0u
line:34 Test FAILED: errno == 0
line:35 Test FAILED: in_left == 0u
line:36 Test FAILED: out_left == 64u - 7u
res: 0
errno: 7
in_left: 3
out_left: 64

line:45 Test FAILED: errno == 0
line:46 Test FAILED: in_left == 0u
line:47 Test FAILED: out_left == 64u - 8u


https://godbolt.org/z/14eaPefsW

E2BIG=7
EILSEQ=92
EOPNOTSUPP=102
EINVAL=22

Original: \E2\80\A6\E2\80\A6
in_left: 6
res: 0
errno: 0
in_left: 0
out_left: 62
\3F\3F
line:36 Test FAILED: out_left == 64u - 4u
res: 0
errno: 0
in_left: 0
out_left: 62
\3F\3F
line:47 Test FAILED: out_left == 64u - 4u

U+0085: \C2\85\C2\85
in_left: 4
res: 4294967295
errno: 92
in_left: 4
out_left: 64

line:33 Test FAILED: res == 0u
line:34 Test FAILED: errno == 0
line:35 Test FAILED: in_left == 0u
line:36 Test FAILED: out_left == 64u - 4u

res: 0
errno: 92
in_left: 4
out_left: 64

line:45 Test FAILED: errno == 0
line:46 Test FAILED: in_left == 0u
line:47 Test FAILED: out_left == 64u - 4u

U+2026: \E2\80\A6\E2\80\A6
in_left: 6
res: 0
errno: 0
in_left: 0
out_left: 62
\3F\3F
line:36 Test FAILED: out_left == 64u - 4u
res: 0
errno: 0
in_left: 0
out_left: 62
\3F\3F
line:47 Test FAILED: out_left == 64u - 4u

Flamefire commented on July 26, 2024

It looks like this is indeed an issue with macOS 14/iOS 17 and was fixed in 14.2/17.2 respectively, as per d99kris/nmail#150 (comment)

#218 will close this issue by throwing an exception instead of freezing when the issue is detected.
