Comments (11)

ViralBShah commented on April 28, 2024

Seems like this can be in 2.0, unless it is easy enough to do. Stefan?

StefanKarpinski commented on April 28, 2024

Yeah, it turns out to be a pain in the ass to do from the C code (can't find the email thread, but it is).

Another, possibly simpler, way to handle this is to change the parser to disallow \x80 through \xff in bare string literals but allow them in prefixed string literals, passing them through to the implementing macro to decide whether to allow them or not. Throwing a syntax error from a macro is no problem; it's throwing a syntax error from C code called directly by the parser that's problematic. @JeffBezanson, can you do that?

JeffBezanson commented on April 28, 2024

Actually, it makes sense anyway for the parser to give a syntax error on invalid utf-8 sequences, whether in a string or anywhere else. The parser can also check in cases where it does unescaping, and the julia string macros can check when they do unescaping. In case of an error, the macro should return an error expression, Expr(:error,{"msg"}), instead of throwing, to allow history to work properly.
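
A parser-level check of the kind described here can be sketched in a few lines. This is a minimal Python illustration, not the Julia parser's code. The function name `check_source_utf8` and the error-message format are hypothetical. It strictly decodes the whole source buffer and reports the line of the first invalid byte, so it also catches bad bytes that appear outside any escape sequence.

```python
def check_source_utf8(src: bytes, filename: str = "<input>") -> str:
    """Hypothetical whole-source check: strict UTF-8 decode or syntax error."""
    try:
        # Python's strict UTF-8 codec rejects stray continuation bytes,
        # truncated sequences, overlong forms and encoded surrogates.
        return src.decode("utf-8")
    except UnicodeDecodeError as e:
        # e.start is the offset of the first offending byte.
        line = src.count(b"\n", 0, e.start) + 1
        raise SyntaxError(
            f"{filename}:{line}: invalid utf-8 sequence at byte {e.start}"
        ) from None


if __name__ == "__main__":
    print(check_source_utf8(b"x = 1\n"), end="")
    try:
        check_source_utf8(b'a = 1\nb = "\x80"\n')
    except SyntaxError as err:
        print(err)  # <input>:2: invalid utf-8 sequence at byte 11
```

Running one check over the raw source keeps the "anywhere else" case out of every individual unescaping routine.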

Also this seems strange:

julia> "\\\""
"\\\""

julia> S"\\\""
"\""

Am I doing something wrong in the parser?

JeffBezanson commented on April 28, 2024

Also I see utf32.j is not in use. Can we get it in shape?

StefanKarpinski commented on April 28, 2024

I could work on utf32.j and latin1.j, or we could mothball them for the release. The main issue is that we currently have no support for reading files in other encodings, and these aren't really useful without that.

There's definitely something funny going on with escape handling that's different for bare vs. prefixed or otherwise macro-handled strings. See issue #100. Pretty sure it's the same problem. I'm looking into it, but I'm not quite sure what's going wrong yet.

JeffBezanson commented on April 28, 2024

Ah, I was wrong, a macro can throw an error and it is automatically handled.

StefanKarpinski commented on April 28, 2024

How does 6614948 not address all of this issue? Seems fully addressed to me.

JeffBezanson commented on April 28, 2024

Oh, I guess that's true since print_unescaped never generates invalid sequences. But we have this:

julia> "\x80"
syntax error: invalid utf-8 sequence

julia> "\x80$1"
"\u00801"

We have to do something else with \x and \000. In byte arrays such as b"\x80", we clearly want \x to insert bytes. So, to be consistent, it should always insert bytes, and for strings this step is followed by a check that all those bytes add up to valid utf-8.
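
The "always insert bytes, then validate strings" approach can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the actual Julia implementation. `unescape_bytes`, `parse_string_literal` and `parse_byte_literal` are hypothetical names. Only \n, \t, \\, \" and \xHH escapes are handled, and octal escapes such as \000 are left out for brevity. Both literal forms share one unescaper, so \x80 always means the byte 0x80. Only the string path adds the UTF-8 check.

```python
import re

# One backslash escape: \x with 1-2 hex digits, or \ plus any single byte.
_ESCAPE = re.compile(rb"\\(x[0-9A-Fa-f]{1,2}|.)", re.DOTALL)
_SIMPLE = {b"n": b"\n", b"t": b"\t", b"\\": b"\\", b'"': b'"'}


def unescape_bytes(src: bytes) -> bytes:
    """Expand escapes into raw bytes; \\xHH always inserts the byte 0xHH."""
    def expand(m):
        esc = m.group(1)
        if esc.startswith(b"x"):
            if len(esc) == 1:
                raise SyntaxError("\\x used with no following hex digits")
            return bytes([int(esc[1:], 16)])
        # Unknown escapes keep the escaped character (a simplification).
        return _SIMPLE.get(esc, esc)
    return _ESCAPE.sub(expand, src)


def parse_string_literal(src: bytes) -> str:
    """String path: unescape to bytes first, then verify the result is UTF-8."""
    raw = unescape_bytes(src)
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        raise SyntaxError("invalid utf-8 sequence") from None


def parse_byte_literal(src: bytes) -> bytes:
    """Byte-array path (b"..."): identical unescaping, no UTF-8 requirement."""
    return unescape_bytes(src)


if __name__ == "__main__":
    print(parse_byte_literal(rb"\x80"))                  # b'\x80'
    print(ascii(parse_string_literal(rb"caf\xc3\xa9")))  # 'caf\xe9'
    try:
        parse_string_literal(rb"\x80")
    except SyntaxError as err:
        print(err)                                       # invalid utf-8 sequence
```

Because the check runs on the finished byte sequence, it does not matter whether a bad byte came from an escape or from the source text itself.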

StefanKarpinski commented on April 28, 2024

Ah, yeah. That is still an issue. I'll add a check after constructing a new string. If we just disallowed escapes above \x7f altogether, checking UTF-8 validity would be unnecessary, since there'd be no way to even express an invalid string. That would have to match between the parser and the str_S form, though: it shouldn't be allowed in one and not the other.
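
The stricter alternative, rejecting \x escapes above \x7f up front, might look like this minimal Python sketch. `reject_high_hex_escapes` is a hypothetical name. Calling the same function from both the bare-literal path and the str_S macro would keep the two forms in agreement. Octal escapes such as \200 can also produce bytes above 0x7f and would need the same treatment, which is omitted here. The guarantee also assumes the source text itself is valid UTF-8.

```python
import re

# Escapes are matched one at a time, left to right, so an escaped backslash
# followed by "x80" is consumed as the escape \\ and is not mistaken for \x80.
_ESCAPE = re.compile(r"\\(x[0-9A-Fa-f]{1,2}|.)", re.DOTALL)


def reject_high_hex_escapes(src: str) -> None:
    """Raise SyntaxError for any \\xHH escape whose value exceeds 0x7f."""
    for m in _ESCAPE.finditer(src):
        esc = m.group(1)
        if esc.startswith("x") and len(esc) > 1 and int(esc[1:], 16) > 0x7F:
            raise SyntaxError(f"\\{esc}: only \\x00-\\x7f allowed in string literals")


if __name__ == "__main__":
    reject_high_hex_escapes(r"\x41\x7f")  # accepted
    reject_high_hex_escapes(r"\\x80")     # escaped backslash, then plain text
    try:
        reject_high_hex_escapes(r"\x80")
    except SyntaxError as err:
        print(err)
```

The trade-off is the one discussed in the thread: this rules out invalid strings by construction from valid source, but byte-array literals would still need a separate code path that permits high bytes.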

JeffBezanson commented on April 28, 2024

Yes, that would also be a sensible approach. It's kind of a toss-up. I prefer to err on the side of allowing as much as possible: you can enter anything, but we call a validation routine. It's "trust but verify" :)
The validation is needed anyway since the source file itself could contain invalid utf-8 even with no escape sequences present.

Plus, after fixing this we get byte array literals for free.

StefanKarpinski commented on April 28, 2024

Closed by ad06687.
