Comments (15)

mooreryan commented on August 11, 2024

I'm not sure what you mean by the \f patch. The pull request you link doesn't change behavior with the form feed character, as it is already properly escaped in the code.

Edit: oh, I think you mean this patch.

from stdlib.

lpil commented on August 11, 2024

Thank you

Michael-Mark-Edu commented on August 11, 2024

Here are all the escape codes that are relevant to this:
[image: gleam_issue600]

(code 127 is in the "invisible" category, codes 128+ are UTF-8 and shouldn't have special functionality)

Michael-Mark-Edu commented on August 11, 2024

This also seems to be an Erlang-specific issue, since JavaScript just turns everything into Unicode escapes.
[image]

mooreryan commented on August 11, 2024

Shouldn't form feed also be green in your list? It seems to escape fine:

import gleam/io
import gleam/list
import gleam/string

fn cp(n: Int) -> UtfCodepoint {
  let assert Ok(cp) = string.utf_codepoint(n)
  cp
}

pub fn main() {
  [34, 92, 13, 10, 9, 12]
  |> list.map(cp)
  |> string.from_utf_codepoints
  |> string.inspect
  |> io.println
}

// prints: "\"\\\r\n\t\f"

See the Erlang code here:

inspect_maybe_utf8_string(Binary, Acc) ->

mooreryan commented on August 11, 2024

I suppose the fix may be as simple as adding more of the control characters that should be escaped to that function.

Michael-Mark-Edu commented on August 11, 2024

I opened a PR that should hopefully fix this issue. Null characters affecting string comparison is untouched because it's arguably intentional behavior.

Michael-Mark-Edu commented on August 11, 2024

@mooreryan The \f patch is unreleased, and I originally did testing on the public build. I tested it lightly in the unreleased build and the \f patch does seem to work.

mooreryan commented on August 11, 2024

This comment (#602 (comment)) has me thinking: should string.inspect handle more of the first 32 non-printable ASCII characters?

Pull request #602 adds handling for \b, \v, and \e. However, it may also be useful to show the Gleam escape syntax for other non-printable characters. Going back to the original motivating example, if string.inspect handled more of the non-printable characters, the diff would look something like this:

expected: Ok("abc123")
     got: Ok("\u{0008}abc123")

which seems more helpful.

lpil commented on August 11, 2024

Oh that's clever. We could use that syntax for any invisible grapheme- is that what you're saying? How would we identify which ones are invisible?

Hyperion-21 commented on August 11, 2024

I made a chart of the invisible ones earlier in this thread. In theory, anything that isn't already converted and has a value below 32 is invisible (as is 127). The conversion list just uses the first entry that matches (or maybe not, I don't know Erlang well), so maybe we can just add a conversion rule for values below 32 at the end of the list, and one for 127 too.

I prefer \b over \u{08} when possible, but I don't think a lot of these invisible ones have a well-defined C escape, so it may be inevitable.
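
A rough sketch of that rule, written in Gleam for illustration rather than as the Erlang code stdlib actually uses. The names `escape`, `escape_codepoint`, and `hex4` are hypothetical, the four-digit \u{XXXX} form is an assumption about the desired output, and \f is included only because string.inspect is shown printing it earlier in this thread:

```gleam
import gleam/int
import gleam/list
import gleam/string

// Hypothetical helper: left-pad a hexadecimal number to four digits.
// Only called with values below 128 here, so the pad count is never negative.
fn hex4(n: Int) -> String {
  let hex = int.to_base16(n)
  string.repeat("0", 4 - string.length(hex)) <> hex
}

// Hypothetical helper: render one code point, escaping ASCII control
// characters (below 32, plus 127) that have no short escape as \u{XXXX}.
fn escape_codepoint(cp: UtfCodepoint) -> String {
  let n = string.utf_codepoint_to_int(cp)
  case n {
    // Short escapes are matched first, as in the existing conversion list.
    9 -> "\\t"
    10 -> "\\n"
    12 -> "\\f"
    13 -> "\\r"
    34 -> "\\\""
    92 -> "\\\\"
    // Catch-all rule for the remaining invisible ASCII characters.
    _ if n < 32 || n == 127 -> "\\u{" <> hex4(n) <> "}"
    // Everything else is passed through unchanged.
    _ -> string.from_utf_codepoints([cp])
  }
}

// Escapes a whole string. Unlike string.inspect, this sketch does not
// add the surrounding double quotes.
pub fn escape(s: String) -> String {
  s
  |> string.to_utf_codepoints
  |> list.map(escape_codepoint)
  |> string.concat
}
```

Under these assumptions, `escape("\u{0008}abc123")` would give the text `\u{0008}abc123`, so a stray backspace becomes visible in a diff.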

mooreryan commented on August 11, 2024

> Oh that's clever. We could use that syntax for any invisible grapheme- is that what you're saying? How would we identify which ones are invisible?

Yes, that would be the general idea. The "simplest" solution may be as @Hyperion-21 says: just convert the beginning of the ASCII table. However, identifying which characters are invisible is trickier than just taking ASCII values < 32. For one thing, there are many "non-printing" characters outside of that range once you consider Unicode. Check it out:

import gleeunit
import gleeunit/should

pub fn main() {
  gleeunit.main()
}

pub fn a_test() {
  let x = "a b"
  let y = "a\u{0020}b"

  should.equal(x, y)
}

pub fn b_test() {
  let x = "a b"
  let y = "a\u{00A0}b"

  should.equal(x, y)
}

pub fn c_test() {
  let x = "a\u{0020}b"
  let y = "a\u{00A0}b"

  should.equal(x, y)
}

which yields:

Failures:

  1) invisible_chars_test.b_test
     Values were not equal
     expected: "a b"
          got: "a b"
     output: 

  2) invisible_chars_test.c_test
     Values were not equal
     expected: "a b"
          got: "a b"
     output: 

Those all look like spaces, but they're not the same. So the "ideal" string.inspect function might somehow account for that. But it is getting trickier, so maybe it should be left to some third-party library? (Not sure about that.)

Second, you could imagine going beyond "invisible" characters. Check out this classic example:

pub fn e_accent_test() {
  let e1 = "\u{00E9}"
  let e2 = "\u{0065}\u{0301}"

  should.equal(e1, e2)
}

and that yields this:

  3) invisible_chars_test.e_accent_test
     Values were not equal
     expected: "é"
          got: "é"
     output: 

Both look like the same e with an acute accent.

The outputs shown in all three failures could be considered pretty unhelpful and worth improving, but it is complicated, so I'm not sure how complex string.inspect should be. It would probably be worth examining what other common languages do.

My point is that the semantics of string.inspect could get tricky, and I'm not sure how far the escaping should be taken, though it could potentially be useful.
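
One way to see what actually differs in the failures above is to compare code points instead of rendered text. A minimal sketch, assuming a hypothetical helper `codepoints_hex` (not part of gleam/string) built on `string.to_utf_codepoints`, `string.utf_codepoint_to_int`, and `int.to_base16`:

```gleam
import gleam/int
import gleam/list
import gleam/string

// Hypothetical helper: list the code points of a string as hexadecimal
// numbers, so that look-alike strings can be told apart.
pub fn codepoints_hex(s: String) -> List(String) {
  s
  |> string.to_utf_codepoints
  |> list.map(fn(cp) { int.to_base16(string.utf_codepoint_to_int(cp)) })
}
```

With this, `codepoints_hex("a\u{0020}b")` gives `["61", "20", "62"]` while `codepoints_hex("a\u{00A0}b")` gives `["61", "A0", "62"]`, and the two accented strings give `["E9"]` and `["65", "301"]` respectively. It does not decide which characters count as invisible; it only makes the difference inspectable.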


> I prefer \b over \u{08} when possible, but I don't think a lot of these invisible ones have a well defined C escape so it may be inevitable.

It's true that \b is nice, but it is not valid Gleam syntax.

Michael-Mark-Edu commented on August 11, 2024

Is there anything stopping Gleam from supporting an extended set of escape codes? I found an old line in the compiler's changelog saying "Gleam now only supports \r, \n, \t, \", and \\ string escapes" which makes me think this is an intentional decision... but why? It seems like an arbitrary decision.

I'll update #602 momentarily to match the \u syntax. I'll also see if I can get it to show the invisible graphemes.

Michael-Mark-Edu commented on August 11, 2024

Alright, that's done.

Michael-Mark-Edu commented on August 11, 2024

Edited the parent post of this thread to better represent the current state of the issue/pr.
