Escape backspace etc graphemes in string.inspect about stdlib HOT 15 CLOSED

Michael-Mark-Edu commented on August 11, 2024

Escape backspace etc graphemes in string.inspect

from stdlib.

Comments (15)

mooreryan commented on August 11, 2024 1

I'm not sure what you mean by the \f patch. The pull request you link doesn't change behavior with the form feed character, as it is already properly escaped in the code.

Edit: oh, I think you mean this patch.

from stdlib.

lpil commented on August 11, 2024

Thank you

from stdlib.

Michael-Mark-Edu commented on August 11, 2024

Here are all the escape codes that are relevant to this:

(code 127 is in the "invisible" category, codes 128+ are UTF-8 and shouldn't have special functionality)

from stdlib.

Michael-Mark-Edu commented on August 11, 2024

This also seems to be an Erlang-specific issue, since JavaScript just turns everything into Unicode.

from stdlib.

mooreryan commented on August 11, 2024

Shouldn't form feed also be green in your list? It seems to escape fine:

import gleam/io
import gleam/list
import gleam/string

fn cp(n: Int) -> UtfCodepoint {
  let assert Ok(cp) = string.utf_codepoint(n)
  cp
}

pub fn main() {
  [34, 92, 13, 10, 9, 12]
  |> list.map(cp)
  |> string.from_utf_codepoints
  |> string.inspect
  |> io.println
}

// prints: "\"\\\r\n\t\f"

See the Erlang code here:

stdlib/src/gleam_stdlib.erl

Line 490 in fe51781

inspect_maybe_utf8_string(Binary, Acc) ->

from stdlib.

mooreryan commented on August 11, 2024

I suppose the fix may be as simple as adding the other control characters that should be escaped to that function.

from stdlib.

Michael-Mark-Edu commented on August 11, 2024

I opened a PR that should hopefully fix this issue. Null characters affecting string comparison is untouched because it's arguably intentional behavior.

from stdlib.

Michael-Mark-Edu commented on August 11, 2024

@mooreryan The \f patch is unreleased, and I originally did testing on the public build. I tested it lightly in the unreleased build and the \f patch does seem to work.

from stdlib.

mooreryan commented on August 11, 2024

This comment (#602 (comment)) has me thinking: should string.inspect handle more of the first 32 non-printable ASCII characters?

The pull request #602 adds handling for \b, \v, and \e. However, it may be useful to also show the Gleam escape syntax for other non-printable characters. Going back to the original motivating example, if more of the non-printable characters were handled by string.inspect, then the diff would look something like this:

expected: Ok("abc123")
     got: Ok("\u{0008}abc123")

which seems more helpful.

from stdlib.

lpil commented on August 11, 2024

Oh that's clever. We could use that syntax for any invisible grapheme- is that what you're saying? How would we identify which ones are invisible?

from stdlib.

Hyperion-21 commented on August 11, 2024

I made a chart with the invisible ones earlier in this thread. Theoretically, anything that isn't already converted and has a value below 32 is invisible (as is 127). The conversion list just uses the first entry that matches (or maybe not, I don't know Erlang well), so maybe we can just add a conversion rule for values below 32 at the end of the list, and one for 127 too.

I prefer \b over \u{08} when possible, but I don't think a lot of these invisible ones have a well defined C escape so it may be inevitable.
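A minimal sketch of the rule described above, written in Gleam rather than in the Erlang function that string.inspect actually uses. The names escape_codepoint and escape_invisible are hypothetical and are not part of gleam/string; the choice of four hex digits is an assumption made to match the \u{0008} example earlier in the thread. Unlike string.inspect, this sketch does not wrap the result in quotes.

```gleam
import gleam/int
import gleam/io
import gleam/list
import gleam/string

// Hypothetical helper, not part of gleam/string: renders one codepoint,
// falling back to \u{XXXX} for otherwise unhandled control characters.
fn escape_codepoint(cp: UtfCodepoint) -> String {
  let n = string.utf_codepoint_to_int(cp)
  case n {
    // Escapes that string.inspect already prints, per the earlier example.
    9 -> "\\t"
    10 -> "\\n"
    12 -> "\\f"
    13 -> "\\r"
    34 -> "\\\""
    92 -> "\\\\"
    // Catch-all rule placed after the named escapes: the remaining
    // control characters (0 to 31) and DEL (127).
    _ if n < 32 || n == 127 -> {
      let hex = int.to_base16(n)
      // Left-pad with zeros to four hex digits, so 8 becomes "0008".
      let padding = string.repeat("0", 4 - string.length(hex))
      "\\u{" <> padding <> hex <> "}"
    }
    // Everything else passes through unchanged.
    _ -> string.from_utf_codepoints([cp])
  }
}

// Hypothetical helper: escapes every codepoint of a string in turn.
pub fn escape_invisible(s: String) -> String {
  s
  |> string.to_utf_codepoints
  |> list.map(escape_codepoint)
  |> string.concat
}

pub fn main() {
  // prints: \u{0008}abc123
  io.println(escape_invisible("\u{0008}abc123"))
}
```

Because the catch-all clause comes last, the named escapes still win for tab, newline, and so on, which is the "add a rule at the end of the list" idea.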

from stdlib.

mooreryan commented on August 11, 2024

Oh that's clever. We could use that syntax for any invisible grapheme- is that what you're saying? How would we identify which ones are invisible?

Yes, that would be the general idea. The "simplest" solution may be as @Hyperion-21 says: just convert the beginning of the ASCII table. However, identifying which characters are invisible is trickier than just taking ASCII values < 32. For one thing, there are many "non-printing" characters outside of that range once you consider Unicode. Check it out:

import gleeunit
import gleeunit/should

pub fn main() {
  gleeunit.main()
}

pub fn a_test() {
  let x = "a b"
  let y = "a\u{0020}b"

  should.equal(x, y)
}

pub fn b_test() {
  let x = "a b"
  let y = "a\u{00A0}b"

  should.equal(x, y)
}

pub fn c_test() {
  let x = "a\u{0020}b"
  let y = "a\u{00A0}b"

  should.equal(x, y)
}

which yields:

Failures:

  1) invisible_chars_test.b_test
     Values were not equal
     expected: "a b"
          got: "a b"
     output: 

  2) invisible_chars_test.c_test
     Values were not equal
     expected: "a b"
          got: "a b"
     output:

Those all look like spaces, but they're not the same. So the "ideal" string.inspect function may somehow account for that. But it is getting trickier, so maybe this should be left to a third-party library (not sure about that).

Second, you could imagine going beyond "invisible" characters. Check out this classic example:

pub fn e_accent_test() {
  let e1 = "\u{00E9}"
  let e2 = "\u{0065}\u{0301}"

  should.equal(e1, e2)
}

and that yields this:

  3) invisible_chars_test.e_accent_test
     Values were not equal
     expected: "é"
          got: "é"
     output:

Both look like the same e with an acute accent.

The outputs shown in those three failures could all be considered pretty unhelpful and worth treating, but it is complicated, so I'm not sure how complex string.inspect should be. It would probably be worth examining what other common languages do.

My point is that the semantics of string.inspect could get tricky, and I'm not sure how far the escaping should be taken, though it could potentially be useful.
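For what it's worth, one way to tell such strings apart today, without changing string.inspect, is to print their codepoints. This is only a sketch: the codepoints helper is hypothetical, and the strings are the ones from the failing tests above.

```gleam
import gleam/io
import gleam/list
import gleam/string

// Hypothetical helper: lists the integer value of every codepoint in a string.
pub fn codepoints(s: String) -> List(Int) {
  s
  |> string.to_utf_codepoints
  |> list.map(string.utf_codepoint_to_int)
}

pub fn main() {
  // Precomposed e with acute accent: prints [233]
  io.println(string.inspect(codepoints("\u{00E9}")))
  // Plain e followed by a combining acute accent: prints [101, 769]
  io.println(string.inspect(codepoints("\u{0065}\u{0301}")))
  // Ordinary space: prints [97, 32, 98]
  io.println(string.inspect(codepoints("a\u{0020}b")))
  // No-break space: prints [97, 160, 98]
  io.println(string.inspect(codepoints("a\u{00A0}b")))
}
```

The integer lists differ even where the rendered strings look identical, which is the information the failure output is missing.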

I prefer \b over \u{08} when possible, but I don't think a lot of these invisible ones have a well defined C escape so it may be inevitable.

It's true that \b is nice, but it is not valid Gleam syntax.

from stdlib.

Michael-Mark-Edu commented on August 11, 2024

Is there anything stopping Gleam from supporting an extended set of escape codes? I found an old line in the compiler's changelog saying "Gleam now only supports \r, \n, \t, \", and \\ string escapes" which makes me think this is an intentional decision... but why? It seems like an arbitrary decision.

I'll update #602 momentarily to match the \u syntax. I'll also see if I can get it to show the invisible graphemes.

from stdlib.

Michael-Mark-Edu commented on August 11, 2024

Alright, that's done.

from stdlib.

Michael-Mark-Edu commented on August 11, 2024

Edited the parent post of this thread to better represent the current state of the issue/pr.

from stdlib.
