Comments (15)
I'm not sure what you mean by the \f patch. The pull request you link doesn't change behavior with the form feed character, as it is already properly escaped in the code.
Edit: oh, I think you mean this patch.
from stdlib.
Thank you
from stdlib.
Here are all the escape codes that are relevant to this:
(code 127 is in the "invisible" category, codes 128+ are UTF-8 and shouldn't have special functionality)
from stdlib.
This also seems to be an Erlang-specific issue, since JavaScript just turns everything into Unicode.
from stdlib.
Shouldn't form feed also be green in your list? It seems to escape fine:
import gleam/io
import gleam/list
import gleam/string

fn cp(n: Int) -> UtfCodepoint {
  let assert Ok(cp) = string.utf_codepoint(n)
  cp
}

pub fn main() {
  [34, 92, 13, 10, 9, 12]
  |> list.map(cp)
  |> string.from_utf_codepoints
  |> string.inspect
  |> io.println
}
// prints: "\"\\\r\n\t\f"
See the Erlang code here:
Line 490 in fe51781
from stdlib.
I suppose fixing may be as simple as adding more control characters that should potentially be escaped there in that function.
from stdlib.
I opened a PR that should hopefully fix this issue. Null characters affecting string comparison is untouched because it's arguably intentional behavior.
from stdlib.
@mooreryan The \f patch is unreleased, and I originally did testing on the public build. I tested it lightly in the unreleased build and the \f patch does seem to work.
from stdlib.
This comment (#602 (comment)) has me thinking: should string.inspect handle more of the first 32 non-printable ASCII characters?
The pull request #602 adds handling for \b, \v, and \e. However, it may be useful to also show the Gleam escape syntax for other non-printable characters. Going back to the original motivating example, if more of the non-printable characters were handled by string.inspect, then the diff would look something like this:
  expected: Ok("abc123")
  got: Ok("\u{0008}abc123")
which seems more helpful.
from stdlib.
Oh that's clever. We could use that syntax for any invisible grapheme- is that what you're saying? How would we identify which ones are invisible?
from stdlib.
I made a chart with the invisible ones earlier in this thread. Theoretically, anything that isn't already being converted and has a value below 32 is invisible (as is 127). The conversion list just uses the first matching entry (or maybe not, I don't know Erlang well), so maybe we can just add a conversion rule for values below 32 at the end of the list, and one for 127 too.
I prefer \b over \u{08} when possible, but I don't think a lot of these invisible ones have a well defined C escape so it may be inevitable.
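A minimal sketch of the "convert everything below 32, plus 127" idea, written in Gleam rather than in the Erlang function discussed above. The names escape_invisible, escape_codepoint, and pad4 are hypothetical and not part of gleam/string. The sketch also escapes \n, \r, and \t uniformly as \u{...}, which a real string.inspect would presumably keep as short escapes, and it does not handle quotes or backslashes.

```gleam
import gleam/int
import gleam/list
import gleam/string

// Hypothetical helper: left-pads a hex string with zeros to four digits.
fn pad4(hex: String) -> String {
  case string.length(hex) < 4 {
    True -> pad4("0" <> hex)
    False -> hex
  }
}

// Hypothetical helper: renders one codepoint either unchanged or as a
// Gleam `\u{...}` escape if it is an ASCII control character or DEL.
fn escape_codepoint(cp: UtfCodepoint) -> String {
  let n = string.utf_codepoint_to_int(cp)
  case n < 32 || n == 127 {
    True -> "\\u{" <> pad4(int.to_base16(n)) <> "}"
    False -> string.from_utf_codepoints([cp])
  }
}

// Assumed entry point for this sketch, not the stdlib implementation.
pub fn escape_invisible(s: String) -> String {
  s
  |> string.to_utf_codepoints
  |> list.map(escape_codepoint)
  |> string.concat
}
```

With this sketch, escape_invisible("\u{0008}abc123") returns the text \u{0008}abc123, which matches the diff output suggested earlier in the thread.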
from stdlib.
> Oh that's clever. We could use that syntax for any invisible grapheme- is that what you're saying? How would we identify which ones are invisible?
Yes, that would be the general idea. The "simplest" solution may be as @Hyperion-21 says: just convert the beginning of the ASCII table. However, identifying which characters are invisible is trickier than just taking ASCII values < 32. For one thing, there are many "non-printing" characters outside of that range once you consider Unicode. Check it out:
import gleeunit
import gleeunit/should

pub fn main() {
  gleeunit.main()
}

pub fn a_test() {
  let x = "a b"
  let y = "a\u{0020}b"
  should.equal(x, y)
}

pub fn b_test() {
  let x = "a b"
  let y = "a\u{00A0}b"
  should.equal(x, y)
}

pub fn c_test() {
  let x = "a\u{0020}b"
  let y = "a\u{00A0}b"
  should.equal(x, y)
}
which yields:
Failures:
1) invisible_chars_test.b_test
Values were not equal
expected: "a b"
got: "a b"
output:
2) invisible_chars_test.c_test
Values were not equal
expected: "a b"
got: "a b"
output:
Those all look like spaces, but they're not the same. So the "ideal" string.inspect function might somehow account for that. But it is getting trickier, so maybe it should be left to some third-party library? (Not sure about that.)
Second, you could imagine going beyond "invisible" characters. Check out this classic example:
pub fn e_accent_test() {
  let e1 = "\u{00E9}"
  let e2 = "\u{0065}\u{0301}"
  should.equal(e1, e2)
}
and that yields this:
3) invisible_chars_test.e_accent_test
Values were not equal
expected: "é"
got: "é"
output:
Both look like the same e with an acute accent.
The output shown in all three of those failures could be considered pretty unhelpful and worth improving, but it is complicated, so I'm not sure how complex string.inspect should be. It would probably be worth examining what other common languages do.
My point is that the semantics of string.inspect could get tricky, and I'm not sure how far the escaping should be taken, though it could be useful.
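One way to see why the strings in the space and accent examples differ, without changing string.inspect at all, is to list their codepoints. The following is only a sketch; hex_codepoints is a hypothetical debugging helper, not an existing stdlib function.

```gleam
import gleam/int
import gleam/list
import gleam/string

// Hypothetical debugging helper: lists each codepoint of a string as
// uppercase hexadecimal, so visually identical strings can be told apart.
pub fn hex_codepoints(s: String) -> List(String) {
  s
  |> string.to_utf_codepoints
  |> list.map(fn(cp) { int.to_base16(string.utf_codepoint_to_int(cp)) })
}
```

For instance, hex_codepoints("a\u{0020}b") gives ["61", "20", "62"] while hex_codepoints("a\u{00A0}b") gives ["61", "A0", "62"]. Likewise "\u{00E9}" gives ["E9"] and "\u{0065}\u{0301}" gives ["65", "301"], even though both pairs render the same way.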
> I prefer \b over \u{08} when possible, but I don't think a lot of these invisible ones have a well defined C escape so it may be inevitable.
It's true that \b is nice, but it is not valid Gleam syntax.
from stdlib.
Is there anything stopping Gleam from supporting an extended set of escape codes? I found an old line in the compiler's changelog saying "Gleam now only supports \r, \n, \t, \", and \\ string escapes" which makes me think this is an intentional decision... but why? It seems like an arbitrary decision.
I'll update #602 momentarily to match the \u syntax. I'll also see if I can get it to show the invisible graphemes.
from stdlib.
Alright, that's done.
from stdlib.
Edited the parent post of this thread to better represent the current state of the issue/PR.
from stdlib.