Comments (37)
It's the nature of Erlang. Here is my explanation of the same question: #44 (comment)
from ergo.
Yes, but other libraries like https://github.com/kbrw/node_erlastic or https://github.com/rusterlium/rustler allow you to configure how terms are decoded to suit Elixir semantics and conventions?
from ergo.
I would say it's really bad practice.
from ergo.
Does it mean that this library will not have built-in support for UTF-8 strings from Elixir?
from ergo.
It's not related to UTF support. Strings in Golang are immutable. So if you receive a binary and want to treat it as a string, just convert it to the string type.
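A minimal sketch of the conversion described above, using only the standard library. The helper name `binaryToString` is hypothetical and not part of ergo; it only illustrates that the conversion copies the bytes without validating or changing their encoding.

```go
package main

import "fmt"

// binaryToString converts a received binary payload into a Go string.
// Hypothetical helper for illustration: the conversion copies the bytes
// and does not validate or change their encoding.
func binaryToString(b []byte) string {
	return string(b)
}

func main() {
	// UTF-8 bytes for "hi 日本語", as they would arrive in an Erlang binary.
	payload := []byte{104, 105, 32, 230, 151, 165, 230, 156, 172, 232, 170, 158}
	fmt.Println(binaryToString(payload))
}
```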
from ergo.
The code below seems to result in two different encodings on the receiver side.
This sends {[104, 105, 32, 230, 151, 165, 230, 156, 172, 32, 240, 159, 154, 128]} over the wire:
reply := etf.Term(etf.Tuple{etf.Atom("error"), etf.Atom("unknown_request")})
reply = etf.Tuple{"hi 日本語 🚀"}
return "reply", reply, state
This sends {"hi 日本語 🚀"} over the wire:
reply := etf.Term(etf.Tuple{etf.Atom("error"), etf.Atom("unknown_request")})
reply = etf.Tuple{[]byte("hi 日本語 🚀")}
return "reply", reply, state
Does every nested, possibly UTF-8, string in reply need to be cast to []byte(str)?
Golang strings are natively UTF-8; sending them as charlists (that are encoded incorrectly) seems counter-intuitive.
from ergo.
It's not a problem of Ergo :). Go is a strongly, statically typed language. It means the Ergo encoder knows for sure what exact type of data it is trying to encode. Any string will be encoded as a string type. On the Erlang side - there is magic in the air :).
from ergo.
> Golang strings are natively UTF-8; sending them as charlists (that are encoded incorrectly)
not sure if I follow you here
from ergo.
See https://blog.golang.org/strings
string = readonly []byte
Strings in golang are semantically equivalent to binaries in Elixir/Erlang.
The ergo etf encoder encodes them as charlists, which produces invalid output (that cannot be decoded).
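To make the difference concrete, here is a small standard-library sketch (not ergo code; both function names are hypothetical). It contrasts a list built from the raw UTF-8 bytes of a Go string, where the receiver reads every byte as its own code point, with a list of the actual Unicode code points.

```go
package main

import "fmt"

// bytesAsCodePoints treats every byte of s as its own code point, which is
// how a receiver reads the UTF-8 bytes when they arrive as a list of integers.
func bytesAsCodePoints(s string) []rune {
	out := make([]rune, 0, len(s))
	for i := 0; i < len(s); i++ {
		out = append(out, rune(s[i]))
	}
	return out
}

// codePoints decodes s as UTF-8 and returns one integer per character.
func codePoints(s string) []rune {
	return []rune(s)
}

func main() {
	s := "hi 日本語 🚀"
	// One integer per byte: multi-byte characters are split apart.
	fmt.Println(bytesAsCodePoints(s))
	// One integer per character.
	fmt.Println(codePoints(s)) // [104 105 32 26085 26412 35486 32 128640]
	// Re-reading the byte list as text yields mojibake, not the original string.
	fmt.Printf("%q\n", string(bytesAsCodePoints(s)))
}
```

A charlist built the first way cannot be turned back into the original text on the Erlang/Elixir side without first converting the list back to a binary.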
from ergo.
Any good encoder should consider the semantics and underlying layout of the golang environment, yes?
from ergo.
See how unicode is handled in Erlang:
https://erlang.org/doc/apps/stdlib/unicode_usage.html#the-interactive-shell
As the UTF-8 encoding is widely spread and provides some backward compatibility in the 7-bit ASCII range, it is selected as the standard encoding for Unicode characters in binaries for Erlang.
from ergo.
ergo tries too hard to encode a Go string into a charlist,
when it should just be encoding the string into a binary (like most other libraries in the golang ecosystem, written by experienced teams that appreciate the modern semantics and underlying layout of the golang types).
The actual charlist encoded by ergo etf is...
iex> to_string([104, 105, 32, 230, 151, 165, 230, 156, 172, 32, 240, 159, 154, 128])
<<104, 105, 32, 195, 166, 194, 151, 194, 165, 195, 166, 194, 156, 194, 172, 32,
195, 176, 194, 159, 194, 154, 194, 128>>
The correct encoding marshalled by ergo etf should have been...
iex> to_string([104, 105, 32, 26085, 26412, 35486, 32, 128640])
"hi 日本語 🚀"
from ergo.
Ultimately there should be a way to configure or disable this "Heuristic String Detection" when handling lists from Erlang/Elixir.
And maybe a way to also configure or disable this "Heuristic List Encoding" when handling unicode strings from Go.
from ergo.
> Erlang side - there is magic in the air :).
Regrettably, magic doesn't really work out for us.
from ergo.
I finally got the point :) Sorry for the misunderstanding. I'm working on improving the handling of Erlang/Elixir charlist strings (there will be a struct tag "charlist").
from ergo.
Done. Pushed to master. Please let me know if you find any issues with it.
To send a charlist from Ergo to Erlang, it must be explicitly declared with the struct tag 'charlist':
type Struct struct {
A string `etf:"fieldA charlist"`
}
On the Erlang side, it will be a map like this:
#{'fieldA' => "Hello World! π"}
To handle a "term" received from the Erlang side, use the TermMapIntoStruct function:
a := Struct{}
TermMapIntoStruct(term, &a)
or, for a Tuple value such as
{ "Hello World! π"}
use TermIntoStruct:
a := Struct{}
TermIntoStruct(term, &a)
from ergo.
Thanks for the commit! I've upgraded to latest version (v1.2.5-0.20210731234859-3217bf775f6e) from master branch.
However, the below still sends a charlist instead of a binary (https://erlang.org/doc/apps/erts/erl_ext_dist.html#bit_binary_ext)
reply = etf.Tuple{"hi 日本語 🚀"}
return "reply", reply, state
iex> GenServer.call({ :example, :'[email protected]' }, :hello)
{[104, 105, 32, 230, 151, 165, 230, 156, 172, 232, 170, 158, 32, 240, 159, 154, 128]}
Is there a way to configure this, when sending data from golang to erlang, considering that golang strings are just []byte ?
from ergo.
charlist - is a struct tag :) it must be applied to the struct field
from ergo.
I think there is a mix-up between this issue and the other one: #58
This issue is about disabling "Heuristic String Detection" in ergo that results in native strings in golang being encoded into etf List, while the other issue #58 is about Structs annotations/tags.
As mentioned by Erlang documentation itself: "String does not have a corresponding Erlang representation"
A golang string is not equal to an ETF string.
A golang string is equal to an ETF bitstring (binary).
See: https://blog.golang.org/strings - "It's important to state right up front that a string holds arbitrary bytes. It is not required to hold Unicode text, UTF-8 text, or any other predefined format. As far as the content of a string is concerned, it is exactly equivalent to a slice of bytes."
See: https://erlang.org/doc/apps/erts/erl_ext_dist.html#binary_ext - "This term represents a bitstring whose length in bits have to be a multiple of 8 bits."
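For reference, the two wire forms under discussion can be sketched with the standard library alone. This is not ergo's encoder; the function names are hypothetical. The tag values and length widths follow the External Term Format documentation linked above (STRING_EXT is tag 107 with a 2-byte length, BINARY_EXT is tag 109 with a 4-byte length), and the leading version byte of a complete term is omitted.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// Tag values from the External Term Format documentation.
const (
	tagStringExt = 107 // STRING_EXT: 2-byte big-endian length, then bytes
	tagBinaryExt = 109 // BINARY_EXT: 4-byte big-endian length, then bytes
)

// encodeBinaryExt writes the bytes of s as a BINARY_EXT term.
func encodeBinaryExt(s string) []byte {
	buf := make([]byte, 5, 5+len(s))
	buf[0] = tagBinaryExt
	binary.BigEndian.PutUint32(buf[1:5], uint32(len(s)))
	return append(buf, s...)
}

// encodeStringExt writes the bytes of s as a STRING_EXT term, which the
// receiver decodes as a list of integers in the range 0-255.
func encodeStringExt(s string) ([]byte, error) {
	if len(s) > 0xFFFF {
		return nil, errors.New("STRING_EXT holds at most 65535 bytes")
	}
	buf := make([]byte, 3, 3+len(s))
	buf[0] = tagStringExt
	binary.BigEndian.PutUint16(buf[1:3], uint16(len(s)))
	return append(buf, s...), nil
}

func main() {
	bin := encodeBinaryExt("hi")
	str, err := encodeStringExt("hi")
	if err != nil {
		fmt.Println("encode error:", err)
		return
	}
	fmt.Println(bin) // [109 0 0 0 2 104 105]
	fmt.Println(str) // [107 0 2 104 105]
}
```

The payload bytes are identical in both cases; only the tag and length header differ, and that header decides whether the receiver sees a binary or a list.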
from ergo.
We cannot simply enable or disable the conversion of 'charlist' to string and back, because it affects the whole node. The only way to do this for specific data is with a struct tag. This means that to send a 'charlist' from the golang side, you should use a struct with the 'charlist' tag:
type Struct struct {
A string `etf:"fieldA charlist"`
}
reply := Struct{"hi 日本語 🚀"}
return "reply", reply, state
from ergo.
See: https://blog.golang.org/strings - "It's important to state right up front that a string holds arbitrary bytes. It is not required to hold Unicode text, UTF-8 text, or any other predefined format. As far as the content of a string is concerned, it is exactly equivalent to a slice of bytes."
The intention is to send all golang strings as an ETF bitstring / binary, instead of a charlist.
from ergo.
The node logic could use etf.String instead of golang string.
Add etf.String to etf.go:
type Atom string
type String string
from ergo.
I would suggest doing the same for the charlist as well:
type Charlist string // encodes as a List
type String string // encodes as a binary
TermToStruct/TermMapToStruct will be updated accordingly - detect destination type and convert from List to the Charlist string or binary to the String
and no tags anymore.
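A rough sketch of how the two proposed types could drive the choice of wire form through a type switch. The types are declared locally here rather than in the etf package, and `wireForm` is a hypothetical helper that only names the chosen form instead of encoding anything. The plain `string` case follows the STRING_EXT behaviour described elsewhere in this thread.

```go
package main

import "fmt"

// Local stand-ins for the proposed etf.Charlist and etf.String types.
type Charlist string // proposed: encodes as a list
type String string   // proposed: encodes as a binary

// wireForm reports which wire form a value would be sent as under the
// proposal. Hypothetical helper: it names the form and encodes nothing.
func wireForm(v interface{}) string {
	switch v.(type) {
	case Charlist:
		return "list"
	case String:
		return "binary"
	case string:
		return "string_ext"
	default:
		return "unsupported"
	}
}

func main() {
	fmt.Println(wireForm(Charlist("hi"))) // list
	fmt.Println(wireForm(String("hi")))   // binary
	fmt.Println(wireForm("hi"))           // string_ext
}
```

Because `Charlist` and `String` are distinct named types, the type switch can tell them apart from a plain `string` even though all three share the same underlying representation.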
from ergo.
This could be how types are mapped:
Golang Type | Erlang/Elixir Type |
---|---|
etf.String | string/list (list of integers 0-255) |
etf.Charlist | list (list of integers with valid codepoints) |
string | binary |
The above follows the semantics and memory layout of each platform.
from ergo.
can't agree with that
string -> binary
For the Ergo<->Ergo case we should be able to work with native types. That's why Ergo encodes string https://github.com/halturin/ergo/blob/master/etf/encode.go#L399 as STRING_EXT https://erlang.org/doc/apps/erts/erl_ext_dist.html#string_ext
from ergo.
For the case Ergo<->Ergo, there should be no problem:
Ergo | External Term Format | Ergo |
---|---|---|
string | binary | string |
etf.String | string | etf.String |
Ergo in its current state works poorly with compliant implementations of OTP such as Elixir,
which suggests that ergo's implementation of ETF/OTP might need another look.
Sending a UTF8 string between Ergo <-> Ergo is currently not idiomatic OTP on the wire.
from ergo.
The main idea of Ergo is to bring the cool stuff from Erlang to the Golang world. It has never been a "driver" for "idiomatic" access to an Erlang cluster. So having native types for the Ergo<->Ergo interaction has higher priority, and having smooth access to the Erlang data types is a bonus.
from ergo.
Sending a UTF8 string between Ergo <-> Ergo is currently not idiomatic OTP on the wire. Not sure what the reasoning behind this is.
Especially if we understand that Golang strings can contain more than just range 0-255, which https://erlang.org/doc/apps/erts/erl_ext_dist.html#string_ext obviously cannot.
A standard idiomatic golang approach to this would be to create a subtype that restricts golang strings to only 0-255, or throw an error. That could be etf.String
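One way such a restricted subtype could look, as a standard-library sketch. The name `Latin1String` and its methods are hypothetical and are not the etf.String proposed in this thread; the constructor rejects any string containing a code point above 255 instead of silently sending it.

```go
package main

import (
	"errors"
	"fmt"
	"unicode/utf8"
)

// Latin1String is a hypothetical string subtype whose code points all fit
// in the range 0-255, so each one can be sent as a single byte.
type Latin1String string

var (
	ErrInvalidUTF8 = errors.New("string is not valid UTF-8")
	ErrOutOfRange  = errors.New("string contains a code point above 255")
)

// NewLatin1String validates s and returns it as a Latin1String.
func NewLatin1String(s string) (Latin1String, error) {
	if !utf8.ValidString(s) {
		return "", ErrInvalidUTF8
	}
	for _, r := range s {
		if r > 0xFF {
			return "", ErrOutOfRange
		}
	}
	return Latin1String(s), nil
}

// Bytes returns one byte per code point, the form a 0-255 list expects.
func (l Latin1String) Bytes() []byte {
	out := make([]byte, 0, len(l))
	for _, r := range string(l) {
		out = append(out, byte(r))
	}
	return out
}

func main() {
	ok, err := NewLatin1String("héllo")
	fmt.Println(ok.Bytes(), err) // [104 233 108 108 111] <nil>

	_, err = NewLatin1String("hi 日本語")
	fmt.Println(err) // string contains a code point above 255
}
```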
from ergo.
> So having native types for the Ergo<->Ergo interaction is more prioritized
This seems like a type mismatch during mapping.
Isn't the proposed solution more "native" (because of what golang strings are)?
from ergo.
> Sending a UTF8 string between Ergo <-> Ergo is currently not idiomatic OTP.
For the Ergo-Ergo case I have no idea why I should care about it. UTF8 is about how a set of bytes is interpreted. In Golang you can easily convert a string to []byte or []rune.
from ergo.
That's because in the Golang ecosystem and stdlib, strings are expected to contain UTF-8 as a first-class concept.
from ergo.
Any usage of values returned by golang libraries (other than ergo) would mean that we have to check whether the string contains UTF-8 and convert it to []byte. Values from common golang libraries may contain deeply nested strings. This is a nasty user experience.
from ergo.
Sending a UTF8 string between Ergo <-> Ergo is currently not idiomatic OTP.
Especially if we understand that Golang strings can contain more than just range 0-255, which https://erlang.org/doc/apps/erts/erl_ext_dist.html#string_ext obviously cannot.
A standard idiomatic golang approach to this would be to create a subtype that restricts golang strings to only 0-255, or throw an error. That is etf.String
I think this is enough said. Other more senior members of the Erlang / Elixir community may chime in in the future and offer their views (on the ergo implementation).
Maybe it's best for our team here to maintain a fork of this library and name it ergo2.
up to you )
from ergo.
Was expecting a more open community here, that is open to feedback, but alas...
"Binary sharing occurs whenever binaries are taken apart. This is the fundamental reason why binaries are fast, decomposition can always be done with O(1) complexity."
-Erlang Team (who designed the ETF encoding format)
"It's important to state right up front that a string holds arbitrary bytes. It is not required to hold Unicode text, UTF-8 text, or any other predefined format. As far as the content of a string is concerned, it is exactly equivalent to a slice of bytes."
-Rob Pike (23 October 2013)
from ergo.
That's why Ergo uses STRING_EXT for encoding strings, and it's a convenient way for the Ergo-Ergo case.
Erlang handles it as a string if it contains only numbers 0-255 (non-UTF in terms of Erlang data types) and treats it as a byte list otherwise. Using etf.String for encoding as a binary and etf.Charlist for sending as a list of numbers would be enough to solve this issue.
OTP has a lot of good ideas but not all of them are good enough. Ergo has its own way :)
PS: Being an "open community" doesn't mean accepting everything from anyone. It's an open-source project with an MIT license.
Nobody pays me for this work.
You are welcome :)
from ergo.
Forgot to mention... If this feature is pretty important we could discuss a private repo for your company.
from ergo.
just released 2.0.0 with support of Erlang/Elixir strings.
from ergo.