
halturin commented on May 18, 2024

It's the nature of Erlang. Here is my explanation of the same question in #44.

from ergo.

heri16 commented on May 18, 2024

Yes, but other libraries like https://github.com/kbrw/node_erlastic or https://github.com/rusterlium/rustler allow you to configure how they are decoded to suit Elixir semantics/conventions?

halturin commented on May 18, 2024

I would say it's really bad practice.

heri16 commented on May 18, 2024

Does that mean this library will not have built-in support for UTF-8 strings from Elixir?

halturin commented on May 18, 2024

It's not related to UTF support. Strings in Golang are immutable. So if you receive a binary and want to treat it as a string, just cast it to the string type.

heri16 commented on May 18, 2024

The two snippets below seem to result in two different encodings on the receiver side.

This sends {[104, 105, 32, 230, 151, 165, 230, 156, 172, 32, 240, 159, 154, 128]} over the wire.

reply := etf.Term(etf.Tuple{etf.Atom("error"), etf.Atom("unknown_request")})
reply = etf.Tuple{"hi 日本語 🚀"}
return "reply", reply, state

This sends {"hi 日本語 🚀"} over the wire.

reply := etf.Term(etf.Tuple{etf.Atom("error"), etf.Atom("unknown_request")})
reply = etf.Tuple{[]byte("hi 日本語 🚀")}
return "reply", reply, state

Does every nested string in the reply that may contain UTF-8 need to be cast with []byte(str)?

Golang strings are natively UTF-8, so sending them as charlists (that are encoded incorrectly) seems counter-intuitive.

halturin commented on May 18, 2024

It's not a problem of Ergo :). Go is a strongly, statically typed language. That means the Ergo encoder knows exactly what type of data it is trying to encode. Any string will be encoded as a string type. On the Erlang side - there is magic in the air :).

halturin commented on May 18, 2024

Golang string type are natively UTF-8, sending them as charlists (that are encoded incorrectly)

not sure if I follow you here

heri16 commented on May 18, 2024

See https://blog.golang.org/strings

string = readonly []byte

Strings in golang are semantically equivalent to binaries in Elixir/Erlang.

The ergo etf encoder encodes them as charlists, which produces invalid output (that cannot be decoded).

heri16 commented on May 18, 2024

Any good encoder should consider the semantics and underlying layout of the golang environment, yes?

heri16 commented on May 18, 2024

See how unicode is handled in Erlang:

https://erlang.org/doc/apps/stdlib/unicode_usage.html#the-interactive-shell

As the UTF-8 encoding is widely spread and provides some backward compatibility in the 7-bit ASCII range, it is selected as the standard encoding for Unicode characters in binaries for Erlang.

heri16 commented on May 18, 2024

Ergo tries too hard to encode Go strings as charlists, when it should simply encode them as binaries (like most other libraries in the Go ecosystem, written by experienced teams that appreciate the modern semantics and underlying layout of Go types).

The actual charlist encoded by ergo etf is...

iex> to_string([104, 105, 32, 230, 151, 165, 230, 156, 172, 32, 240, 159, 154, 128])
<<104, 105, 32, 195, 166, 194, 151, 194, 165, 195, 166, 194, 156, 194, 172, 32,
  195, 176, 194, 159, 194, 154, 194, 128>>

The correct encoding marshalled by ergo etf should have been...

iex> to_string([104, 105, 32, 26085, 26412, 35486, 32, 128640])
"hi 日本語 🚀"

heri16 commented on May 18, 2024

Ultimately there should be a way to configure or disable this "Heuristic String Detection" when handling lists from Erlang/Elixir.

And maybe a way to also configure or disable this "Heuristic List Encoding" when handling unicode strings from Go.

heri16 commented on May 18, 2024

Erlang side - there is magic in the air :).

Regrettably, magic doesn't really work out for us. 🥇

halturin commented on May 18, 2024

I finally got the point ) Sorry for the misunderstanding. I'm working on improving the handling of Erlang/Elixir charlist strings (there will be a struct tag "charlist").

halturin commented on May 18, 2024

Done, pushed to master. Please let me know if you find any issues with it.

To send a charlist from Ergo to Erlang, the field must be explicitly marked with the struct tag 'charlist':

type Struct struct {
   A string `etf:"fieldA charlist"`
}

On the Erlang side, it will be a map like this:

#{'fieldA' => "Hello World! 🚀"}

To handle a "term" received from the Erlang side, use the TermMapIntoStruct function:

a := Struct{}
TermMapIntoStruct(term, &a)

or, for a Tuple value such as

{"Hello World! 🚀"}

use TermIntoStruct:

a := Struct{}
TermIntoStruct(term, &a)

heri16 commented on May 18, 2024

Thanks for the commit! I've upgraded to the latest version (v1.2.5-0.20210731234859-3217bf775f6e) from the master branch.

However, the code below still sends a charlist instead of a binary (https://erlang.org/doc/apps/erts/erl_ext_dist.html#bit_binary_ext):

reply = etf.Tuple{"hi 日本語 🚀"}
return "reply", reply, state
iex> GenServer.call({ :example, :'[email protected]' }, :hello)
{[104, 105, 32, 230, 151, 165, 230, 156, 172, 232, 170, 158, 32, 240, 159, 154, 128]}

Is there a way to configure this when sending data from golang to erlang, considering that golang strings are just []byte?

halturin commented on May 18, 2024

charlist is a struct tag :) It must be applied to a struct field.

heri16 commented on May 18, 2024

I think there is a mix-up between this issue and the other one: #58

This issue is about disabling "Heuristic String Detection" in ergo that results in native strings in golang being encoded into etf List, while the other issue #58 is about Structs annotations/tags.

As mentioned by Erlang documentation itself: "String does not have a corresponding Erlang representation"

A golang string is not equal to an ETF string.

A golang string is equal to an ETF bitstring (binary).

See: https://blog.golang.org/strings - "It's important to state right up front that a string holds arbitrary bytes. It is not required to hold Unicode text, UTF-8 text, or any other predefined format. As far as the content of a string is concerned, it is exactly equivalent to a slice of bytes."

See: https://erlang.org/doc/apps/erts/erl_ext_dist.html#binary_ext - "This term represents a bitstring whose length in bits have to be a multiple of 8 bits."

halturin commented on May 18, 2024

We cannot just enable/disable the conversion of 'charlist' to string and back, as it affects the whole node. The only way to do this for specific data is to use a struct tag. That means that to send a 'charlist' from the golang side, you should use a struct with the 'charlist' tag:

type Struct struct {
   A string `etf:"fieldA charlist"`
}

reply := Struct{"hi 日本語 🚀"}
return "reply", reply, state

heri16 commented on May 18, 2024

See: https://blog.golang.org/strings - "It's important to state right up front that a string holds arbitrary bytes. It is not required to hold Unicode text, UTF-8 text, or any other predefined format. As far as the content of a string is concerned, it is exactly equivalent to a slice of bytes."

The intention is to send all golang strings as an ETF bitstring / binary, instead of a charlist.

heri16 commented on May 18, 2024

The node logic could use etf.String instead of golang string.

Add etf.String to etf.go:

type Atom string
type String string

halturin commented on May 18, 2024

I would suggest doing it this way for the charlist as well:

type Charlist string // encodes as a List
type String string // encodes as a binary

TermToStruct/TermMapToStruct will be updated accordingly: they will detect the destination type and convert a List to a Charlist string, or a binary to a String.

And no more tags.

heri16 commented on May 18, 2024

This could be how types are mapped:

Golang Type     Erlang/Elixir Type
etf.String      string/list (list of integers 0-255)
etf.Charlist    list (list of integers with valid codepoints)
string          binary

The above follows the semantics and memory layout of each platform.

halturin commented on May 18, 2024

can't agree with that

string -> binary

For the Ergo<->Ergo case, we should be able to work with native types. That's why Ergo encodes string https://github.com/halturin/ergo/blob/master/etf/encode.go#L399 as STRING_EXT https://erlang.org/doc/apps/erts/erl_ext_dist.html#string_ext

heri16 commented on May 18, 2024

For the case Ergo<->Ergo, there should be no problem:

Ergo          External Term Format    Ergo
string        binary                  string
etf.String    string                  etf.String

In its current state, Ergo works poorly with compliant OTP implementations such as Elixir, which suggests that Ergo's implementation of ETF/OTP might need another look.

Sending a UTF8 string between Ergo <-> Ergo is currently not idiomatic OTP on the wire.

halturin commented on May 18, 2024

The main idea of Ergo is to bring the cool stuff from Erlang to the Golang world. It has never been a "driver" for "idiomatic" access to an Erlang cluster. So native types for Ergo<->Ergo interaction take priority, and smooth access to Erlang data types is a bonus.

heri16 commented on May 18, 2024

Sending a UTF8 string between Ergo <-> Ergo is currently not idiomatic OTP on the wire. Not sure what the reasoning behind this is.

Especially since Golang strings can contain more than just the range 0-255, which https://erlang.org/doc/apps/erts/erl_ext_dist.html#string_ext obviously cannot.

A standard idiomatic golang approach to this would be to create a subtype that restricts golang strings to only 0-255, or throw an error. That could be etf.String

heri16 commented on May 18, 2024

So having native types for the Ergo<->Ergo interaction is more prioritized

This seems like a type mismatch during mapping.

The proposed solution is more "native", isn't it (because of what golang strings are)?

halturin commented on May 18, 2024

Sending a UTF8 string between Ergo <-> Ergo is currently not idiomatic OTP.

For Ergo-Ergo, I have no idea why I should care about it. UTF-8 is about how a set of bytes is represented. In Golang you can easily cast a string to []byte or []rune.

heri16 commented on May 18, 2024

That's because in the Golang ecosystem and stdlib, strings are expected to contain UTF-8 as a first-class concept.

heri16 commented on May 18, 2024

Using any values returned by golang libraries (other than ergo) would mean we have to check whether each string contains UTF-8 and cast it to []byte. Values from common golang libraries may contain deeply nested strings. This is a nasty user experience.

halturin commented on May 18, 2024

Sending a UTF8 string between Ergo <-> Ergo is currently not idiomatic OTP.

Especially if we understand that Golang strings can contain more than just range 0-255, which https://erlang.org/doc/apps/erts/erl_ext_dist.html#string_ext obviously cannot.

A standard idiomatic golang approach to this would be to create a subtype that restricts golang strings to only 0-255, or throw an error. That is etf.String

I think this is enough said. Other more senior members of the Erlang / Elixir community may chime in in the future and offer their views (on the ergo implementation).

Maybe it's best for our team here to maintain a fork this library and name it ergo2.

up to you )

heri16 commented on May 18, 2024

I was expecting a more open community here, one that is open to feedback, but alas...

"Binary sharing occurs whenever binaries are taken apart. This is the fundamental reason why binaries are fast, decomposition can always be done with O(1) complexity."
-Erlang Team (who designed the ETF encoding format)

"It's important to state right up front that a string holds arbitrary bytes. It is not required to hold Unicode text, UTF-8 text, or any other predefined format. As far as the content of a string is concerned, it is exactly equivalent to a slice of bytes."
-Rob Pike (23 October 2013)

halturin commented on May 18, 2024

That's why Ergo uses STRING_EXT for encoding strings; it's convenient for the Ergo-Ergo case.

Erlang handles it as a string if it contains only numbers 0-255 (non-UTF in terms of Erlang data types) and treats it as a byte list otherwise. Using etf.String to encode as a binary and etf.Charlist to send as a list of numbers would be enough to solve this issue.

OTP has a lot of good ideas but not all of them are good enough. Ergo has its own way :)

PS: Being an "open community" doesn't mean accepting everything from anyone. It's an open-source project with an MIT license.
Nobody pays me for this work.
You are welcome :)

halturin commented on May 18, 2024

Forgot to mention... If this feature is pretty important we could discuss a private repo for your company.

halturin commented on May 18, 2024

Just released 2.0.0 with support for Erlang/Elixir strings.
