cldr_collation's People

Contributors

foxbenjaminfox, kipcole9, linusdm, phlppn

cldr_collation's Issues

Ubuntu 22.04

Hello!

I'm dropping a line about my experience after upgrading Ubuntu to 22.04:

Under Ubuntu 22.04, the candidate package version of ICU is 70.
Without any changes, Cldr.Collation fails because it expects libicu67.
I got everything up and running just by downloading the libicu67 package from https://packages.ubuntu.com/impish/amd64/libicu67/download and installing it with dpkg.

Many thanks for your work!

Can't load NIF library in the default phoenix debian docker image: undefined symbol: ucol_strcollIter_67

I'm trying to add ex_cldr_collation as a dependency to a standard Phoenix application. The problem arises when building and running the application in a Docker container. These are the steps to reproduce:

  • mix phx.new my_app (version 1.7.0-rc.3)
  • add dependency to ex_cldr_collation in mix.exs (version 0.7.1)
  • mix phx.gen.release --docker to generate the default Dockerfile
  • modify the Dockerfile and add the apt package libicu-dev to the builder image, as explained in the docs (pkgconf is not really required; I think this was a wrong assumption, read ahead in the comments)
  • docker build . -t my_app
  • docker run -it my_app

When the image starts the Phoenix application, I get this error:

  crasher:
    initial call: application_master:init/4
    pid: <0.2044.0>
    registered_name: []
    exception exit: {{shutdown,
                      {failed_to_start_child,kernel_safe_sup,
                       {on_load_function_failed,'Elixir.Cldr.Collation',
                        {{badmatch,
                          {error,
                           {load_failed,
                            "Failed to load NIF library: '/app/lib/ex_cldr_collation-0.7.1/priv/ucol.so: undefined symbol: ucol_strcollIter_67'"}}},
                         [{'Elixir.Cldr.Collation',init,0,
                           [{file,"lib/cldr_collation.ex"},{line,20}]},
                          {init,'-run_on_load_handlers/2-fun-0-',1,[]}]}}}},
                     {kernel,start,[normal,[]]}}
      in function  application_master:init/4 (application_master.erl, line 142)
  ...

Adding the Debian package libicu67 to the runner image doesn't solve the problem. This is unexpected though: the symbol ucol_strcollIter_67 is found in the shared object at /usr/lib/x86_64-linux-gnu/libicui18n.so (which comes with said Debian package).
I checked by running nm -D --demangle libicui18n.so in the Docker container (after installing binutils to get the nm utility), where the symbol shows up with a T marker, which should indicate it's present in the .so file (that's an assumption, because I don't really know what I'm doing here 🤷).

I don't know how this should work, but is there something missing to tell Elixir to also look into this other .so file when loading NIFs? I'd be surprised, because it seems to work on other distros without any problems.

Surprising ordering of capital letters

My main motivation for using ex_cldr_collation is to "properly" sort binaries with Polish letters:

iex> Enum.sort(["a", "b", "ą"])
["a", "b", "ą"]
iex> Enum.sort(["a", "b", "ą"], Cldr.Collation.Sensitive)
["a", "ą", "b"]

This is exactly what I need, but the ordering of capital letters is completely surprising:

iex> Enum.sort(["a", "b", "A", "B"])
["A", "B", "a", "b"]
iex> Enum.sort(["a", "b", "A", "B"], Cldr.Collation.Sensitive)
["a", "A", "b", "B"]

So not only does "a" come before "A", but "A" also comes before "b"! I guess the second part ("A" < "b") makes sense and I'm too used to ASCII-table-based sorting, but I was wondering if there is an easy way to sort so that "A" < "a" and "a" < "b"?

Either way, thanks for making this library!
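
One possible way to get "A" < "a" < "b" is to compare case-insensitively first and break ties with a plain binary comparison, which puts ASCII uppercase before lowercase. The sketch below is self-contained: the module name UpperFirstSort and the stand-in comparator ci are hypothetical, and the stand-in only lowercases and compares bytes, so it does not order Polish letters correctly. With the library, one would pass its own case-insensitive comparison instead, assuming it exposes one that returns :lt, :eq or :gt.

```elixir
defmodule UpperFirstSort do
  # `base_compare` is any 2-arity function returning :lt | :eq | :gt
  # that treats "a" and "A" as equal (an assumption of this sketch).
  def sort(strings, base_compare) do
    Enum.sort(strings, fn a, b ->
      case base_compare.(a, b) do
        :lt -> true
        :gt -> false
        # Tie on the base comparison: fall back to binary order,
        # where "A" (65) sorts before "a" (97).
        :eq -> a <= b
      end
    end)
  end
end

# Stand-in comparator for illustration only: lowercases, then compares bytes.
ci = fn a, b ->
  case {String.downcase(a), String.downcase(b)} do
    {x, x} -> :eq
    {x, y} when x < y -> :lt
    _ -> :gt
  end
end

IO.inspect(UpperFirstSort.sort(["a", "b", "A", "B"], ci))
# prints ["A", "a", "B", "b"]
```

The tie-break only runs when the base comparison reports equality, so the primary ordering still comes entirely from whichever comparator is passed in.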

Error on init path

Hello!

First of all, thank you for your work here. It's really helpful.

I'm opening this issue because I think I have found a bug while trying to init this library. This bug happens when you have your app compiled and you rename the folder where it is located. Example: I have my ~/projects/myapp and I rename it to ~/projects/mypersonalapp.

At this point the library stops working. I think the problem is here, and I think it could be solved by modifying the init/0 function as follows:

def init do
  # Resolve the priv directory at runtime so the path survives moving the app
  so_path = :code.priv_dir(:ex_cldr_collation) ++ '/ucol'
  num_scheds = :erlang.system_info(:schedulers)

  :ok = :erlang.load_nif(so_path, num_scheds)
end

instead of using the module attribute @so_path. Do you agree?

I've seen this approach in other libraries like AppSignal.

I found the problem while trying to deploy my application where we build the application in a /tmp folder and after that we move it to the /app folder.
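
The underlying difference is that a module attribute is evaluated once, when the module is compiled, while a function body is evaluated on every call. A minimal sketch of that behaviour follows. It uses the current working directory as a stand-in for the priv directory, and the module name PathDemo and both function names are hypothetical, not taken from the library.

```elixir
defmodule PathDemo do
  # Evaluated once, when the module is compiled; the result is baked in.
  @frozen_path Path.join(File.cwd!(), "priv/ucol")

  def compile_time_path, do: @frozen_path

  # Evaluated on every call, so it follows the current location.
  def runtime_path, do: Path.join(File.cwd!(), "priv/ucol")
end

# Simulate "moving the app" by switching to a fresh directory.
new_dir = Path.join(System.tmp_dir!(), "path_demo_#{System.unique_integer([:positive])}")
File.mkdir_p!(new_dir)
File.cd!(new_dir)

IO.puts("compile-time: " <> PathDemo.compile_time_path())
IO.puts("runtime:      " <> PathDemo.runtime_path())
```

After the directory change the two paths differ: the attribute still points at the location seen during compilation, which is the same failure mode as a NIF path stored in @so_path.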

Does not compile with OTP 23 (Mac OS)

Hey!

Using OTP 23, I got a linking error while compiling my project (mix compile):

cc c_src/ucol.o -arch x86_64 -flat_namespace -undefined suppress -shared -L/usr/local/Cellar/erlang/23.2.2/lib/erlang/usr/lib -lerl_interface -lei -lpthread -lm -licucore -lstdc++ -o ./priv/ucol.so
ld: library not found for -lerl_interface
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make: *** [priv/ucol.so] Error 1

It looks like the erl_interface library was removed (or maybe renamed) starting with OTP 23. The lib folder of OTP 23 contains:

liberts.a
libei_st.a
libei.a
liberts_r.a

while the OTP 22 lib folder contains:

liberl_interface.a
libei_st.a
libei.a
liberl_interface_st.a
liberts_r.a
liberts.a

Data provider for collation data with icu-collator

CLDR collations are configured per-locale (typically per-language in reality) in a set of configuration files. These files need to be available to icu-collator through its data provider interface.

Including the data files in ex_cldr_collation seems reasonable. They are not large files since they represent only tailorings of the standard DUCET collation.

Questions

  1. Does icu-collator depend on other CLDR data than these collation files?
  2. Do any of the existing data provider mechanisms in icu-collator support loading these files? And if so, how is that configured?

I'll see what I can learn from reading more of the Rust docs, but I'm in deep water when it comes to that, so any suggestions you have would be warmly welcomed!

Move to Rust-based icu_collator bindings

icu_collator is fully compliant with Unicode TR #10 (the Unicode Collation Algorithm), and being written in Rust it can enable a fast, safe and developer-friendly experience. I see the following advantages:

  • Fast, being NIF and Rust based
  • Can take advantage of Rustler precompiled to ease installation across different machine types
  • Fully compliant, meaning it supports runtime options to influence ordering, which is helpful in several UI situations (and others).

@foxbenjaminfox has kindly offered to work on the Rust bindings while I work on the overall library, Elixir API and documentation.

Elixir Public API

The basic Elixir API (not the NIF API) I envisage as:

  • Cldr.Collation.sort(list_of_binaries, options) where options is a keyword list. This should only require one NIF round-trip but would require instantiating a new collator on each call. Options can include :locale (the default is Cldr.default_locale/0). Other options would be Elixir expressions of the icu_collator type CollatorOptions.
  • Cldr.Collation.compare(string, string, options) where options is a keyword list.
  • Cldr.Collation.collator(language_tag) which returns a collator (Rust-based resource) that can be reused and may (possibly) be stored in a Cldr.LanguageTag.t/0 struct. Options can include :locale (the default is Cldr.default_locale/0). Other options would be Elixir expressions of the icu_collator type CollatorOptions.
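
The shape of the three functions above could be sketched as follows. This is only an illustration under stated assumptions: the module name CollationSketch is hypothetical, the collator is a plain struct rather than a Rust resource, the default locale is hard-coded to "en" instead of calling Cldr.default_locale/0, and the ordering is a plain binary comparison standing in for real collation.

```elixir
defmodule CollationSketch do
  # Hypothetical stand-in for the proposed API; not the library's code.
  defstruct [:locale, :options]

  # Build a reusable collator. In the proposal this would wrap a
  # Rust-based resource; here it is just a struct.
  def collator(locale, options \\ []) do
    %__MODULE__{locale: locale, options: options}
  end

  # Compare two strings, building a throwaway collator from the options.
  def compare(a, b, options \\ []) do
    do_compare(from_options(options), a, b)
  end

  # Sort a list with one collator for the whole call.
  def sort(strings, options \\ []) do
    collator = from_options(options)
    Enum.sort(strings, fn a, b -> do_compare(collator, a, b) != :gt end)
  end

  # "en" is an assumed default used only to keep the sketch self-contained.
  defp from_options(options) do
    collator(Keyword.get(options, :locale, "en"), options)
  end

  # Stand-in ordering: binary comparison, NOT locale-aware collation.
  defp do_compare(%__MODULE__{}, a, b) do
    cond do
      a < b -> :lt
      a > b -> :gt
      true -> :eq
    end
  end
end

IO.inspect(CollationSketch.sort(["b", "a", "c"], locale: "pl"))
IO.inspect(CollationSketch.compare("a", "b"))
```

Routing both sort and compare through a collator value keeps the one-shot functions and the reusable collator on the same code path, which is one way to keep the NIF surface small.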

NIF API

This API needs to be as simple as possible and be driven by the needs of icu_collator and Rustler. Hopefully the interface can directly use the binaries in the Rust code without copying but the memory models are different and this may not be possible. Sorting by adjusting pointers would be more efficient than memory copies.

At minimum the NIF API should expect to accept:

  • A list of Elixir binaries, being the strings to be ordered
  • Elixir binaries to be compared
  • An options list (in a format best determined by the most efficient way to pass options to the Rustler NIF)
  • A language tag as a binary (probably using Cldr.LanguageTag's :canonical_locale_name field)

It should be able to return:

  • A list of binaries
  • Comparison indicators (i.e. greater than, less than, ...)
  • Status codes as appropriate
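
On the Elixir side, a NIF boundary like this is usually declared as a module of stub functions that the native library replaces once it is loaded. The sketch below shows the inputs and outputs listed above as typespecs. The module name CollationNifSketch, the function names, the argument order and the {:ok, ...} / {:error, ...} return shapes are all assumptions made for illustration; no native code is loaded here, so calling a stub raises.

```elixir
defmodule CollationNifSketch do
  # Hypothetical NIF boundary. Each stub would be replaced by the
  # native implementation when the NIF library is loaded.

  @type comparison :: :lt | :eq | :gt

  # Takes the strings to order, a language tag as a binary, and options.
  @spec sort([binary()], binary(), keyword()) :: {:ok, [binary()]} | {:error, term()}
  def sort(_strings, _locale, _options), do: :erlang.nif_error(:nif_not_loaded)

  # Takes the two binaries to compare, a language tag, and options.
  @spec compare(binary(), binary(), binary(), keyword()) ::
          {:ok, comparison()} | {:error, term()}
  def compare(_a, _b, _locale, _options), do: :erlang.nif_error(:nif_not_loaded)
end
```

Keeping the boundary to two functions that take only binaries, a locale binary and an options list leaves all option validation on the Elixir side.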
