
rust_icu's Introduction

rust_icu: low-level rust language bindings for the ICU library

Item Description
Testing Test status
Source https://github.com/google/rust_icu
README https://github.com/google/rust_icu/blob/main/README.md
Coverage View report
Docs https://docs.rs/crate/rust_icu

This is a library of low level native rust language bindings for the International Components for Unicode (ICU) library for C (a.k.a. ICU4C).

If you just want quick instructions on how to download and install, see the quickstart guide.

See the ICU project home page for details about the ICU library. The library source can be viewed on GitHub.

The latest version of this file is available at https://github.com/google/rust_icu.

This is not an officially supported Google product.

Why wrap ICU (vs. doing anything else)?

  • The rust language Internationalisation page confirms that ICU support in rust is spotty, so having a functional wrapper helps advance the state of the art.

  • Projects such as Fuchsia already depend on ICU, and having rust bindings allows for an easy way to use Unicode algorithms without taking on more dependencies.

  • Cooperation on the interface with projects such as ICU4X could allow a seamless transition to an all-rust implementation in the future.

Structure of the repository

The repository is organized as a cargo workspace of rust crates. Each crate corresponds to the respective header in the ICU4C library's C API. Please consult the coverage report for details about function coverage in the headers.

Crate Description
rust_icu Top-level crate. Include this if you just want to have all the functionality available for use.
rust_icu_common Commonly used low-level wrappings of the bindings.
rust_icu_intl Implements ECMA 402 recommendation APIs.
rust_icu_sys Low-level bindings code
rust_icu_ubrk Support for text boundary analysis. Implements ubrk.h C API header from the ICU library.
rust_icu_ucal ICU Calendar. Implements ucal.h C API header from the ICU library.
rust_icu_ucol Collation support. Implements ucol.h C API header from the ICU library.
rust_icu_udat ICU date and time. Implements udat.h C API header from the ICU library.
rust_icu_udata ICU binary data. Implements udata.h C API header from the ICU library.
rust_icu_uenum ICU enumerations. Implements uenum.h C API header from the ICU library. Mainly UEnumeration and friends.
rust_icu_uformattable Formattable value support (UFormattable). Implements uformattable.h C API header from the ICU library. Since 0.3.1.
rust_icu_ulistformatter Locale-sensitive list formatting support. Implements ulistformatter.h C API header from the ICU library.
rust_icu_uloc Locale support. Implements uloc.h C API header from the ICU library.
rust_icu_umsg MessageFormat support. Implements umsg.h C API header from the ICU library.
rust_icu_unorm2 Unicode normalization support. Implements unorm2.h C API header from the ICU library.
rust_icu_unum Number formatting support. Implements unum.h C API header from the ICU library.
rust_icu_unumberformatter Number formatting support (modern). Implements unumberformatter.h C API header from the ICU library.
rust_icu_upluralrules Locale-sensitive plural rules support. Implements upluralrules.h C API header from the ICU library.
rust_icu_ustring ICU strings. Implements ustring.h C API header from the ICU library.
rust_icu_utext Text operations. Implements utext.h C API header from the ICU library.
rust_icu_utrans Transliteration support. Implements utrans.h C API header from the ICU library.

Limitations

Today, the generated rust language bindings can only cover what the ICU library's C API offers. The ICU library's C API (sometimes referred to as ICU4C in the documentation) is distinct from the ICU C++ API.

The bindings offered by this library therefore have somewhat limited applicability, which means they may not work for you out of the box. If you come across such a case, feel free to file a bug for us to fix. Pull requests are welcome.

The limitations we know of today are as follows:

  • There is no guaranteed feature parity. Some algorithms that are implemented in C++ don't have a C equivalent, and vice versa. This is usually not a problem if you are using the library from C++, since you are free to choose whichever API surface works for you. But it is an issue for rust bindings, since we can only use the C API at the moment.

  • A C++ implementation of a new algorithm is not necessarily always reflected in the C API, leading to feature disparity between the C and C++ API surfaces. See for example this bug as an illustration.

  • While the icu_config feature will likely give you some freedom to auto-generate bindings for your own ICU library version, we still need to keep a list of explicitly supported ICU versions to ensure that the wrappers are stable.

Compatibility

The compatibility guarantee is as follows:

  1. Automated tests are executed for the last three major ICU library versions in all feature combinations of interest.
  2. Automated tests are executed for the ICU library version in use by the docs.rs system (so the documentation could be built).
rust_icu version ICU 63.x ICU 69.1 ICU 70.1 ICU 71.1 ICU 72.1 ICU 73.1
2.0
3.0
4.0

Features

The rust_icu library is intended to be compiled with cargo, with one of several features enabled. Compilation with cargo allows us to do some library detection in a custom build.rs file in the rust_icu_sys library and adapt the build process to your build environment. However, since not every development environment will use the same settings, we opted to offer certain features (below) as configuration options.

While our intention is to keep the list of features below up to date with the actual list in Cargo.toml, the list may periodically go out of date.

To use any of the features, you will need to activate the feature in all the rust_icu_* crates that you intend to use. Failing to do this results in confusing compilation errors.

Feature Default? Description
use-bindgen Yes If set, cargo will run bindgen to generate bindings based on the installed ICU library. The program icu-config must be in $PATH for this to work. In the future there may be other approaches for auto-detecting libraries, such as via pkg-config.
renaming Yes If set, ICU bindings are generated with version numbers appended. This is called "renaming" in ICU, and is normally needed only when linking against a specific ICU version is required, for example to avoid symbol clashes when different ICU versions have to be linked into the same program. See the ICU documentation for a discussion of renaming. This feature MUST be used when the use-bindgen feature is NOT used.
icu_config Yes If set, the binary icu-config will be used to configure the library. Turn this feature off if you do not want build.rs to try to autodetect the build environment. You will want to skip this feature if your build environment configures ICU in a different way. This feature is only meaningful when the use-bindgen feature is used; otherwise it has no effect.
icu_version_in_env No If set, ICU bindings are made for the ICU version specified in the environment variable RUST_ICU_MAJOR_VERSION_NUMBER, which is made available to cargo at build time. See the section below for details on how to use this feature. This feature is only meaningful when the use-bindgen feature is NOT used; otherwise it has no effect.
static No If set, link ICU libraries statically (and the standard C++ library dynamically). You can use RUST_ICU_LINK_SEARCH_DIR to add an extra path to the search path if you have a build of ICU in a non-standard directory.
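To make the "renaming" convention concrete: with renaming, each exported ICU C function carries the ICU major version as a suffix, which is why the backtrace in the issues below shows symbols such as `u_strlen_67` and `umsg_vformat_67` for ICU 67. The sketch below only models that naming rule. `renamed_symbol` and `unrenamed_symbol` are hypothetical helpers, not part of the rust_icu API, and the sketch assumes the suffix is always `_` followed by the major version number.

```rust
/// Returns the "renamed" form of an ICU C symbol for a given major version,
/// e.g. ("u_strlen", 67) -> "u_strlen_67". Hypothetical helper.
fn renamed_symbol(base: &str, icu_major_version: u32) -> String {
    format!("{}_{}", base, icu_major_version)
}

/// Strips a trailing all-digit version suffix such as `_67`, if present.
/// Symbols without such a suffix are returned unchanged.
fn unrenamed_symbol(symbol: &str) -> &str {
    match symbol.rsplit_once('_') {
        Some((base, suffix))
            if !suffix.is_empty() && suffix.chars().all(|c| c.is_ascii_digit()) =>
        {
            base
        }
        _ => symbol,
    }
}
```

Because the suffix changes with every major release, bindings generated with renaming are tied to one ICU version, which is one reason the repository keeps per-version lib_XX.rs files.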

Prerequisites

Required

  • rust_icu source code

    Clone with git:

    git clone https://github.com/google/rust_icu.git
    
  • rustup

    Install from https://rustup.rs. Used to set toolchain defaults. This will install cargo as well.

  • Clang

    You must have Clang installed to access the right headers.

  • The ICU library development environment

    You will need access to the ICU libraries for the rust_icu bindings to link against. Download and installation of ICU is out of scope of this document. Please read through the ICU introduction to learn how to build and install.

    Sometimes, the ICU library will be preinstalled on your system, or you can pull the library in from your package management program. However, this library won't necessarily be the one that you need to link into the program you are developing. In short, it is your responsibility to have a developer version of ICU handy somewhere on your system.

    We have a quickstart install that may get you well on the way in case your environment happens to be configured very similarly to ours and you want to build ICU from source.

Optional

  • GNU Make, if you want to use the make-based build and test.

    Installing GNU Make is beyond the scope of this file. Please refer to your OS instructions for installation.

  • docker, if you decide to use docker-based build and test.

    Installing docker is beyond the scope of this file; please see the docker installation instructions for details. As installing docker is intrusive to the host machine, your company may have internal documentation on how to install docker properly.

  • icu-config utility, if icu_config feature is used.

    You need to install the ICU library on your system, such that the binary icu-config is somewhere in your $PATH. The build script will use it to discover the library settings and generate correct link scripts. If you use the feature but icu-config is not found, the build will fail.

  • bindgen utility, if the use-bindgen feature is used.

    See the bindgen user guide for instructions on how to install it.

  • rustfmt utility, if the use-bindgen feature is used.

    See https://github.com/rust-lang/rustfmt for instructions on how to install.

Testing

There are a few options to run the test for rust_icu.

Cargo

Building and testing using cargo is the canonical way of building and testing rust code.

In the case of the rust_icu library you may find that your system's default ICU development package is ancient, in which case you will need to build your own ICU4C library (see below for that). That will make it necessary to pass in the PKG_CONFIG_PATH and LD_LIBRARY_PATH environment variables to help the build code locate and use the library you built, instead of the system default.

The following tests should all build and pass. Note that because the libraries needed are in a custom location, we need to set LD_LIBRARY_PATH when running the tests, as well as PKG_CONFIG_PATH.

If you find that you are able to use your system's default ICU installation, you can safely omit the two environment variables.

env PKG_CONFIG_PATH="$HOME/local/lib/pkgconfig" \
    LD_LIBRARY_PATH="$HOME/local/lib" \
        bash -c 'cargo test'

If you think that the above approach is too much of a hassle, consider trying out the Docker-based approach.

GNU Make

If you happen to like the GNU way of doing things, you may appreciate the GNU Make approach.

The easiest way is to use GNU Make and run:

make test

You may want to use this method if you are working on rust_icu, have your development environment all set up and would like a shorthand to run the tests.

Docker-based

See optional dependencies section above.

To run a hermetic build and test of the rust_icu source code, issue the following command:

make docker-test

This will run docker-based build and test of the source code on your local machine. This is a good way to test that your code works with a specific reference version of ICU.

Prior art

There is plenty of prior art that has been considered:

The current state of things is that I'd like to do a few experiments on my own first, then see if the work can be folded into any of the above efforts.

See also:

Assumptions

There are a few competing approaches for ICU bindings. However, it seems, at least based on information available in rust's RFC repos, that the work on ICU support in rust is still ongoing.

These are the assumptions made in the making of this library:

  • We need a complete, reusable and painless ICU low-level library for rust.

    This means, for example, that we must rely on an external ICU library rather than bundle the library itself with the binding code. Such modularity allows the end user of the library to use an ICU library of their choice, and incorporate it in their respective systems.

  • No ICU algorithms will be reimplemented as part of the work on this library.

    An ICU reimplementation will likely take thousands of engineer years to complete. For an API that is as subtle and complex as ICU, I think that it is probably a better return on investment to maintain a single central implementation.

    Also, the existence of this library doesn't prevent reimplementation. If someone else wants to try their hand at reimplementing ICU, that's fine too.

  • This library should serve as a low-level basis for a rust implementation.

    A low level ICU API may not be an appropriate seam for the end users. A rust-ful API should be layered on top of these bindings. It will probably be a good idea to subdivide that functionality into crates, to match the expectations of rust developers.

    I'll gladly reuse the logical subdivision already made in some of the above mentioned projects.

  • I'd like to explore ways to combine with existing implementations to build complete ICU support for rust.

    Hopefully it will be possible to combine the good parts of all the rust bindings available today into a unified rust library. I am always available to discuss options.

    The only reason I started a separate effort instead of contributing to any of the projects listed in the "Prior Art" section is that I wanted to try what a generated library would look like in rust.

Additional instructions

Quickstart guide

Before you begin, please ensure the following prerequisites are met:

  • You have docker installed and it runs on your system.
  • You have GNU Make.
  • You have git.
  • You have plenty of disk space. The docker images for the build environment are a bit large, so a few GiB are needed to fit all of them.
  • You have an Internet connection.

From there, the following sequence of commands will check out, build and test the rust_icu source code.

mkdir -p ~/tmp
cd ~/tmp
git clone https://github.com/google/rust_icu
cd rust_icu
make docker-test

You can now make changes to the code and tests. You can re-run the compile and test cycle by running make docker-test.

ICU installation instructions

These instructions follow the "out-of-tree" build instructions from the ICU repository.

Assumptions

The instructions below are not self-contained. They assume that:

  • you have your system set up such that you can follow the ICU build instructions effectively. This requires some upfront time investment.
  • you can build ICU from source, and your project has access to ICU source.
  • your setup is Linux, with some very specific settings that worked for me. You may be able to adapt them to work on yours.

Compilation

mkdir -p $HOME/local
mkdir -p $HOME/tmp
cd $HOME/tmp
git clone https://github.com/unicode-org/icu.git
mkdir icu4c-build
cd icu4c-build
../icu/icu4c/source/runConfigureICU Linux \
  --prefix=$HOME/local \
  --enable-static
make
make install
make doc

If the compilation finishes with success, the directory $HOME/local/bin will have the file icu-config which is necessary to discover the library configuration.

You can also do a

make check

to run the unit tests.

If you add $HOME/local/bin to $PATH, or move icu-config to a directory that is listed in your $PATH you should be all set to compile rust_icu.
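Before building rust_icu, you may want to check that the icu-config found on your $PATH is the one you just installed. The sketch below runs `icu-config --version` and extracts the major version. The function names are hypothetical, this is not the code in rust_icu_sys/build.rs, and it assumes the version is printed as `MAJOR.MINOR` (for example `67.1`).

```rust
use std::process::Command;

/// Runs `icu-config --version` (assumes icu-config is on $PATH) and returns
/// its trimmed output, or a human-readable error.
fn icu_config_version() -> Result<String, String> {
    let output = Command::new("icu-config")
        .arg("--version")
        .output()
        .map_err(|e| format!("could not run icu-config: {}", e))?;
    if !output.status.success() {
        return Err(format!("icu-config exited with {}", output.status));
    }
    String::from_utf8(output.stdout)
        .map(|s| s.trim().to_string())
        .map_err(|e| format!("icu-config output is not UTF-8: {}", e))
}

/// Extracts the major version from a version string such as "67.1".
fn major_version(version: &str) -> Option<u32> {
    version.trim().split('.').next()?.parse().ok()
}
```

With an ICU 67.1 installation on the $PATH, `icu_config_version().ok().as_deref().and_then(major_version)` should give `Some(67)`; a `None` points at a missing or unexpected icu-config.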

ICU rebuilding instructions

If you change the configuration of the ICU library with an intention to rebuild the library from source you should probably add an intervening make clean command.

Since the ICU build is not hermetic, this ensures there are no remnants of the old compilation process sitting around in the build directory. You need to do this for example if you upgrade the major version of the ICU library. If you forget to do so, you may see unexpected errors while compiling ICU, or while linking or running your programs.

Compiling for a set version of ICU

Assumptions

  • You have selected the feature set [renaming,icu_version_in_env]

OR:

  • You have manually verified that the compatibility matrix has a "Yes" for the ICU version and feature set you want to use.

The following is a tested example.

env PKG_CONFIG_PATH="$HOME/local/lib/pkgconfig" \
    LD_LIBRARY_PATH="$HOME/local/lib" \
    RUST_ICU_MAJOR_VERSION_NUMBER=65 \
        bash -c 'cargo test'

The following would be an as of yet untested example of compiling rust_icu against a preexisting ICU version 66.

env PKG_CONFIG_PATH="$HOME/local/lib/pkgconfig" \
    LD_LIBRARY_PATH="$HOME/local/lib" \
    RUST_ICU_MAJOR_VERSION_NUMBER=66 \
        bash -c 'cargo test'

Adding support for a new version of ICU

In general, as long as the icu-config approach is supported, it should be possible to generate the library wrappers for newer versions of the ICU library, assuming that the underlying C APIs do not diverge too much.

An approach that yielded easy support for ICU 65.1 consisted of the following steps. Below, $RUST_ICU_SOURCE_DIR is the directory containing the rust_icu source code.

  • Download the source code of the new ICU version.
  • Build and install the ICU library, for example following the compilation steps above, then build rust_icu against it with bindgen enabled so that fresh bindings are generated.
  • Get the file lib.rs from the output directory $RUST_ICU_SOURCE_DIR/target/debug/build/rust_icu_sys-..., rename it to lib_66.rs (if working with ICU version 66, otherwise append the version you are using).
  • Save the file to the directory $RUST_ICU_SOURCE_DIR/rust_icu_sys/bindgen, this is the directory that contains the pre-generated sources.

These files lib_XX.rs may need to be generated again if build.rs is changed to include more features.

Adding more bindings

When adding more ICU wrappers, make sure to do the following:

  • Check rust_icu_sys/build.rs and rust_icu_sys/bindgen/run_bindgen.sh to add appropriate lines into BINDGEN_SOURCE_MODULES, then BINDGEN_ALLOWLIST_FUNCTIONS and BINDGEN_ALLOWLIST_TYPES.

Testing with a specific feature set turned on

Here's an example of running a docker test on ICU 67, with features icu_version_in_env and renaming turned on instead of the default. Note that the parameters are mostly passed into the container that runs docker-test via environment variables.

make DOCKER_TEST_ENV=rust_icu_testenv-67 \
  RUST_ICU_MAJOR_VERSION_NUMBER=67 \
  DOCKER_TEST_CARGO_TEST_ARGS='--no-default-features --features icu_version_in_env,renaming' \
  docker-test

Some clarification:

  • The environment variable RUST_ICU_MAJOR_VERSION_NUMBER is used for the feature icu_version_in_env to instruct cargo to use the file rust_icu_sys/bindgen/lib_67.rs as a prebuilt bindgen source file instead of trying to generate one on the fly.
  • The environment variable DOCKER_TEST_CARGO_TEST_ARGS is used to pass command line arguments to the cargo test invocation in the docker container. Its value is passed verbatim to cargo test without quoting, so separate words in the variable end up as separate arguments to cargo test.
  • The environment variable DOCKER_TEST_ENV is the base name of the Docker container used to run the test in. The container rust_icu_testenv-67 is a container image that contains preinstalled environment with a compiled version of ICU 67.
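A minimal sketch of the lookup described in the first point, assuming the build script simply turns the value of RUST_ICU_MAJOR_VERSION_NUMBER into a file name under rust_icu_sys/bindgen. `prebuilt_bindings_path` and `prebuilt_bindings_from_env` are hypothetical names, and the real build.rs may resolve paths relative to the crate directory instead.

```rust
use std::env;
use std::path::PathBuf;

/// Environment variable read when the `icu_version_in_env` feature is on.
const VERSION_VAR: &str = "RUST_ICU_MAJOR_VERSION_NUMBER";

/// Maps a major version such as "67" to rust_icu_sys/bindgen/lib_67.rs,
/// rejecting values that are not plain numbers.
fn prebuilt_bindings_path(major: &str) -> Result<PathBuf, String> {
    let number: u32 = major
        .trim()
        .parse()
        .map_err(|_| format!("{} must be a number, got {:?}", VERSION_VAR, major))?;
    Ok(PathBuf::from("rust_icu_sys/bindgen").join(format!("lib_{}.rs", number)))
}

/// Reads the version from the environment, as a build script might.
fn prebuilt_bindings_from_env() -> Result<PathBuf, String> {
    let major = env::var(VERSION_VAR).map_err(|_| format!("{} is not set", VERSION_VAR))?;
    prebuilt_bindings_path(&major)
}
```

A build script doing this would typically also print `cargo:rerun-if-env-changed=RUST_ICU_MAJOR_VERSION_NUMBER`, so that cargo reruns it when the variable changes.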

Refreshing static bindgen files

Requires docker.

Run make static-bindgen periodically, to refresh the statically generated bindgen files (named lib_XX.rs, where XX is an ICU version, e.g. 67) in the directory rust_icu_sys/bindgen which are used when bindgen features are turned off.

Invoking this make target will modify the local checkout with the newer versions of the files lib_XX.rs. Make a pull request and check them in.

For more information on why this is needed, see the bindgen README.md.


rust_icu's Issues

Find a way to avoid `LD_LIBRARY_PATH` when running tests

IIUC it is not possible to tell cargo to bake in the absolute path of a non-default library when compiling.

For example, one would do this in case of gcc:
https://stackoverflow.com/questions/8835108/how-to-specify-non-default-shared-library-path-in-gcc-linux-getting-error-whil (i.e. set -rpath=... to the correct value).

See here: rust-lang/cargo#5077

I tried to modify the resulting binaries with patchelf, but that didn't work either, and I didn't dig deep enough to understand why. Filing this bug so I don't forget.
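Newer Cargo releases let a build script pass arbitrary linker arguments through the `cargo:rustc-link-arg` directive, which could bake an rpath into the test binaries. The sketch below is an untested idea, not something rust_icu does: `link_directives`, `emit_link_directives` and the RUST_ICU_RPATH_DIR variable are hypothetical, it assumes a GNU-style linker that understands `-Wl,-rpath,...`, and $HOME/local/lib is just the example prefix used elsewhere in this README. Note that `rustc-link-arg` appears to apply only to the targets of the package whose build script emits it, so it would not automatically propagate from rust_icu_sys to the other crates.

```rust
use std::env;

/// Hypothetical variable naming the directory with the ICU shared libraries
/// (for example $HOME/local/lib). Not an existing rust_icu setting.
const RPATH_DIR_VAR: &str = "RUST_ICU_RPATH_DIR";

/// Returns build-script directives that add `lib_dir` to the native library
/// search path and embed it as an rpath (assumes a GNU-style linker).
fn link_directives(lib_dir: &str) -> Vec<String> {
    vec![
        format!("cargo:rustc-link-search=native={}", lib_dir),
        format!("cargo:rustc-link-arg=-Wl,-rpath,{}", lib_dir),
    ]
}

/// What a build.rs `main` might call: print the directives if the variable is set.
fn emit_link_directives() {
    // Ask cargo to rerun the build script whenever the variable changes.
    println!("cargo:rerun-if-env-changed={}", RPATH_DIR_VAR);
    if let Ok(dir) = env::var(RPATH_DIR_VAR) {
        for line in link_directives(&dir) {
            println!("{}", line);
        }
    }
}
```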

ICU 67.1 brings in new C fallback method in the API

Between 66.0.1 and 67.1 there was a change in how locale fallback works in the ICU4C API, which fixed a long-standing bug
with the library: https://unicode-org.atlassian.net/browse/ICU-20931

Now that the bug is fixed, our language matching tests no longer pass. We should fix that.

---- tests::test_accept_language_exact_match stdout ----
thread 'tests::test_accept_language_exact_match' panicked at 'assertion failed: `(left == right)`
  left: `(Some(ULoc { repr: "es_MX" }), ULOC_ACCEPT_FALLBACK)`,
 right: `(Some(ULoc { repr: "ar_EG" }), ULOC_ACCEPT_VALID)`', rust_icu_uloc/src/lib.rs:774:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Make invalid feature combinations a compile error

Some feature combinations yield invalid project configuration.

These mostly fail, but with confusing error messages. This change causes
the compilation to fail with a clear error message in case one of those
combinations is used by accident.
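One possible shape for this check, sketched for a build script rather than taken from the actual change. Cargo exposes each enabled feature to build scripts as an environment variable `CARGO_FEATURE_<NAME>` (upper-cased, with `-` turned into `_`), so build.rs can reject a bad combination before doing anything else. The rules below come from the feature table in the README: renaming is required when use-bindgen is off, and icu_config and icu_version_in_env have no effect in the opposite mode. `Features` and `check_features` are hypothetical names.

```rust
use std::env;

/// The features relevant to the check, as seen by the build script.
#[derive(Debug, Clone, Copy)]
struct Features {
    use_bindgen: bool,
    renaming: bool,
    icu_version_in_env: bool,
    icu_config: bool,
}

impl Features {
    /// Reads CARGO_FEATURE_* variables, which cargo sets for build scripts.
    fn from_env() -> Self {
        let on = |name: &str| env::var_os(format!("CARGO_FEATURE_{}", name)).is_some();
        Features {
            use_bindgen: on("USE_BINDGEN"),
            renaming: on("RENAMING"),
            icu_version_in_env: on("ICU_VERSION_IN_ENV"),
            icu_config: on("ICU_CONFIG"),
        }
    }
}

/// Returns an error for invalid combinations, and warnings for combinations
/// that are legal but where a feature has no effect.
fn check_features(f: Features) -> Result<Vec<String>, String> {
    if !f.use_bindgen && !f.renaming {
        return Err("feature `renaming` is required when `use-bindgen` is off".to_string());
    }
    let mut warnings = Vec::new();
    if f.use_bindgen && f.icu_version_in_env {
        warnings.push("`icu_version_in_env` has no effect when `use-bindgen` is on".to_string());
    }
    if !f.use_bindgen && f.icu_config {
        warnings.push("`icu_config` has no effect when `use-bindgen` is off".to_string());
    }
    Ok(warnings)
}
```

In build.rs, `main` could call `check_features(Features::from_env())`, `panic!` on `Err` (which fails the build with that message), and print each warning as `cargo:warning=...`. A `compile_error!` guarded by `#[cfg(...)]` in lib.rs would be an alternative way to get the same effect.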

Use better interchange types than UCalendar for date formatting

ECMA-402 does not expose the Calendar class, in large part because we are working toward a better representation for dates and times in Temporal. I know pure ICU does not currently support any other good types for inputting to your DateFormat, but perhaps this is a place where you want to consider a wrapper with a more friendly API that hides the UCalendar under the hood.

rust_icu requires non-stable toolchain

README.md should note this, and explain how to set up the toolchain.

error[E0554]: `#![feature]` may not be used on the stable release channel
 --> rust_icu_sys/src/lib.rs:1:1

Feedback on rust_icu_uenum

This is basically a data-accessor API. This looks fine. We don't have iterators over time zones yet in ECMA-402, but I think that's something that would probably be acceptable. It's in the same vein of thinking as display names, which we have an open proposal to support now.

Make the rust_icu crates available on docs.rs

The rust_icu build process is very specific, requiring a preinstalled ICU library or a docker container. docs.rs does not know about this and tries to build with vanilla cargo. This fails, and as a result no documentation appears on docs.rs.

Example here:
https://docs.rs/crate/rust_icu_uenum/0.0.4/builds/211223

An example 'sys' crate that works is sodium-sys. It has a custom build.rs script.
https://github.com/rustyhorde/sodium-sys/blob/master/build.rs
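A common pattern for this, sketched here as an assumption rather than as rust_icu's actual fix: docs.rs sets the `DOCS_RS` environment variable during its builds, so a build script can detect it and fall back to the pre-generated rust_icu_sys/bindgen/lib_XX.rs files instead of probing for an installed ICU. `BindingSource`, `building_on_docs_rs` and `binding_source` are hypothetical names.

```rust
use std::env;

/// Where the build script should get its bindings from.
#[derive(Debug, PartialEq)]
enum BindingSource {
    /// Use a checked-in, pre-generated lib_XX.rs; do not probe for ICU.
    Prebuilt,
    /// Probe the installed ICU (e.g. via icu-config) and run bindgen.
    Generated,
}

/// docs.rs sets DOCS_RS in its build environment.
fn building_on_docs_rs() -> bool {
    env::var_os("DOCS_RS").is_some()
}

/// Prebuilt bindings are used on docs.rs, or whenever bindgen is disabled.
fn binding_source(on_docs_rs: bool, use_bindgen: bool) -> BindingSource {
    if on_docs_rs || !use_bindgen {
        BindingSource::Prebuilt
    } else {
        BindingSource::Generated
    }
}
```

The features used for the docs.rs build can additionally be pinned in Cargo.toml under `[package.metadata.docs.rs]`, for example with `no-default-features = true` and an explicit `features` list.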

ECMA 402 for Rust.

The work expands on the initial proposal, with the goal of extending the ECMA 402 implementation to cover all functionality currently exposed.

The initial status is given below. The goal is to check all boxes in the matrix.

ECMA 402 API Trait Definition rust_icu API rust_icu adaptor icu4x API icu4x adaptor
Intl.Collator
Intl.DateTimeFormat
Intl.DisplayNames
Intl.NumberFormat
Intl.Locale
Intl.ListFormat
Intl.PluralRules
Intl.RelativeTimeFormat

`make cov` requires ctags-exuberant

@kpozin noticed that the magic flags --c-kinds=fp used by make cov to extract function signatures from Unicode source doesn't exist in ctags versions other than ctags-exuberant. (yay!)

We should, perhaps, make a hermetic version of make cov that doesn't make this assumption.

`SIGSEGV` on `rust_icu_umsg` only on ICU 67.1

Repro:

env DOCKER_TEST_ENV=rust_icu_testenv-67 make docker-test
# See it crash, then repeat the run on ICU 67.1 with gdb and the binary, to obtain the back trace
env PKG_CONFIG_PATH=$HOME/local/lib/pkgconfig LD_LIBRARY_PATH=$HOME/local/lib bash -c 'gdb target/debug/deps/rust_icu_umsg-4cf9ad179042bf4d'

Backtrace:

[Switching to Thread 0x7ffff5a39700 (LWP 119163)]
0x00007ffff7b71870 in u_strlen_67 () from /home/fmil/local/lib/libicuuc.so.67
(gdb) bt
#0  0x00007ffff7b71870 in u_strlen_67 () from /home/fmil/local/lib/libicuuc.so.67
#1  0x00007ffff7b6bda2 in icu_67::UnicodeString::doAppend(char16_t const*, int, int) ()
   from /home/fmil/local/lib/libicuuc.so.67
#2  0x00007ffff7b6bd27 in icu_67::UnicodeString::UnicodeString(char16_t const*) ()
   from /home/fmil/local/lib/libicuuc.so.67
#3  0x00007ffff7da0e39 in umsg_vformat_67 () from /home/fmil/local/lib/libicui18n.so.67
#4  0x000055555556b48f in rust_icu_umsg::format_varargs::{{closure}} (va_list=...)
    at rust_icu_umsg/src/lib.rs:384
#5  0x000055555556c7d7 in core::ffi::VaListImpl::with_copy (self=0x7ffff5a37fe8, f=...)
    at /rustc/a08c47310c7d49cbdc5d7afb38408ba519967ecd/src/libcore/ffi.rs:345
#6  0x000055555556bd08 in format_varargs (fmt=0x7ffff5a383c8, args=...) at rust_icu_umsg/src/lib.rs:373
#7  0x000055555556a834 in rust_icu_umsg::tests::basic () at rust_icu_umsg/src/lib.rs:448
#8  0x000055555556c441 in rust_icu_umsg::tests::basic::{{closure}} () at rust_icu_umsg/src/lib.rs:434
#9  0x000055555556d4de in core::ops::function::FnOnce::call_once ()
    at /rustc/a08c47310c7d49cbdc5d7afb38408ba519967ecd/src/libcore/ops/function.rs:232
#10 0x00005555555962b6 in <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once ()
    at /rustc/a08c47310c7d49cbdc5d7afb38408ba519967ecd/src/liballoc/boxed.rs:1034
#11 <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once ()
    at /rustc/a08c47310c7d49cbdc5d7afb38408ba519967ecd/src/libstd/panic.rs:318
#12 std::panicking::try::do_call ()
    at /rustc/a08c47310c7d49cbdc5d7afb38408ba519967ecd/src/libstd/panicking.rs:297
#13 std::panicking::try ()
    at /rustc/a08c47310c7d49cbdc5d7afb38408ba519967ecd/src/libstd/panicking.rs:274
#14 std::panic::catch_unwind ()
    at /rustc/a08c47310c7d49cbdc5d7afb38408ba519967ecd/src/libstd/panic.rs:394
#15 test::run_test_in_process () at src/libtest/lib.rs:541
#16 test::run_test::run_test_inner::{{closure}} () at src/libtest/lib.rs:450
#17 0x000055555556e146 in std::sys_common::backtrace::__rust_begin_short_backtrace ()
    at /rustc/a08c47310c7d49cbdc5d7afb38408ba519967ecd/src/libstd/sys_common/backtrace.rs:130
#18 0x0000555555573585 in std::thread::Builder::spawn_unchecked::{{closure}}::{{closure}} ()
    at /rustc/a08c47310c7d49cbdc5d7afb38408ba519967ecd/src/libstd/thread/mod.rs:475
#19 <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once ()
    at /rustc/a08c47310c7d49cbdc5d7afb38408ba519967ecd/src/libstd/panic.rs:318
#20 std::panicking::try::do_call ()
    at /rustc/a08c47310c7d49cbdc5d7afb38408ba519967ecd/src/libstd/panicking.rs:297
#21 std::panicking::try ()
    at /rustc/a08c47310c7d49cbdc5d7afb38408ba519967ecd/src/libstd/panicking.rs:274
#22 std::panic::catch_unwind ()
    at /rustc/a08c47310c7d49cbdc5d7afb38408ba519967ecd/src/libstd/panic.rs:394
#23 std::thread::Builder::spawn_unchecked::{{closure}} ()
    at /rustc/a08c47310c7d49cbdc5d7afb38408ba519967ecd/src/libstd/thread/mod.rs:474
#24 core::ops::function::FnOnce::call_once{{vtable-shim}} ()
    at /rustc/a08c47310c7d49cbdc5d7afb38408ba519967ecd/src/libcore/ops/function.rs:232
#25 0x00005555555e6a1a in <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once ()
    at /rustc/a08c47310c7d49cbdc5d7afb38408ba519967ecd/src/liballoc/boxed.rs:1034
#26 <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once ()
    at /rustc/a08c47310c7d49cbdc5d7afb38408ba519967ecd/src/liballoc/boxed.rs:1034
#27 std::sys::unix::thread::Thread::new::thread_start () at src/libstd/sys/unix/thread.rs:87
#28 0x00007ffff7a4bfb7 in start_thread (arg=<optimized out>) at pthread_create.c:486
#29 0x00007ffff796119f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(gdb) 

Make `sudo make docker-test` work, too.

From @kpozin

$ sudo make docker-test
mkdir -p /tmp/rust_icu-root-target
docker run --tty --interactive \
                --user=0:0 \
                --volume=:/src/rust_icu \
                --volume=/tmp/rust_icu-root-target:/build/cargo \
                --volume=/root/.cargo:/usr/local/cargo \
                --env="CARGO_TEST_ARGS=" \
                filipfilmar/rust_icu_testenv-64:0.0.4
+ env
ICU_SOURCE_DIR=/src/icu
HOSTNAME=0b7d0ac807af
CARGO_TEST_ARGS=
RUST_ICU_SOURCE_DIR=/src/rust_icu
PWD=/src
ICU4C_BUILD_DIR=/build/icu4c-build
HOME=/root
CARGO_HOME=/usr/local/cargo
TERM=xterm
RUSTUP_HOME=/usr/local/rustup
SHLVL=1
RUST_VERSION=1.40.0
PATH=/usr/local/cargo/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
CARGO_BUILD_DIR=/build/cargo
_=/usr/bin/env
+ cd /src/rust_icu
+ cargo install bindgen rustfmt
/entrypoint.sh: line 5: cargo: command not found
+ cd rust_icu_sys
/entrypoint.sh: line 7: cd: rust_icu_sys: No such file or directory
+ env LD_LIBRARY_PATH=/usr/local/lib cargo test
env: 'cargo': No such file or directory
+ cd rust_icu_common
/entrypoint.sh: line 11: cd: rust_icu_common: No such file or directory
+ env LD_LIBRARY_PATH=/usr/local/lib cargo test
env: 'cargo': No such file or directory
+ env LD_LIBRARY_PATH=/usr/local/lib cargo test
env: 'cargo': No such file or directory
make: *** [Makefile:38: docker-test] Error 127

Use Rust built-in UTF-16 conversion

Split from #6

The code currently calls u_strToUTF8. It would be better to use Rust standard library functions such as str::encode_utf16 and String::from_utf16. You would avoid having to go across the FFI boundary when performing the conversion.
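A minimal sketch of the std-only conversion: `str::encode_utf16` produces the UTF-16 code units that ICU's `UChar` buffers hold, and `String::from_utf16` converts them back, rejecting unpaired surrogates. The wrapper names `to_utf16` and `from_utf16` are hypothetical; they are not rust_icu functions.

```rust
use std::string::FromUtf16Error;

/// Converts a Rust string into UTF-16 code units, e.g. for an ICU function
/// that takes a `const UChar*` plus a length.
fn to_utf16(s: &str) -> Vec<u16> {
    s.encode_utf16().collect()
}

/// Converts UTF-16 code units (e.g. produced by ICU) back into a String.
/// Unpaired surrogates are reported as an error rather than replaced.
fn from_utf16(units: &[u16]) -> Result<String, FromUtf16Error> {
    String::from_utf16(units)
}
```

If lossy conversion is acceptable, `String::from_utf16_lossy` replaces unpaired surrogates with U+FFFD instead of returning an error.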

Add a build feature to skip using `icu-config` altogether

This allows us to effectively bypass the build.rs invocation, in case the build environment is prepared differently.
For example, Fuchsia has its own custom configuration that does not depend on the system ICU library and should not be using icu-config.

Interop with unic_locale

Maybe you want to use @zbraniecki's Rust Locale class. 😃

It looks like all this does is basically call uloc_canonicalize. IIRC, Zibi's class performs partial but not full canonicalization yet. Once Zibi's class supports that, maybe you can just remove this wrapper altogether.

Remove pattern string constructor from rust_icu_udat

ECMA-402 does not and probably will not add support for parsing. The bottom line is that parsing of localized strings is a hard problem that is best solved by simply building your application to avoid having to do it. Do you really need to support parsing in the wrapper?

ECMA-402 also does not and probably will not support patterns. ICU has long recommended that people don't use patterns; they should use skeletons instead. Do you really need to support patterns?

Speaking of skeletons, you don't have an API for skeletons right now. Could you remove your pattern API and replace it with a skeleton API?

Remove parsing from rust_icu_udat

Splitting off my comments about date parsing from #2.

ECMA-402 does not and probably will not add support for parsing. The bottom line is that parsing of localized strings is a hard problem that is best solved by simply building your application to avoid having to do it. Do you really need to support parsing in the wrapper?

Make `feature=icu_config` robust in face of version updates

When the icu_config feature is turned off, the code has no way to detect which ICU library version is in use, so it currently always defaults to 64. That will, of course, be incorrect in the general case.

Add a way for the user to pass the desired ICU renaming version.
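Here is one possible shape for the requested option, as a sketch. The user sets an environment variable to the ICU major version, and the build derives the renamed symbol names from it. The variable name `RUST_ICU_MAJOR_VERSION_NUMBER` and the helper functions are assumptions for illustration. The suffix scheme itself is ICU's: with renaming enabled, each C entry point gets a `_<major>` suffix, so `u_strToUTF8` becomes `u_strToUTF8_64` for ICU 64.

```rust
use std::env;

/// Version assumed when nothing else is known (today's hard-coded default).
const DEFAULT_ICU_MAJOR: u32 = 64;

/// Parses a user-supplied ICU major version, falling back to the default
/// when the value is absent or not a number.
fn icu_major_version(value: Option<&str>) -> u32 {
    value
        .and_then(|v| v.trim().parse().ok())
        .unwrap_or(DEFAULT_ICU_MAJOR)
}

/// Reads the (hypothetical) RUST_ICU_MAJOR_VERSION_NUMBER variable.
fn icu_major_from_env() -> u32 {
    icu_major_version(env::var("RUST_ICU_MAJOR_VERSION_NUMBER").ok().as_deref())
}

/// With ICU symbol renaming, each C entry point carries a `_<major>` suffix.
fn renamed_symbol(base: &str, major: u32) -> String {
    format!("{}_{}", base, major)
}
```

Silently falling back to 64 mirrors today's behavior. A build script may prefer to fail loudly on an unparsable value, so that a typo does not surface as link errors much later. It should also print `cargo:rerun-if-env-changed=RUST_ICU_MAJOR_VERSION_NUMBER` so that Cargo rebuilds when the variable changes.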

Figure out how to handle strings at the API boundary

Nothing wrong here from a functionality point of view, but it raises questions about what is the best way to represent strings when interfacing with ICU. Many ICU APIs want UTF-16 strings but also accept UTF-8 strings. I've been toying with the idea of making a Rust version of UnicodeString that you can toggle between UTF-8 and UTF-16 at compile time. For now, I think the safest thing is for users to keep their strings in Rust-standard UTF-8, and only do the UTF-16 conversion at the API boundary. Even if you have to round-trip a few times between 8 and 16, I strongly suspect that this won't be a performance bottleneck.
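The convert-at-the-boundary approach suggested above can be sketched as a small generic helper. `via_utf16` and the toy `ascii_upper_utf16` operation are hypothetical. In real bindings, `op` would wrap an ICU function that takes and returns UTF-16 buffers.

```rust
use std::string::FromUtf16Error;

/// Keeps callers in UTF-8 and converts only at the ICU boundary: encode the
/// input to UTF-16, run the UTF-16 operation, then decode the result back.
/// `op` stands in for a hypothetical wrapper around a UTF-16-based ICU call.
fn via_utf16<F>(input: &str, op: F) -> Result<String, FromUtf16Error>
where
    F: FnOnce(&[u16]) -> Vec<u16>,
{
    let units: Vec<u16> = input.encode_utf16().collect();
    let result = op(&units);
    String::from_utf16(&result)
}

/// Toy UTF-16 operation: upper-cases ASCII letters and leaves every other
/// code unit alone, so surrogate pairs pass through untouched.
fn ascii_upper_utf16(units: &[u16]) -> Vec<u16> {
    units
        .iter()
        .map(|&u| if (0x61..=0x7A).contains(&u) { u - 0x20 } else { u })
        .collect()
}
```

For example, `via_utf16("abc 🦀", ascii_upper_utf16)` yields `Ok("ABC 🦀")`. Callers only ever see `&str` and `String`. The `Vec<u16>` lives for a single call, which is the kind of round trip the paragraph above expects not to be a bottleneck.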

Release a new version of rust_icu

Two significant changes:

  • Feature testing stabilization
  • Feature build fix for rust_icu_uloc

I propose bumping the minor version, i.e. releasing 0.3.0.

What is the point of wrapping UText?

My question here is: where and how do you use the UText that you make? In ICU, a UText is primarily useful for interfacing with APIs like the BreakIterator and the regex engine. It doesn't do a whole lot on its own, except perhaps provide more UTF-8 to UTF-16 conversions, and you already have that functionality in rust_icu_ustring.

Start testing `rust_icu` with ICU 67.1

ICU 67.1 has been released. We should start testing with it.

A specific quirk of ICU 67.1 is that it fixes the locale-matching bugs that ICU4C had before this release. See issue #59 for details. Previously we did not distinguish test results by ICU version, but if we want continuity of support, we must test all supported ICU versions. This means introducing features that allow us to run different versions of the same test depending on the version of ICU in use.

Now that we have #75, we can start setting up the build.

Amend `make publish` to support removing dev-dependencies

crates.io validates all dependencies, including dev-dependencies. If there's an apparent cycle caused by imports in tests, e.g. [ucal] -> [udat] -> [ucal dev], it becomes impossible to publish either crate to crates.io. (See rust-lang/cargo#4242.)

As a workaround, `make publish` could do something along the lines of:

  1. Verify that the git workspace is clean.
  2. Back up all Cargo.toml files.
  3. Remove or comment out [dev-dependencies] from all Cargo.toml files.
  4. Run cargo publish --allow-dirty, allowing the modified workspace state.
  5. Restore the backed up Cargo.toml files.
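Steps 2 to 5 can be sketched in Rust for one manifest at a time, assuming a naive line-based edit is good enough for the workspace's Cargo.toml files. Both function names are hypothetical. The sketch keeps the backup in memory and does not handle target-specific tables such as `[target.'cfg(unix)'.dev-dependencies]`.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Comments out every `[dev-dependencies]` table, including dotted forms such
/// as `[dev-dependencies.foo]`, in the text of a Cargo.toml.
fn comment_out_dev_deps(manifest: &str) -> String {
    let mut in_dev = false;
    let mut out = String::with_capacity(manifest.len());
    for line in manifest.lines() {
        let trimmed = line.trim_start();
        if trimmed.starts_with('[') {
            // Any table header ends the previous table.
            in_dev = trimmed.starts_with("[dev-dependencies]")
                || trimmed.starts_with("[dev-dependencies.");
        }
        if in_dev && !trimmed.is_empty() {
            out.push_str("# ");
        }
        out.push_str(line);
        out.push('\n');
    }
    out
}

/// Steps 2-5: back up the manifest, strip dev-dependencies, run `publish`,
/// then restore the original text whether or not publishing succeeded.
fn with_dev_deps_commented<F>(manifest: &Path, publish: F) -> io::Result<()>
where
    F: FnOnce() -> io::Result<()>,
{
    let original = fs::read_to_string(manifest)?; // step 2: in-memory backup
    fs::write(manifest, comment_out_dev_deps(&original))?; // step 3
    let result = publish(); // step 4
    fs::write(manifest, &original)?; // step 5: restore
    result
}
```

The `publish` closure would run `cargo publish --allow-dirty`, for example via `std::process::Command`. The original manifest is written back before the publish result is returned, so a failed upload does not leave the workspace modified.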

Build failures without features in rust_icu_ucol & rust_icu_umsg

Repro:

cd rust_icu_ucol
cargo build --no-default-features
cd ../rust_icu_umsg
cargo build --no-default-features

Both crates refer to items that do not exist in the sys crate.

Context: this affected Fuchsia builds using the "renaming" feature only (no default features).

Basic collation support

This is a very basic subset of issue #79: ucol_open, ucol_close, ucol_strcoll and ucol_strcollUTF8.

This can be done quickly, and I wonder whether it will get some use once it is available.
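Here is a sketch of how the proposed subset could be wrapped safely. It uses an owning type whose `Drop` calls `ucol_close`, and a comparison method that maps the C-style result onto `std::cmp::Ordering`. The `mock_ffi` module is a hypothetical stand-in so the example runs without ICU. It compares bytes, whereas real ICU collation is locale-aware. The real `ucol_open` and `ucol_strcollUTF8` also take a `UErrorCode` out-parameter, which the mock omits.

```rust
use std::cmp::Ordering;

/// Hypothetical stand-ins for the ICU C functions so the sketch runs
/// without ICU; real bindings would come from rust_icu_sys.
#[allow(non_snake_case)]
mod mock_ffi {
    pub struct UCollator;

    pub fn ucol_open(_locale: &str) -> *mut UCollator {
        Box::into_raw(Box::new(UCollator))
    }

    /// Safety: `c` must come from `ucol_open` and not already be closed.
    pub unsafe fn ucol_close(c: *mut UCollator) {
        unsafe { drop(Box::from_raw(c)) }
    }

    /// Plain byte-wise comparison, returning -1, 0 or 1 as in C.
    pub fn ucol_strcollUTF8(_c: *const UCollator, a: &str, b: &str) -> i32 {
        a.cmp(b) as i32
    }
}

/// Owns a collator handle; `Drop` guarantees `ucol_close` runs exactly once.
pub struct Collator {
    raw: *mut mock_ffi::UCollator,
}

impl Collator {
    pub fn new(locale: &str) -> Self {
        Collator { raw: mock_ffi::ucol_open(locale) }
    }

    /// Maps the negative/zero/positive C result onto `Ordering`.
    pub fn compare(&self, a: &str, b: &str) -> Ordering {
        mock_ffi::ucol_strcollUTF8(self.raw, a, b).cmp(&0)
    }
}

impl Drop for Collator {
    fn drop(&mut self) {
        unsafe { mock_ffi::ucol_close(self.raw) }
    }
}
```

Because `Collator` is neither `Clone` nor `Copy`, the handle cannot be closed twice. Since `compare` returns an `Ordering`, it plugs directly into `sort_by`, e.g. `words.sort_by(|a, b| collator.compare(a, b))`.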
