Giter VIP home page Giter VIP logo

ion-rust's People

Contributors

almann avatar desaikd avatar infinite-blue-1042 avatar jnicholls avatar jobarr-amzn avatar jpeddicord avatar jpschorr avatar marcbowes avatar nirosys avatar plasmaintec avatar popematt avatar rgantt avatar rmarrowstone avatar sadderchris avatar therapon avatar tommy avatar zslayton avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

ion-rust's Issues

AArch64 Continuous Integration Pipeline

#102 illustrates that we should have some kind of CI for AArch64 if we want to ensure that things build/run on this platform. A quick survey on Github Actions doesn't make it clear that it is an option here, but regardless, we should do something for ARM.

SymbolTable Resolver/Catalog

In other implementations we have either some kind of catalog or resolver API that provides a way for the implementation to call user code to get shared symbol tables. We should have an appropriately similar trait API in ion-rust and expose it to the parser APIs appropriately.

Make the 'D' in Cursor<D> an associated type

The Cursor trait is generic over some type D that implements IonDataSource.

pub trait Cursor<D: IonDataSource> {
    // ...
}

However, defining the trait as Cursor<D> means that all types that are generic over its implementations must also be generic over D.

impl<D: IonDataSource, C: Cursor<D>> Reader<D, C> {
  //  ^-- D is now in Reader's signature ---^
  // ...  
}

If we make D an associated type instead...

pub trait Cursor {
    type DataSource: IonDataSource;
    // ...
}

the type signatures of "downstream" types can be simplified quite a bit:

pub struct Reader<C: Cursor> {
    // ...
}

[ion-c-sys] Consider Making IonCAnnotationsFieldWriter Have a Stronger Contract

Currently IonCAnnotationsFieldWriter has a method to support generic composition of annotations and field writing that looks something like:

    /// Writes a value within a context of annotations and/or a field name
    ///
    /// Note that it is undefined behavior if a value writing method is **not** called
    /// or if a value writing method is called **more than once**.  For this reason,
    /// most users should prefer the `field()` and/or `annotations()` method on the
    /// [`IonCWriter`](./trait.IonCWriter.html) trait as it doesn't have this edge case.
    fn write_annotations_and_field<'a, A, F, FN>(
        &mut self,
        annotations: A,
        field: F,
        applier: FN,
    ) -> IonCResult<()>
    where
        A: Into<Option<&'a [&'a str]>>,
        F: Into<Option<&'a str>>,
        FN: Fn(&mut Self::AFValueWriter) -> IonCResult<()>;

@zslayton mentioned in #95 that he thought this might be unsafe, but it is not a memory safety concern. It should be possible to make this API less problematic, but it requires more abstraction/implementation with respect to the host implementation (e.g. only write the fields when actually using the value writer). I didn't think it was worth it to do at this time, because if you use the context object, it handles the usage safety for you.

Add ion-c-sys support for ION_INT

We are currently limited to i64 integers in the bindings, we should have some idiomatic integration for ION_INT which is Ion C's arbitrary integer type.

Support for arbitrarily-sized VarInt, VarUint

VarInt and VarUInt currently store the value being read in as a 64-bit value. This is likely to be sufficient for almost all use cases, but the Ion spec allows for values of any size. If/when we have demand for it, we should investigate using an implementation of big-integer.

Int and VarInt lose sign information when reading -0

Int and VarInt represent the value of the signed integer being read in as an i64, which cannot represent -0 on its own. Int and VarInt are already structs, so we can add a sign field to these and check for it when necessary.

Reconcile first_commit_candidate Branch with master

The first_commit_candidate branch should be reconciled with master. It seems to be further along than master, I suspect we should use it as a first cut for what we want to do in master, but we should remove this branch (or merge it) once we're done with it.

@zslayton is probably best to comment here, but I think it is good to have an issue to track this.

[ion-c-sys] Adds lib64 Path to Build Script

On RHEL/CentOS/Amazon Linux the install target on x86-64 installs to lib64 path, on other platforms (including Windows and macOS) this is lib, we need to add lib64 to the build script.

This can be reproduced with the following docker image with a checkout of the repository.

# Build:
# docker build -t ion-rust-al2 --build-arg USER_ID=$(id -u) --build-arg GROUP_ID=$(id -g) $(pwd)
#
# Run:
# docker run --interactive --tty --rm --mount type=bind,source="$(pwd)",target=/workspace ion-rust-al2

FROM amazonlinux:2

ARG USER_ID
ARG GROUP_ID
ARG WORKSPACE_DIR=/workspace

RUN yum -y install clang llvm-devel git gcc gcc-c++ wget curl tar make
RUN wget --quiet  https://github.com/Kitware/CMake/releases/download/v3.18.3/cmake-3.18.3-Linux-x86_64.sh
RUN sh ./cmake-3.18.3-Linux-x86_64.sh --skip-license --prefix=/usr/local

RUN groupadd --gid $GROUP_ID user
RUN useradd -l --uid $USER_ID --gid $GROUP_ID user
RUN mkdir $WORKSPACE_DIR && chown $USER_ID.$GROUP_ID $WORKSPACE_DIR

USER user
WORKDIR $WORKSPACE_DIR

RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | /bin/sh -s -- -y

ENV PATH=/home/user/.cargo/bin:$PATH

CMD cargo clean && cargo build

[ion-c-sys] ION_DECIMAL Assignment from BigDecimal Breaks Ion C Encapsulation

There appears to be an issue with ion_decimal_from_int that requires a specifically allocated ION_DECIMAL. Since this is abstracted away from the public API.

Specifically, the problem appears to be ion_decimal.c L287-L293:

iERR ion_decimal_from_ion_int(ION_DECIMAL *value, decContext *context, ION_INT *p_int) {
    ION_DECIMAL_SWITCH (
        value,
        IONCHECK(ion_int_to_decimal(p_int, quad_value, context)),
        IONCHECK(_ion_int_to_decimal_number(p_int, num_value, context))
    );
}

As a result, our bindings do manual initialization and use internal Ion C APIs. Once this is fixed upstream, we should eliminate this hack.

Promote the header byte jump table to a constant

The create_header_byte_jump_table function in binary/header.rs will parse each possible byte value from 0 to 255 as though it were the one-octet type descriptor found at the beginning of all binary Ion values. The output from parsing each byte is stored in a newly allocated Vec that can be used as a jump table to avoid re-calculating the meaning of same byte values repeatedly.

This is sub-optimal:

  • It requires a Vec to be allocated when a fixed-size array would do
  • It requires entities that wish to use the jump table to maintain their own copy of the Vec despite the fact that its contents will always be identical.

Ideally, we would store a fixed-length array as a static constant in the header module. Unlike C, however, the Rust compiler tries very hard to prevent programs from mutating global state. This means that initializing a static array is currently somewhat difficult to do without sacrificing either speed or safety.

Here are some options I've explored:

safe + slow

The lazy_static crate provides a macro that allows you to declare and initialize a global constant. It works by wrapping your constant in a custom smartpointer type--the first time the pointer is dereferenced, lazy_static will call a user-defined function to calculate the value of the constant. Every dereference that occurs from then on will receive the already-calculated value.

This is easy to implement, but profiling with valgrind has shown that the indirection introduced by the smartpointer indirection adds an unacceptable amount of overhead in terms of CPU time. Derefencing the jump table each time you begin parsing a value in a stream with millions of values adds up quickly.

thread_local copies of the jump table suffer from even worse indirection overhead, rendering them a non-starter.

If/when we add an IonSystem concept, we may be able to have IonReaders share a reference to a shared jump table living in the IonSystem, but this may not be worth the complexity.

unsafe + fast

Initializing a fixed-size array (i.e. [T], not a Vec) is currently surprisingly challenging for non-trivial types. The best methods available only work for types that implement the Copy trait, and types that implement Default and where the array is <32 entries long.

We could choose to skirt this problem with unsafe code, but this has its own problems:

  • The compiler cannot guarantee the memory safety of unsafe code, so we would need to scrutinize it very carefully.
  • Because the array would be initialized at runtime (not build time), we would need to design an appropriate synchronization mechanism to guarantee that readers didn't reference the jump table before it was initialized.
  • The APIs for working with not-yet-initialized data are in a state of flux. The Rust ecosystem is migrating away from mem::uninitialized (currently deprecated in nightly) and towards mem::MaybeUninit (currently unstable).

safe + fast + not possible yet

Rust is incrementally adding support for const fns, functions whose outputs can be calculated entirely at compile time. This is perfect for our use case -- all of the inputs are statically known (u8 values from 0 to 255) and processing them requires no information from the outside world. With this feature, we'll be able to simply write:

const HEADER_JUMP_TABLE: [IonResult<Option<IonValueHeader>>] = create_jump_table();

The compiler will calculate the definition of HEADER_JUMP_TABLE at build time and then all interested entities can refer to it freely -- no wasted memory, no additional overhead from indirection.

At the time of this writing, const fns have severe restrictions on which language features may be used. Functionality like match statements, the ? operator, calling non-const functions, looping, etc. are planned but not yet supported. These restrictions are so limiting that I have not been able to write an array initialization const fn.

Develop orientation questions

Hi, guys, one of my friends who work for Amazon recommend me ION recently. Then I find this repo.

I am not sure if my understanding is correct or not. Looks like this repo only has a binary ION reader. Do you guys wanna start text format parser after finish binary reader first, or don't have plans to do it?

After reading the master branch, looks like result.rs, data_source.rs, and cursor.rs define traits, I guess you guys wanna make text in src and let it use types from types mod and implemented all traits. If so, I can help to make some code.

Sorry for bothering you guys in issues, I have searched 'help wanted' but found nothing.

Thank you.

[ion-c-sys] Convert BigDecimal -> ION_DECIMAL Inexact

Cases around ~35 digits or so, convert to ION_DECIMAL with a loss of precision, which is likely due to decQuad limitations, we should probably check the status code and do decNumber conversion or be more aggressive with usage of decNumber.

[ion-c-sys] Generic Associated Traits for Reader Writer Traits

This is generally applicable to ion-rs as well, but as we design the traits we want to have for our APIs, we should be consider how Generic Associated Traits (GAT) can be used to make our APIs reasonably flexible while maintaining static polymorphism.

Currently, delegation via our current APIs are difficult because we cannot have an associated type that can be constructed with a per invocation lifetime. This example illustrates this, and shows how we can work around this by the judicious use of unsafe code, by essentially emulating the reference with pointers we know to be scoped appropriately. We also work around the lack of GAT by simplifying the value writer used in the implementation of IonCAnnotationsFieldWriter for IonCWriterHandle by refering to Self which works, but makes the API potentially easier to misuse.

Add checkpoint/restore API

The BinaryCursor should be able to save its state and rewind to that position as long as the data source implements io::Seek.

Add Easier APIs for Using Ion C

With #34, using the Ion C bindings is pretty low-level. In particular, one needs to deal with iERR and also allocate and bind raw pointers. It would be nice if we could expose safer versions of the low-level APIs that integrate with Result, make working with ION_TYPE/ION_READER/ION_WRITER/etc. easier.

This would not be a replacement for the traits we have in the native Rust implementation, but at least make interacting with IonC in its full capacity easier.

[ion-c-sys] Support for Decimal Signed Zero in High-Level APIs

bigdecimal::BigDecimal relies on num_bigint::BigInt to provide the signed coefficient. BigInt does not support maintaining sign when constructing it (bigint.rs L2774-2786). This information is lost when converting from ION_DECIMAL to BigDecimal, so we should have some kind of API to preserve this if so desired.

One can always use ION_DECIMAL and IonDecimalPtr to get at the low-level Ion C APIs, but they shouldn't have to.

[ion-c-sys] Errors Converting Large BigDecimal -> ION_DECIMAL

For cases that are larger than decQuad there are issues using ion_decimal_from_ion_int to convert the coefficient creating a status error of invalid context leading to generating a nan decNumber. Talking with @tgregg, I believe this is a bug in Ion C, and needs to be reproduced there and fixed upstream.

Integrate Ion C Bindings to Ion Rust APIs

Once we have #34 and #37, we should bind ion-c-sys to ion-rust traits properly. This would enable us to fill in Ion Rust while we don't have all of the functionality present and would serve as an implementation target to cross-test against as well.

[ion-c-sys] AArch64 build problems

On a lark, I tried building ion-rs and ion-c-sys on Termux. However, I ran into the following build problems:

image

The problem is that certain Ion C APIs use c_char pointers, and we hard coded them to i8 pointers which works on x86-64, but is u8 pointers on AArch64. We should use the c_char type alias as it is defined appropriately on the platform being built.

[ion-c-sys] Extract Reader/Writer Traits

We should extract the Reader/Writer traits from IonCReaderHandle and IonCWriterHandle. This will make generic/dynamic dispatched code against this API possible, to potentially layer things like Ion Hash (assuming we don't just layer it on ion-rs).

IonCWriterHandle Lifetime Safe Field APIs

The field writing API has a requirement that the ION_STRING passed to them must live until a value is written. The trivial API surface makes this contract implicit:

let mut buf = vec![0; 128];
let len = {
    let mut writer = IonCWriterHandle::new_buf_mode(buf.as_mut_slice(), WriterMode::Binary)?;
    writer.start_container(ION_TYPE_STRUCT)?;
    {
        // no problem! 'static life for "name"
        writer.set_field_name("name")?;
        writer.write_string("kumo")?;
    }
    writer.finish_container()?;
    writer.finish()?
}

An example where this breaks is here:

let mut buf = vec![0; 128];
let len = {
    let mut writer = IonCWriterHandle::new_buf_mode(buf.as_mut_slice(), WriterMode::Binary)?;
    writer.start_container(ION_TYPE_STRUCT)?;
    {
        {
            let s = "name".to_string();
            writer.set_field_name(s.as_str())?;
        }
        writer.write_string("kumo")?;
    }
    writer.finish_container()?;
    writer.finish()?
}

In practice, users are probably not going to shoot themselves in the foot by separating the value writing calls from the field setting, but it would be nice to take away the foot gun.

Symbol Token and Symbol Table API

We need an API that exposes the concept of symbol tokens and tables to users. It is probably worth looking at amazon-ion/ion-go#11 as an example of a modern take on this in another implementation, though of course, we should make this idiomatic to Rust as appropriate.

Add IonC Facades for ION_STREAM and Files

Currently the IonCReaderHandle and IonCWriterHandle have constructors from buffers in memory (&mut [u8]). For full generality, we should support the ION_STREAM interface as well as the file target.

Add Low-level Ion C Bindings

While it is preferable to have pure rust implementation of Ion, it would be nice to be able to leverage (and compare) the amzn/ion-c implementation in Rust. We could add a sub-crate for ion-c-sys, have the crate build ion-c through the cmake crate and leverage the bindgen crate to expose low-level bindings.

It would be left as an exercise for future work to:

  • Add comprehensive tests for these bindings.
  • Add more idiomatic Rust APIs to make working with the C API less obnoxious (e.g. dealing with iERR as a Result).
  • Interface with the Ion Rust public API traits.

Add more context to IonError's variants

The IonError enum's variants currently only offer a text description of the problem that arose, meaning that there's no way to programmatically respond to failures. IonError::IoError and IonError::DecodingError should have additional fields containing helpful contextual data. e.g. The std::io::Error that was encountered or the byte offset where the reader found an illegal value.

Segfault when running codecoverage with tarpaulin

  • On my machine
    1. running codecoverage for non-doctest succeeds with both 1 thread or multiple
    2. running codecoverage for doctest fails with or without the thread limitation (See below for details)
  • On GitHub any codecoverage configuration fails with a segmentation fault.

I am not sure where the problem is exactly.
Note that

  • doctest coverage for tarpaulin depends on rust-nightly.
  • ion-c might have a bug

Running tarpaulin with

cargo tarpaulin -v --debug --release --run-types Doctests --out Html --workspace

or

cargo tarpaulin -v --debug --release --run-types Doctests --out Html --workspace -- --test-threads=1

fails on

Doc-tests ion-c-sys
     Running `rustdoc --edition=2018 --crate-type lib --test /home/ANT.AMAZON.COM/therapon/Amazon/Work/GitHub/Ion/ts-ion-rust-1/ion-c-sys/src/lib.rs --crate-name ion_c_sys -L dependency=/home/ANT.AMAZON.COM/therapon/Amazon/Work/GitHub/Ion/ts-ion-rust-1/target/release/deps -L dependency=/home/ANT.AMAZON.COM/therapon/Amazon/Work/GitHub/Ion/ts-ion-rust-1/target/release/deps -L native=/home/ANT.AMAZON.COM/therapon/Amazon/Work/GitHub/Ion/ts-ion-rust-1/target/release/build/ion-c-sys-30c937fa6fac9e04/out/lib -L native=/home/ANT.AMAZON.COM/therapon/Amazon/Work/GitHub/Ion/ts-ion-rust-1/target/release/build/libloading-35f0411caee9e2f5/out --test-args --test-threads=1 --extern ion_c_sys=/home/ANT.AMAZON.COM/therapon/Amazon/Work/GitHub/Ion/ts-ion-rust-1/target/release/deps/libion_c_sys-c0e9a51780e178af.rlib --extern paste=/home/ANT.AMAZON.COM/therapon/Amazon/Work/GitHub/Ion/ts-ion-rust-1/target/release/deps/libpaste-07a2659eec7ed2cb.so --extern rstest=/home/ANT.AMAZON.COM/therapon/Amazon/Work/GitHub/Ion/ts-ion-rust-1/target/release/deps/librstest-f059a51b34ab8d79.so -C relocation-model=dynamic-no-pic -C link-dead-code -C debuginfo=2 --cfg=tarpaulin --persist-doctests /home/ANT.AMAZON.COM/therapon/Amazon/Work/GitHub/Ion/ts-ion-rust-1/target/doctests -Z unstable-options`
error: test failed, to rerun pass '--doc'
[ERROR tarpaulin] Building doctests failed
[ERROR tarpaulin] Cargo failed to run! Error: Building doctest failed
[INFO tarpaulin] Serializing tarpaulin debug log to tarpaulin-run-2020-08-10T22:01:17.635435492-07:00.json
Error: "Cargo failed to run! Error: Building doctest failed"

The tarpaulin debug log is not that informative

{"events":[{"ConfigLaunch":"<anonymous>"}]}

Add Better Testing for Ion C Bindings

As per #34, we should have better testing for Ion C bindings. Running some tests against the test vectors at a minimum, and really refactoring that against what we have in good.rs. In addition, we should exercise the bindings in a reasonable way in addition to the high-level test vectors.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.