wasmify-sophia's Introduction

Wasm-ify Sophia

This repository proposes an architecture for exporting to JavaScript the classes that implement the Dataset API of the Sophia toolkit.

The Rust code is compiled to WebAssembly, and the resulting JavaScript class is (almost) RDF.JS Dataset compliant.

It also provides some new custom Sophia Dataset implementations.

sophia_wasm

The crate sophia_wasm exports terms and quads from Sophia using wasm_bindgen and wasm-pack. It also provides a basic export of datasets to JavaScript that tries to be compliant with the RDF.JS specification.

Macros and tools are also provided to export other implementations of the Dataset trait to JavaScript, as long as the exported dataset implements Default and MutableDataset.
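
The requirement on Default and MutableDataset can be illustrated with a small sketch. This is not the macro shipped in sophia_wasm: the trait MiniMutableDataset, the macro export_dataset! and the type VecDataset are hypothetical stand-ins, quads are reduced to four integers, and the wasm_bindgen attributes are left out so that the snippet compiles on its own.

```rust
/// Hypothetical stand-in for Sophia's `MutableDataset` trait.
pub trait MiniMutableDataset {
    /// Inserts a quad of identifiers; returns true if it was not already present.
    fn insert(&mut self, quad: [u32; 4]) -> bool;
    /// Number of quads currently stored.
    fn len(&self) -> usize;
}

/// Hypothetical macro: generates a wrapper type around any dataset
/// that implements `Default` and `MiniMutableDataset`.
macro_rules! export_dataset {
    ($wrapper:ident, $inner:ty) => {
        pub struct $wrapper {
            inner: $inner,
        }

        impl $wrapper {
            /// `Default` is what lets the generated constructor build an empty dataset.
            pub fn new() -> Self {
                Self { inner: <$inner as Default>::default() }
            }

            /// The mutable-dataset trait is what lets the wrapper forward `add`.
            pub fn add(&mut self, s: u32, p: u32, o: u32, g: u32) -> bool {
                MiniMutableDataset::insert(&mut self.inner, [s, p, o, g])
            }

            pub fn size(&self) -> usize {
                MiniMutableDataset::len(&self.inner)
            }
        }
    };
}

/// A trivial dataset backed by a vector, used only for this illustration.
#[derive(Default)]
pub struct VecDataset(Vec<[u32; 4]>);

impl MiniMutableDataset for VecDataset {
    fn insert(&mut self, quad: [u32; 4]) -> bool {
        if self.0.contains(&quad) {
            false
        } else {
            self.0.push(quad);
            true
        }
    }

    fn len(&self) -> usize {
        self.0.len()
    }
}

// One macro call per dataset implementation to export.
export_dataset!(ExportedVecDataset, VecDataset);
```

In this sketch, exporting another dataset implementation only takes one more call to the macro, which is the reason for requiring both traits on the exported type.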

The exported datasets can be used out of the box. They try to implement the RDF.JS Dataset specification but do not adhere to it completely.

A wrapper class is provided to address some issues of the structures exported by default (memory leaks, and compliance details such as being able to chain calls to the add method).

bjdatasets

The crate bjdatasets exposes some implementations of the Sophia Dataset trait.

  • TreeDataset, a dataset that relies on multiple trees. By storing quads in different orders, it provides efficient quad lookup (see identifier-forest)
  • FullIndexDataset, a dataset that stores, for every possible pattern, every corresponding quad
  • VecOrDataset<D>, a dataset that can use either a vector of quads or another Dataset structure

identifier-forest

The crate identifier-forest provides a forest structure able to store quads in the form of 4 identifiers (that can be mapped to actual terms using an external library).

The main features of the forest are:

  • One tree is built on creation.
  • Up to 5 other trees can be spawned to store the identifier quads in different orders.
  • The 6 trees provide optimal pattern matching for all kinds of SPOG patterns.
  • While the current context is RDF Dataset heavy, it may be possible to use the forest in other contexts.
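
The idea of one mandatory tree plus trees spawned on demand can be sketched as follows. This is a minimal illustration and not the identifier-forest API: the type MiniForest, its methods and the choice of a GPSO order are assumptions, the standard BTreeSet replaces the crate's own trees, and only one extra tree is modelled instead of five.

```rust
use std::collections::BTreeSet;

/// Hypothetical miniature forest: one mandatory SPOG tree and one optional GPSO tree.
pub struct MiniForest {
    spog: BTreeSet<[u32; 4]>,
    gpso: Option<BTreeSet<[u32; 4]>>, // spawned on first use
}

impl MiniForest {
    /// Only the SPOG tree exists on creation.
    pub fn new() -> Self {
        MiniForest { spog: BTreeSet::new(), gpso: None }
    }

    /// Inserts an identifier quad into every tree that currently exists.
    pub fn insert(&mut self, [s, p, o, g]: [u32; 4]) {
        self.spog.insert([s, p, o, g]);
        if let Some(tree) = self.gpso.as_mut() {
            tree.insert([g, p, s, o]);
        }
    }

    /// Returns all quads of graph `g` as [s, p, o, g].
    /// The GPSO tree is spawned from the SPOG tree if it does not exist yet,
    /// so the lookup becomes a range scan instead of a full scan.
    pub fn quads_in_graph(&mut self, g: u32) -> Vec<[u32; 4]> {
        let spog = &self.spog;
        let tree = self.gpso.get_or_insert_with(|| {
            spog.iter().map(|&[s, p, o, g]| [g, p, s, o]).collect()
        });
        tree.range([g, 0, 0, 0]..=[g, u32::MAX, u32::MAX, u32::MAX])
            .map(|&[g, p, s, o]| [s, p, o, g])
            .collect()
    }
}
```

Each additional order costs memory and insertion time, which is presumably why the extra trees are spawned only when a pattern needs them.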

identifier-forest is used as the base structure of both:

  • WasmTree, another repository which implements the RDF.JS specification using WebAssembly but without resorting to Sophia.
  • The TreeDataset implementation in the bjdatasets crate.

Build

Required

Run tests

Rust: TODO

JavaScript:

  • cd sophia-wasm
  • npm install
  • ./run_server.sh test

Build for WebAssembly / JavaScript

  • cd sophia-wasm
  • npm install
  • ./buildpkg.py (builds for both browsers and node)
  • cd pkg
  • wasm-pack pack

TODO: rename or get rid of run_server (it doesn't actually run a server)

Issue

Currently, WasmTree is faster than the tested exports. If you only want to use WebAssembly to improve the performance of your JavaScript application, you should consider using WasmTree instead.

License and funding

This work is distributed under the MIT License.

This project has been funded by the REPID Project during my internship in the TWEAK team at LIRIS.

wasmify-sophia's People

Contributors

bruju, pchampin

wasmify-sophia's Issues

new_anti is badly designed

https://github.com/BruJu/Portable-Reasoning-in-Web-Assembly/blob/531e17b21383fcffd5bb2d4f92f39ebee2cbe195/bjdatasets/src/treeddataset.rs#L42

The intent of new_anti is to build a tree that prioritizes the index orders that are the inverse of the ones used by the RDF.JS match function.

The purpose is to fill the produced tree in "random" order, to avoid ending up with a tree whose height is (number of elements) / 2B.

The current design fails to do so: when quads sorted in SPOG order all share the same S and P, filling an OGSP tree with them produces the same tree as filling a SPOG tree. The important thing is to sort by G then O, not by O then G.
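
The problem can be checked on a small example. The sketch below is only an illustration with made-up identifiers and hypothetical helper names. It takes quads that are sorted in SPOG order and share the same S and P, then looks at the keys they would have in an OGSP tree and in a GOSP tree.

```rust
/// Returns true if the keys are already in ascending order,
/// i.e. if inserting them in this sequence is a sorted insertion.
fn is_ascending(keys: &[[u32; 4]]) -> bool {
    keys.windows(2).all(|w| w[0] <= w[1])
}

/// Quads as [s, p, o, g], sorted in SPOG order, all sharing the same S and P.
fn sample_quads() -> Vec<[u32; 4]> {
    vec![[1, 1, 1, 3], [1, 1, 2, 2], [1, 1, 3, 1]]
}

/// Insertion keys for an OGSP tree.
fn ogsp_keys(quads: &[[u32; 4]]) -> Vec<[u32; 4]> {
    quads.iter().map(|&[s, p, o, g]| [o, g, s, p]).collect()
}

/// Insertion keys for a GOSP tree.
fn gosp_keys(quads: &[[u32; 4]]) -> Vec<[u32; 4]> {
    quads.iter().map(|&[s, p, o, g]| [g, o, s, p]).collect()
}
```

With these quads, is_ascending holds for the OGSP keys, so the insertion order is still sorted, and it does not hold for the GOSP keys.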

Get Rust Ptr function with random parameter

Currently, to detect whether an object has been generated by us, we ask one of its attributes to give us a pointer.

If it returns a null pointer, we did not generate the object.

But another library or object could implement the same attribute, which makes this a very unreliable way to detect our items.

  • To limit the risks, PA Champin (author of Sophia) proposed making GetRustPtr a function that requires a randomly generated number. If the function does not return undefined and the number is the same as the expected one, we have better guarantees that we are the owner of the object.
  • Or just find a way to properly implement the detection (which doesn't seem possible right now)
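
One possible reading of the first proposal is sketched below. The exact protocol is not specified in this issue, so everything here is an assumption: the trait MaybeExported models a JavaScript object that may or may not have the method, the types OurQuad and ForeignQuad and the functions fresh_nonce and checked_ptr are hypothetical, and the object is expected to echo the number it received.

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hasher};

/// Hypothetical view of a JavaScript object that may expose a `getRustPtr` method.
pub trait MaybeExported {
    /// Returns `None` (JavaScript `undefined`) or a pointer plus the echoed number.
    fn get_rust_ptr(&self, nonce: u64) -> Option<(usize, u64)>;
}

/// An object generated by our library: it echoes the number it receives.
pub struct OurQuad {
    pub ptr: usize,
}

impl MaybeExported for OurQuad {
    fn get_rust_ptr(&self, nonce: u64) -> Option<(usize, u64)> {
        Some((self.ptr, nonce))
    }
}

/// A foreign object that happens to expose a method with the same name.
pub struct ForeignQuad;

impl MaybeExported for ForeignQuad {
    fn get_rust_ptr(&self, _nonce: u64) -> Option<(usize, u64)> {
        Some((0xdead_beef, 0)) // unrelated value that ignores the number
    }
}

/// Produces an unpredictable number using only the standard library
/// (randomly seeded hasher state; not meant to be cryptographically strong).
pub fn fresh_nonce() -> u64 {
    RandomState::new().build_hasher().finish()
}

/// Accepts the pointer only if the object echoed the expected number.
pub fn checked_ptr(obj: &dyn MaybeExported, nonce: u64) -> Option<usize> {
    match obj.get_rust_ptr(nonce) {
        Some((ptr, echoed)) if echoed == nonce => Some(ptr),
        _ => None,
    }
}
```

This only lowers the probability of a false positive. An object that deliberately imitates the protocol would still pass, which matches the "better guarantees" wording of the proposal.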

RDF.JS compliance: DatasetCore can be iterated on to return quads

https://rdf.js.org/dataset-spec/#datasetcore-interface specifies that we should be able to iterate on a DatasetCore to get its quads.

This is currently not possible for two reasons:

Lazy datasetcore implementation

While implementing a version of n3js that uses Sophia as a DatasetCore instead of n3's store in the sophia_benchmark infrastructure ( https://github.com/pchampin/sophia_benchmark ), I realized that any fully RDF.JS-compliant use would perform poorly. (Admittedly, the way the benchmark uses n3 is not RDF.JS compliant either, but it would be great to get better execution times while matching the RDF.JS specification.)

The proposed test is to match the triples (we use quads, but it is the same) that have a given predicate and object, and to measure the time needed to return the first matching triple and all matching triples.

The basic algorithm would be to first match the predicate and object, then iterate with forEach over the returned quads.

As the current implementation is not lazy, the first forEach iteration is delayed until the whole match has completed.

My advisor often proposed making these operations lazy, and this benchmark emphasizes the need to implement that.
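
The delay described above can be made visible by counting how many quads are scanned before the first result is available. The sketch below is an illustration only: the functions match_eager and match_lazy are hypothetical, quads are reduced to [s, p, o, g] arrays of integers, and a slice stands in for the dataset.

```rust
use std::cell::Cell;

/// Eager match: scans the whole dataset before returning anything.
/// `visited` counts how many quads have been examined.
pub fn match_eager(
    quads: &[[u32; 4]],
    p: u32,
    o: u32,
    visited: &Cell<usize>,
) -> Vec<[u32; 4]> {
    quads
        .iter()
        .copied()
        .inspect(|_| visited.set(visited.get() + 1))
        .filter(|q| q[1] == p && q[2] == o)
        .collect()
}

/// Lazy match: returns an iterator, so quads are examined only when
/// the caller asks for the next result.
pub fn match_lazy<'a>(
    quads: &'a [[u32; 4]],
    p: u32,
    o: u32,
    visited: &'a Cell<usize>,
) -> impl Iterator<Item = [u32; 4]> + 'a {
    quads
        .iter()
        .copied()
        .inspect(move |_| visited.set(visited.get() + 1))
        .filter(move |q| q[1] == p && q[2] == o)
}
```

With the eager version, the counter equals the size of the dataset before the first quad can be consumed. With the lazy version, it only grows as results are requested.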

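A basic lazy implementation would be:

use std::cell::RefCell;
use std::rc::Rc;

// FastDataSet and Mutation are not defined in this snippet.
pub enum SophiaExportDatasetCoreEnum {
    Owned(Rc<RefCell<FastDataSet>>),
    // The RefCell layer is optional here: Rc<FastDataSet> may be enough.
    Borrowed(Rc<RefCell<FastDataSet>>, Mutation),
}

// we can't export enums with wasm_bindgen
pub struct SophiaExportDatasetCore {
    v: SophiaExportDatasetCoreEnum,
}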

A more advanced implementation would be:

pub struct SophiaExportDatasetCore {
    owned: Rc<RefCell<FastDataSet>>,
    // The RefCell layer is optional here: Rc<FastDataSet> may be enough.
    borrowed: (Rc<RefCell<FastDataSet>>, Mutation),
}

Laziness analysis of the RDF.JS functions:

  • Functions that require iterating over the whole dataset: size, iterate, forEach, reduce, toArray, toCanonical, toString

  • Functions that can evaluate the dataset lazily: contains, has, equals, every, some

  • Functions that require copying the dataset if someone is lazily using it: add, addAll

  • Functions that require both copying the dataset if someone is lazily using it and fully evaluating the dataset if it is lazily evaluated: delete, deleteMatches

  • Functions that create a lazily evaluated graph: match

  • Functions that can create a lazily evaluated graph if the passed function is copied: filter, map

  • Set functions: difference, intersection, union

  • Not implemented: import, toStream
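
The "copy the dataset if someone is lazily using it" category above maps naturally onto copy-on-write with a reference-counted pointer. The following is a minimal sketch under that assumption: the type SharedDataset and its methods are hypothetical, a BTreeSet of integer quads stands in for the real dataset, and Rc::make_mut from the standard library performs the copy only when the content is shared.

```rust
use std::collections::BTreeSet;
use std::rc::Rc;

type Quad = [u32; 4];

/// Hypothetical dataset handle whose content may be shared with lazy views.
pub struct SharedDataset {
    quads: Rc<BTreeSet<Quad>>,
}

impl SharedDataset {
    pub fn new() -> Self {
        SharedDataset { quads: Rc::new(BTreeSet::new()) }
    }

    /// A lazy view keeps the content alive by holding another `Rc` to it.
    pub fn lazy_view(&self) -> Rc<BTreeSet<Quad>> {
        Rc::clone(&self.quads)
    }

    /// `add` clones the content only if a lazy view still references it;
    /// otherwise it mutates the content in place.
    /// Returning `&mut Self` allows calls to be chained.
    pub fn add(&mut self, quad: Quad) -> &mut Self {
        Rc::make_mut(&mut self.quads).insert(quad);
        self
    }

    pub fn size(&self) -> usize {
        self.quads.len()
    }
}
```

A view taken before an add keeps seeing the old content, while the dataset itself moves on to the modified copy. delete and deleteMatches would need the same copy, plus the full evaluation mentioned in the list.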
