
mikevoronov commented on August 28, 2024

C/C++/Rust to WASM compilation

In this section we discuss the current state of some public tools for translating C/C++/Rust to WebAssembly.

C/C++

There are two ways of translating C/C++ code to WebAssembly. The first is based on the regular compilation process in the LLVM ecosystem: C/C++ is compiled into LLVM intermediate representation and only then compiled to WASM. The first stage can be done by a C/C++ frontend compiler, either clang or DragonEgg (which replaced the outdated llvm-gcc), and the second stage by the LLVM wasm-target backend or the binaryen s2wasm compiler. So the process can be represented as follows:

C/C++ -> (clang, DragonEgg) -> llvm -> (llvm wasm-target backend, binaryen s2wasm) -> wasm

However, s2wasm is no longer officially supported and has been removed from the official binaryen repository by this PR.

The second way of compiling C/C++ to WASM is based on emscripten:

llvm -> (emscripten) -> javascript (asm.js) -> (binaryen asm2wasm) -> javascript + wasm

This way is commonly used for work in browsers, and since s2wasm is no longer used, it is the usual way of using the emscripten and binaryen tools. However, all of these ways are based on the standard libc, and some of its functions end up as imports of the generated wasm module. For example, after compiling this example

#include <vector>

int main() {
  std::vector<int> vec;
  vec.push_back(1);
  return 0;
}

with emscripten without any flags, two files are generated: js and wasm. The wasm part contains the following imports:

  (import "env" "memory" (memory (;0;) 256 256))
  (import "env" "table" (table (;0;) 192 192 anyfunc))
  (import "env" "memoryBase" (global (;0;) i32))
  (import "env" "tableBase" (global (;1;) i32))
  (import "env" "DYNAMICTOP_PTR" (global (;2;) i32))
  (import "env" "tempDoublePtr" (global (;3;) i32))
  (import "env" "STACKTOP" (global (;4;) i32))
  (import "env" "STACK_MAX" (global (;5;) i32))
  (import "global" "NaN" (global (;6;) f64))
  (import "global" "Infinity" (global (;7;) f64))
  (import "env" "enlargeMemory" (func (;0;) (type 7)))
  (import "env" "getTotalMemory" (func (;1;) (type 7)))
  (import "env" "abortOnCannotGrowMemory" (func (;2;) (type 7)))
  (import "env" "abortStackOverflow" (func (;3;) (type 6)))
  (import "env" "nullFunc_ii" (func (;4;) (type 6)))
  (import "env" "nullFunc_iiii" (func (;5;) (type 6)))
  (import "env" "nullFunc_v" (func (;6;) (type 6)))
  (import "env" "nullFunc_vi" (func (;7;) (type 6)))
  (import "env" "nullFunc_viiii" (func (;8;) (type 6)))
  (import "env" "nullFunc_viiiii" (func (;9;) (type 6)))
  (import "env" "nullFunc_viiiiii" (func (;10;) (type 6)))
  (import "env" "___cxa_allocate_exception" (func (;11;) (type 2)))
  (import "env" "___cxa_begin_catch" (func (;12;) (type 2)))
  (import "env" "___cxa_throw" (func (;13;) (type 8)))
  (import "env" "___lock" (func (;14;) (type 6)))
  (import "env" "___setErrNo" (func (;15;) (type 6)))
  (import "env" "___syscall140" (func (;16;) (type 9)))
  (import "env" "___syscall146" (func (;17;) (type 9)))
  (import "env" "___syscall54" (func (;18;) (type 9)))
  (import "env" "___syscall6" (func (;19;) (type 9)))
  (import "env" "___unlock" (func (;20;) (type 6)))
  (import "env" "_abort" (func (;21;) (type 1)))
  (import "env" "_emscripten_memcpy_big" (func (;22;) (type 0)))
  (import "env" "_pthread_getspecific" (func (;23;) (type 2)))
  (import "env" "_pthread_key_create" (func (;24;) (type 9)))
  (import "env" "_pthread_once" (func (;25;) (type 9)))
  (import "env" "_pthread_setspecific" (func (;26;) (type 9)))

The JS part, in turn, has the corresponding exports. The general scheme of interaction between these modules looks approximately as follows:

  1. the js part initializes a heap that the wasm part can use as dynamic memory;
  2. the js part exports all functions the wasm part needs (the list of exported functions includes libc functions and depends on the module type: MAIN_MODULE or SIDE_MODULE);
  3. the js part loads the wasm part and links its imports with the corresponding exports;
  4. the js part calls the start function of the wasm part;
  5. the js part manages all allocation and deallocation during execution of the wasm part. The js part also handles all OS-specific functionality of the wasm part, such as network and file system interaction.

The approach based on clang + the LLVM wasm target generates imports according to the libraries in its sysroot parameter. There is a project called wasmception whose goal is to reduce the number of imports so that only OS syscalls are imported. For example, compiling the std::vector example with it generates the following imports:

 (import "env" "__syscall1" (func $__syscall1 (type 4)))
 (import "env" "__syscall0" (func $__syscall0 (type 3)))
 (import "env" "__syscall3" (func $__syscall3 (type 5)))
 (import "env" "__syscall6" (func $__syscall6 (type 10)))
 (import "env" "__syscall5" (func $__syscall5 (type 11)))
 (import "env" "__syscall2" (func $__syscall2 (type 2)))
 (import "env" "__syscall4" (func $__syscall4 (type 12))) 

These imports correspond to the memory management syscalls brk/sbrk/mmap used by the standard dlmalloc/ptmalloc2 allocator in libc. These allocators (ptmalloc is a multi-threaded analog of dlmalloc) use mmap for allocations larger than some threshold (256 KB by default) and brk/sbrk to grow the break area (the area that holds all common allocator structures and the allocations smaller than the mmap threshold).
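The routing rule described above can be sketched in a few lines. This is only an illustration: the names Backend, kMmapThreshold and choose_backend are hypothetical, the 256 KB value is the default quoted in the text, and treating a request of exactly the threshold size as an mmap request is an assumption.

```cpp
#include <cstddef>

// Which syscall family ends up serving an allocation request (hypothetical names).
enum class Backend { Brk, Mmap };

// Default threshold quoted in the text (256 KB); real allocators make it tunable.
constexpr std::size_t kMmapThreshold = 256 * 1024;

// Large requests get their own mmap region; smaller ones are carved out of
// the break area, which grows through brk/sbrk.
// Assumption: a request of exactly the threshold size is routed to mmap.
constexpr Backend choose_backend(std::size_t request_bytes,
                                 std::size_t threshold = kMmapThreshold) {
    return request_bytes >= threshold ? Backend::Mmap : Backend::Brk;
}
```

A small std::vector<int> therefore only touches the break area, while a buffer of a few megabytes would trigger an mmap call.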
As a result, there are no out-of-the-box tools for compiling C/C++ code to a "static" WASM module. However, there are a few ways of compiling C/C++ in the context of our ecosystem:

  1. Implement all functions imported by the wasm part in the asmble-generated Java module and export them to the WASM module. Only the memory management API needs to be implemented, because for now it is assumed that users run only "computational" single-threaded tasks that don't depend on the network or the file system.
  2. There are only two instructions in WASM for working with dynamic memory: memory.size (which returns the current size of the heap) and memory.grow (which increases the size of the heap by a given value). This instruction set is ideal for implementing the break area of dlmalloc/ptmalloc. To emulate mmap in the context of WASM, however, all of the memory has to be preallocated (4 GB, the maximum currently supported by the 32-bit WASM specification) and two memory pointers have to be used. The first one represents the current top chunk limit and the second one represents the mmap limit. The second pointer initially points to the right border of the preallocated memory area (4 GB) and decreases by the size requested by each mmap syscall, while the first pointer initially points to the left border of the preallocated memory and increases by the size requested by each brk/sbrk syscall. In practice, this approach would be implemented by augmenting musl to use the clang intrinsics for memory.size/memory.grow (__builtin_wasm_mem_size/__builtin_wasm_mem_grow) instead of the memory management syscalls.
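The two-pointer scheme from the second option can be modelled as below. This is a minimal sketch under stated assumptions, not the musl patch itself: TwoEndedArena, emulate_sbrk, emulate_mmap and kFailed are hypothetical names, the arena only tracks offsets inside an already preallocated linear memory, and munmap, page granularity and negative sbrk increments are not modelled.

```cpp
#include <cstdint>
#include <limits>

// Sentinel returned when a request cannot be satisfied (hypothetical name,
// loosely mirroring the -1 / MAP_FAILED results of the real syscalls).
constexpr std::uint64_t kFailed = std::numeric_limits<std::uint64_t>::max();

// Hypothetical model of one preallocated linear memory of `capacity` bytes.
// The break pointer starts at the left border and grows upwards; the mmap
// pointer starts at the right border and grows downwards.
class TwoEndedArena {
public:
    explicit TwoEndedArena(std::uint64_t capacity)
        : brk_top_(0), mmap_bottom_(capacity) {}

    // Emulates sbrk(increment): returns the previous break offset, or
    // kFailed if the break area would run into the mmap area.
    std::uint64_t emulate_sbrk(std::uint64_t increment) {
        if (increment > mmap_bottom_ - brk_top_) return kFailed;
        const std::uint64_t old_break = brk_top_;
        brk_top_ += increment;
        return old_break;
    }

    // Emulates an anonymous mmap(length): returns the start offset of the
    // new region, or kFailed if it would run into the break area.
    std::uint64_t emulate_mmap(std::uint64_t length) {
        if (length > mmap_bottom_ - brk_top_) return kFailed;
        mmap_bottom_ -= length;
        return mmap_bottom_;
    }

    std::uint64_t brk_top() const { return brk_top_; }
    std::uint64_t mmap_bottom() const { return mmap_bottom_; }

private:
    std::uint64_t brk_top_;      // current top chunk limit, grows to the right
    std::uint64_t mmap_bottom_;  // current mmap limit, grows to the left
};
```

Both pointers share the same free gap, so either kind of request fails only when the two regions would meet. Because the whole state lives inside the module, no external memory management function is needed.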

However, the first way can't be used in our design because it makes it impossible to verify state with the verification game. With the first approach, the WASM code relies heavily on "external" memory management functions, and the results of these functions can't be taken into account when calculating a hash of the user-supplied wasm code (the hash could be used to check the integrity of the WASM code).

Rust

Unlike C/C++, Rust does not run into the pitfalls with memory management syscalls: the Rust backend already uses a dlmalloc port compiled to wasm and augmented with memory.grow intrinsics. There is also a special wee_alloc allocator designed for wasm, which we or our clients could use to reduce the size of the resulting wasm code.

from nox.

alari commented on August 28, 2024

Does it seem to be fixed?

mikevoronov commented on August 28, 2024

Yes, besides the ways described above, there is now a new one based on wasi-sdk.
