rv32emu's Introduction

RISC-V RV32I[MAFC] emulator

                       /--===============------\
      ______     __    | |⎺⎺⎺⎺⎺⎺⎺⎺⎺⎺⎺⎺⎺⎺⎺|     |
     |  _ \ \   / /    | |               |     |
     | |_) \ \ / /     | |   Emulator!   |     |
     |  _ < \ V /      | |               |     |
     |_| \_\ \_/       | |_______________|     |
      _________        |                   ::::|
     |___ /___ \       '======================='
       |_ \ __) |      //-'-'-'-'-'-'-'-'-'-'-\\
      ___) / __/      //_'_'_'_'_'_'_'_'_'_'_'_\\
     |____/_____|     [-------------------------]

rv32emu is an emulator for the 32-bit RISC-V processor model (RV32), faithfully implementing the RISC-V instruction set architecture (ISA). It serves as an exercise in modeling a modern RISC-based processor, demonstrating the device's operations without the complexities of a hardware implementation. The code is designed to be accessible and expandable, making it an ideal educational tool and starting point for customization. It is primarily written in C99, with a focus on efficiency and readability.

Features:

  • Fast interpreter for executing the RV32 ISA
  • Comprehensive support for RV32I and M, A, F, C extensions
  • Memory-efficient design
  • Built-in ELF loader
  • Implementation of commonly used newlib system calls
  • Experimental SDL-based display/event/audio system calls for running video games
  • Support for remote GDB debugging
  • Experimental JIT compiler for performance boost while maintaining a small footprint

Build and Verify

rv32emu relies on certain third-party packages for full functionality and access to all its features. To ensure proper operation, the target system should have the SDL2 library and SDL2_Mixer library installed.

  • macOS: brew install sdl2 sdl2_mixer
  • Ubuntu Linux / Debian: sudo apt install libsdl2-dev libsdl2-mixer-dev

Build the emulator.

$ make

Run sample RV32I[M] programs:

$ make check

Run Doom, the classical video game, via rv32emu:

$ make doom

The build script will then download the data file for Doom automatically. Once Doom is loaded and running, an SDL2-based window should appear.

If RV32F support is enabled (turned on by default), the Quake demo program can be launched via:

$ make quake

The usage and limitations of the Doom and Quake demos are listed in docs/demo.md.

Docker image

The image containing all the necessary tools for development and testing can be run with docker run -it sysprog21/rv32emu:latest. It works on both x86-64 and aarch64 (e.g., Apple's M1 chip) machines.

Customization

rv32emu is configurable, and you can override the variables below to suit your needs:

  • ENABLE_EXT_M: Standard Extension for Integer Multiplication and Division
  • ENABLE_EXT_A: Standard Extension for Atomic Instructions
  • ENABLE_EXT_F: Standard Extension for Single-Precision Floating Point Instructions
  • ENABLE_EXT_C: Standard Extension for Compressed Instructions (RV32C.D excluded)
  • ENABLE_Zicsr: Control and Status Register (CSR)
  • ENABLE_Zifencei: Instruction-Fetch Fence
  • ENABLE_GDBSTUB: GDB remote debugging support
  • ENABLE_SDL: Experimental Display and Event System Calls
  • ENABLE_JIT: Experimental JIT compiler

For example, run make ENABLE_EXT_F=0 to build without floating-point support.

Alternatively, configure the above items in advance by executing make config and specifying them in a configuration file. Subsequently, run make according to the provided configurations. For example, employ the following commands:

$ make config ENABLE_SDL=0
$ make

RISCOF

RISCOF (RISC-V Compatibility Framework) is a Python-based framework that facilitates testing of a RISC-V target against a golden reference model.

The RISC-V Architectural Tests, also known as riscv-arch-test, provide a fundamental set of tests that can be used to verify that the behavior of the RISC-V model aligns with RISC-V standards while executing specific applications. These tests are not meant to replace thorough design verification.

Reference signatures are generated by the formal RISC-V model RISC-V SAIL in Executable and Linkable Format (ELF) files. ELF files contain multiple testing instructions, data, and signatures, such as cadd-01.elf. The specific data locations that the testing model (this emulator) must write to during the test are referred to as test signatures. These test signatures are written upon completion of the test and are then compared to the reference signature. Successful tests are indicated by matching signatures.

To install RISCOF:

$ python3 -m pip install git+https://github.com/riscv/riscof

The RISC-V GNU Compiler Toolchain should be prepared in advance. You can obtain a prebuilt GNU toolchain for riscv32-elf from the Automated Nightly Release. Then, run the following command:

$ make arch-test

For macOS users, installing sdiff might be required:

$ brew install diffutils

To run the tests for a specific extension, set the environment variable RISCV_DEVICE to one of I, M, A, F, C, Zifencei, privilege.

$ make arch-test RISCV_DEVICE=I

Current progress of this emulator in riscv-arch-test (RV32):

  • Passed Tests
    • I: Base Integer Instruction Set
    • M: Standard Extension for Integer Multiplication and Division
    • A: Standard Extension for Atomic Instructions
    • F: Standard Extension for Single-Precision Floating-Point
    • C: Standard Extension for Compressed Instructions
    • Zifencei: Instruction-Fetch Fence
    • privilege: RISC-V Privileged Specification

Detail in riscv-arch-test:

Benchmarks

The benchmarks are classified into three categories based on their characteristics:

| Category | Benchmark | Description |
|----------|-----------|-------------|
| Computing intensive | puzzle | A sliding puzzle where numbered square tiles are arranged randomly with one tile missing, designed for solving the N-puzzle problem. |
| Computing intensive | Pi | Calculates the millionth digit of π. |
| Computing intensive | miniz | Compresses and decompresses 8 MiB of data. |
| Computing intensive | primes | Finds the largest prime number below 33333333. |
| Computing intensive | sha512 | Computes the SHA-512 hash of 64 MiB of data. |
| I/O intensive | Richards | An OS task scheduler simulation benchmark for comparing system implementations. |
| I/O intensive | Dhrystone | Evaluates string operations, involves frequent memory I/O, and generates the performance metric. |
| Computing and I/O Hybrid | Mandelbrot | A benchmark based on the Mandelbrot set, which uses fixed-point arithmetic and involves numerous integer operations. |
| Computing and I/O Hybrid | AES | Includes 23 encryption and decryption algorithms adhering to the Advanced Encryption Standard. |
| Computing and I/O Hybrid | Nqueens | A puzzle benchmark where n queens are placed on an n × n chessboard without attacking each other, using deep recursion for execution. |
| Computing and I/O Hybrid | qsort | Sorts an array with 50 million items. |

These benchmarks were run with rv32emu (interpreter-only mode) and Spike v1.1.0 on two machines: an Intel Core i7-11700 CPU running at 2.5 GHz, and an Ampere eMAG 8180 microprocessor equipped with 32 Arm64 cores, capable of speeds up to 3.3 GHz. Both systems ran Ubuntu Linux 22.04.1 LTS. The benchmarks were compiled with gcc version 12.3, configured as riscv32-unknown-elf-gcc.

The figures below illustrate the normalized execution time of rv32emu and Spike, where a shorter time indicates better performance.

x86-64

Arm64

Continuous Benchmarking

Continuous benchmarking is integrated into GitHub Actions, allowing the committer and reviewer to examine the comment on benchmark comparisons between the pull request commit(s) and the latest commit on the master branch within the conversation. This comment is generated by the benchmark CI and provides an opportunity for discussion before merging.

The results of the benchmark will be rendered on a GitHub page. Check benchmark-action/github-action-benchmark for the reference of benchmark CI workflow.

There are several files that have the potential to significantly impact the performance of rv32emu, including:

  • src/decode.c
  • src/rv32_template.c
  • src/emulate.c

As a result, any modifications made to these files will trigger the benchmark CI.

GDB Remote Debugging

rv32emu can operate as a gdbstub on an experimental basis, since it supports only a limited subset of the GDB Remote Serial Protocol (GDBRSP). To enable this feature, build the emulator with ENABLE_GDBSTUB=1 passed to the make command. After that, you can run it using the command below.

$ build/rv32emu -g <binary>

The <binary> should be an ELF file in RISC-V 32-bit format. Additionally, it is advised that you compile programs with the -g option in order to produce debug information in your ELF files.

You can run riscv-gdb if the emulator starts up correctly without an error. It takes two GDB commands to connect to the emulator after giving GDB the supported architecture of the emulator and any debugging symbols it may have.

$ riscv32-unknown-elf-gdb
(gdb) file <binary>
(gdb) target remote :1234

Congratulate yourself if riscv-gdb does not produce an error message. Now that the GDB command line is available, you can communicate with rv32emu.
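
For readers curious about what travels over the connection, the following is a minimal sketch of GDB Remote Serial Protocol framing: each packet has the form `$<payload>#<checksum>`, where the checksum is the sum of the payload bytes modulo 256, written as two hex digits. The function names `rsp_checksum` and `rsp_frame` are made up for this illustration and are not taken from rv32emu or its gdbstub.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Sum of all payload bytes, modulo 256 (illustrative helper). */
uint8_t rsp_checksum(const char *payload)
{
    uint8_t sum = 0;
    for (const char *p = payload; *p; p++)
        sum = (uint8_t) (sum + (uint8_t) *p);
    return sum;
}

/* Frame a payload as "$<payload>#<two hex digits>".
 * Returns the packet length, or -1 if buf is too small.
 */
int rsp_frame(const char *payload, char *buf, size_t len)
{
    assert(payload && buf);
    int n = snprintf(buf, len, "$%s#%02x", payload, rsp_checksum(payload));
    return (n < 0 || (size_t) n >= len) ? -1 : n;
}
```

With this sketch, the payload `OK` is framed as `$OK#9a`, since 'O' (0x4f) plus 'K' (0x4b) equals 0x9a.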

Dump registers as JSON

If the -d [filename] option is provided, the emulator will output registers in JSON format. This feature can be utilized for tests involving the emulator, such as compiler tests.

You can also combine this option with -q to directly use the output. For example, if you want to read the register x10 (a0), then run the following command:

$ build/rv32emu -d - -q out.elf | jq .x10
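
As a rough illustration of what such a dump involves, the sketch below serializes 32 general-purpose registers into a flat JSON object. Only the key `x10` is confirmed by the jq example above; the remaining key names, the exact layout, and the function name `dump_regs_json` are assumptions, so the emulator's real output may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define N_REGS 32

/* Write {"x0": v0, ..., "x31": v31} into buf (assumed layout).
 * Returns the number of characters written, or -1 on truncation.
 */
int dump_regs_json(const uint32_t regs[N_REGS], char *buf, size_t len)
{
    assert(regs && buf);
    size_t off = 0;
    for (int i = 0; i < N_REGS; i++) {
        int n = snprintf(buf + off, len - off, "%s\"x%d\": %lu",
                         i ? ", " : "{", i, (unsigned long) regs[i]);
        if (n < 0 || (size_t) n >= len - off)
            return -1;
        off += (size_t) n;
    }
    if (len - off < 2) /* room for '}' and the terminator */
        return -1;
    buf[off++] = '}';
    buf[off] = '\0';
    return (int) off;
}
```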

Usage Statistics

RISC-V Instructions/Registers

This is a static analysis tool for assessing the usage of RV32 instructions/registers in a given target program. Build this tool by running the following command:

$ make tool

After building, you can launch the tool using the following command:

$ build/rv_histogram [-ar] [target_program_path]

The tool accepts two options:

  • -a: output the analysis in ascending order (default is descending order)
  • -r: output usage of registers (default is usage of instructions)

Example Instructions Histogram

Example Registers Histogram

Basic Block

To install lolviz, use the following command:

$ pip install lolviz

For macOS users, it might be necessary to install additional dependencies:

$ brew install graphviz

Build the profiling data by executing rv32emu. This can be done as follows:

$ build/rv32emu -p build/[test_program].elf

To analyze the profiling data, use the rv_profiler tool with the desired options:

$ tools/rv_profiler [--start-address|--stop-address|--graph-ir] [test_program]

WebAssembly Translation

Build and run

rv32emu relies on Emscripten to be compiled to WebAssembly. Thus, the target system should have Emscripten version 3.1.51 installed.

Moreover, rv32emu leverages tail call optimization (TCO). WebAssembly execution has been tested in Chrome (major version 112 or later) and Firefox (major version 121 or later), since they support the tail call feature. Please check and update your browser if necessary, or install a suitable one, before going further.

Source your Emscripten SDK environment before running make. For macOS and Linux users:

$ source ~/emsdk/emsdk_env.sh

Change the Emscripten SDK environment path if necessary.

At this point, you can build and start a web server service to serve WebAssembly by running:

$ make CC=emcc start-web

The server's IP:PORT is printed in your terminal. Copy and paste it into a browser to access the index page of rv32emu.

Index page

The page shows a dropdown menu for selecting the ELF executable. Select one and click the Run button to run it.

Alternatively, since building takes some time, you may want to view a hosted rv32emu demo page.

Contributing

See CONTRIBUTING.md for contribution guidelines.

License

rv32emu is available under a permissive MIT-style license. Use of this source code is governed by an MIT license that can be found in the LICENSE file.

External sources

The rv32emu repository contains some prebuilt ELF files for testing purposes.

rv32emu's Issues

Manage basic block with adaptive replacement cache

Current basic block management consumes a significant amount of memory, which leads to unnecessary waste due to frequent map allocation and release. Adaptive Replacement Cache (ARC) is a page replacement algorithm with better performance than least recently used (LRU). ARC keeps track of frequently used and recently used pages, as well as a recent eviction history for both. Handling the translated blocks with ARC should therefore yield better memory usage and hit rates.

Quote from ZFS Caching:

  • Combines the best of LRU and LFU, plus some novel tricks
  • The cache size (c) is partitioned (p) into two sections
  • At the start, p = ½ c, first half of the cache is LRU, second LFU
  • In addition to the two caches, there is a “ghost” list for each
  • Each time an item is evicted from either cache, its key (but not its data) moves to the ghost list for that cache
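
The role of the partition p can be sketched with the adaptation rule usually given for ARC: a hit in the ghost list of the recency (LRU) section grows p, and a hit in the ghost list of the frequency (LFU) section shrinks it. The struct and function names below are hypothetical, and the sketch covers only the adjustment of p, not the full cache with its four lists.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bookkeeping for the adaptation step only. */
typedef struct {
    size_t c;      /* total cache capacity */
    size_t p;      /* target size of the recency (LRU) section */
    size_t b1_len; /* length of the ghost list for the recency section */
    size_t b2_len; /* length of the ghost list for the frequency section */
} arc_state_t;

/* Ghost hit in B1: the recency section was too small, so grow p. */
void arc_on_b1_hit(arc_state_t *s)
{
    assert(s && s->b1_len > 0);
    size_t delta = s->b1_len >= s->b2_len ? 1 : s->b2_len / s->b1_len;
    s->p = (s->p + delta > s->c) ? s->c : s->p + delta;
}

/* Ghost hit in B2: the frequency section was too small, so shrink p. */
void arc_on_b2_hit(arc_state_t *s)
{
    assert(s && s->b2_len > 0);
    size_t delta = s->b2_len >= s->b1_len ? 1 : s->b1_len / s->b2_len;
    s->p = (delta > s->p) ? 0 : s->p - delta;
}
```

Because p is clamped to the range 0..c, a workload dominated by recent accesses can push the whole cache toward LRU behavior, and the reverse holds for frequency-heavy workloads.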

Expected output:

  1. Create cache.[ch], which consists of a clean API. See map.[ch] for reference.
  2. Implement adaptive replacement cache in cache.[ch] along with configurable capacity.
  3. Integrate ARC in basic block management.

Reference:

Cache tests fail after the enforcement of newline at the end of files

Issue #154 enforces a newline at the end of files, with the exception of cache tests, which are excluded due to failures.

To reproduce:

$ git ls-files -z |  while IFS= read -rd '' f; do if file --mime-encoding "$f" | grep -qv binary; then tail -c1 < "$f" | read -r _ || echo >> "$f"; fi; done
$ make tests

The following error is then reported:

Running lfu/cache-new ... cmp: EOF on build/cache/lfu/cache-new.out after byte 10, line 1
Failed.
make: *** [run-test-cache] Error 1

Fix suspicious pointer scaling

Tracking issue for:

src/emulate.c:713
This pointer might have type unsigned long (size 8), but this pointer arithmetic is done with type uint32_t * (size 4).
Pointer arithmetic in C and C++ is automatically scaled according to the size of the data type. For example, if the type of p is T* and sizeof(T) == 4 then the expression p+1 adds 4 bytes to p. This can cause a buffer overflow condition if the programmer forgets that they are adding a multiple of sizeof(T), rather than a number of bytes.
This query finds pointer arithmetic expressions where it appears likely that the programmer has forgotten that the offset is automatically scaled.
Common Weakness Enumeration: CWE-468.
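
The scaling rule described above can be checked with a small self-contained sketch. The function names are invented for the illustration and the code is unrelated to the flagged lines in src/emulate.c; it only contrasts element-wise stepping with byte-wise stepping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte distance covered when a uint32_t pointer is advanced by n elements. */
size_t scaled_step_bytes(size_t n)
{
    uint32_t words[8] = {0};
    assert(n <= 8);
    uint32_t *p = words + n; /* advances n * sizeof(uint32_t) bytes */
    return (size_t) ((uint8_t *) p - (uint8_t *) words);
}

/* Byte distance covered when the pointer is cast to uint8_t * first. */
size_t byte_step_bytes(size_t n)
{
    uint32_t words[8] = {0};
    assert(n <= sizeof(words));
    uint8_t *p = (uint8_t *) words + n; /* advances exactly n bytes */
    return (size_t) (p - (uint8_t *) words);
}
```

When an offset is already expressed in bytes, casting to a byte pointer before adding it avoids the unintended multiplication by the element size.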

src/emulate.c:717

src/emulate.c:721

Run `riscv-arch-test` in CI pipeline

The build system is capable of running the tests for specific extensions, such as I, M, C, Zifencei, privilege, and F. However, there is no regular validation for each pull request and/or git commit regarding ISA compliance. We shall integrate riscv-arch-test into the CI pipeline.

Expected output:

  1. Fetch the (custom) GNU Toolchain for RISC-V: ensure all ELF executables are stripped in advance, so that the CI pipeline runs a bit faster.
  2. Run riscv-arch-test for RV32I and RV32M. Other extensions can be validated upon request.
  3. Show a brief report for riscv-arch-test.

Is the CSRRWI handling in op_system of riscv.c incorrect?

static bool op_system(struct riscv_t *rv, uint32_t inst)
{
#ifdef ENABLE_Zicsr
    case 1: {  // CSRRW    (Atomic Read/Write CSR)
        uint32_t tmp = csr_csrrw(rv, csr, rv->X[rs1]);
        rv->X[rd] = rd ? tmp : rv->X[rd];
        break;
    }
   // ...
    case 5: {  // CSRRWI
        uint32_t tmp = csr_csrrc(rv, csr, rv->X[rs1]); // should this be csr_csrrw(rv, csr, rv->X[rs1]) instead?
        rv->X[rd] = rd ? tmp : rv->X[rd];
        break;
    }
   // ...
}
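
To make the difference between the two operations concrete, here is a small sketch of the Zicsr semantics on a hypothetical single-CSR model: a write replaces the whole register, whereas a clear only removes the bits set in the mask. For the immediate forms (CSRRWI, CSRRSI, CSRRCI), the specification takes the operand from the 5-bit rs1 field as a zero-extended immediate rather than from a register. All `demo_` names are illustrative and do not mirror rv32emu's internal helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model holding a single CSR. */
typedef struct {
    uint32_t csr;
} csr_demo_t;

/* CSRRW / CSRRWI: replace the CSR with val, return the old value. */
uint32_t demo_csrrw(csr_demo_t *s, uint32_t val)
{
    assert(s);
    uint32_t old = s->csr;
    s->csr = val;
    return old;
}

/* CSRRC / CSRRCI: clear the bits set in mask, return the old value. */
uint32_t demo_csrrc(csr_demo_t *s, uint32_t mask)
{
    assert(s);
    uint32_t old = s->csr;
    s->csr &= ~mask;
    return old;
}

/* Immediate forms: the operand is bits 19:15 of the instruction (uimm). */
uint32_t demo_uimm(uint32_t insn)
{
    return (insn >> 15) & 0x1f;
}
```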

Avoid duplications in RISC-V exception handlers

Code duplication in the function body of the RISC-V exception handlers might be a maintenance headache. That is, we shall refine these functions in src/emulate.c:

  • rv_except_insn_misaligned
  • rv_except_load_misaligned
  • rv_except_store_misaligned
  • rv_except_illegal_insn
  • rv_except_breakpoint proposed in #60

Code generation using preprocessor macros may be an approach to avoid such duplications.
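
One possible shape for such macro-based generation is the X-macro pattern sketched below. The list pairs each handler name with its exception code from the RISC-V privileged specification, and a single template expands into one function per entry. The `demo_rv_t` struct, its field names, and the simplified trap entry are assumptions made for the sketch; they do not reproduce the actual code in src/emulate.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical minimal CPU state for the illustration. */
typedef struct {
    uint32_t pc, csr_mepc, csr_mcause, csr_mtval, csr_mtvec;
} demo_rv_t;

/* Handler name suffix and mcause exception code. */
#define DEMO_EXCEPTION_LIST \
    _(insn_misaligned, 0)   \
    _(illegal_insn, 2)      \
    _(breakpoint, 3)        \
    _(load_misaligned, 4)   \
    _(store_misaligned, 6)

/* Template: record the trap state, then jump to the mtvec BASE address
 * (bits 31:2), which is the target for synchronous exceptions.
 */
#define _(type, code)                                     \
    void demo_except_##type(demo_rv_t *rv, uint32_t tval) \
    {                                                     \
        assert(rv);                                       \
        rv->csr_mepc = rv->pc;                            \
        rv->csr_mcause = (code);                          \
        rv->csr_mtval = tval;                             \
        rv->pc = rv->csr_mtvec & ~UINT32_C(3);            \
    }
DEMO_EXCEPTION_LIST
#undef _
```

Adding a new exception then means adding one line to the list, and any change to the shared trap-entry logic is made in a single place.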

Fail to run RV32C tests when `ENABLE_COMPUTED_GOTO` is turned off

Although #18 resolved the build failure, the emulation for RV32C does not work as expected. To reproduce:

$ make ENABLE_COMPUTED_GOTO=0 clean arch-test RISCV_DEVICE=C

The RV32C tests then make no progress:

$ cat build/arch-test/rv32i_m/C/cadd-01.log
rv32emu: io.c:76: memory_read_ifetch: Assertion `c' failed.

Incorrect emulation for micro-AES

µAES is a minimalist ANSI-C compatible implementation of AES encryption and block cipher modes. In this repository, the prebuilt binary build/aes.elf is provided for testing purposes. However, rv32emu behaves differently than expected when compared to Spike and rv8.

[ Spike ]

riscv-isa-sim/build/spike --isa=RV32G riscv-pk/build/pk aes.elf 

Results:

AES-128 EAX encryption: PASSED!
AES-128 EAX decryption: PASSED!

[ rv8 ]

rv-jit aes.elf

Results:

AES-128 EAX encryption: PASSED!
AES-128 EAX decryption: PASSED!

[ rv32emu ]

rv32emu aes.elf

Results:

AES-128 EAX encryption: FAILED :(
AES-128 EAX decryption: FAILED :(

Utilize dominators for constructing extended basic blocks

Quoted from Basic Blocks and CFG, the definition of extended basic block (EBB):

  • Extended basic block: a maximal sequence of instructions beginning with a leader that contains no join nodes other than its first node.
  • Has a single entry, but possibly multiple exit points.
  • Some optimizations are more effective on extended basic blocks.

We can identify loops by using dominators:

  • a node A in the flowgraph dominates a node B if every path from entry node to B includes A.
  • This relation is antisymmetric, reflexive, and transitive.

back edge: An edge in the flow graph, whose head dominates its tail (example - edge from B6 to B4).

A loop consists of all nodes dominated by its entry node (head of the back edge) and having exactly one back edge in it.
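
The dominance relation and the back-edge test can be sketched with the classic iterative data-flow formulation, using one bitmask per node. This is a minimal illustration, assuming at most 32 nodes, node 0 as the entry, and every node reachable from the entry; the `demo_` names are hypothetical and the code is independent of the Intercept and blink implementations mentioned in this issue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_NODES 32

typedef struct {
    int n;                     /* number of nodes; node 0 is the entry */
    uint32_t preds[MAX_NODES]; /* bit a of preds[b] set: edge a -> b */
} demo_cfg_t;

void demo_add_edge(demo_cfg_t *g, int from, int to)
{
    assert(g && from >= 0 && from < g->n && to >= 0 && to < g->n);
    g->preds[to] |= UINT32_C(1) << from;
}

/* After the call, bit a of dom[b] is set iff a dominates b. */
void demo_dominators(const demo_cfg_t *g, uint32_t dom[MAX_NODES])
{
    assert(g && g->n > 0 && g->n <= MAX_NODES);
    uint32_t all =
        g->n == MAX_NODES ? UINT32_MAX : (UINT32_C(1) << g->n) - 1;
    dom[0] = 1; /* the entry is dominated only by itself */
    for (int b = 1; b < g->n; b++)
        dom[b] = all;

    bool changed = true;
    while (changed) {
        changed = false;
        for (int b = 1; b < g->n; b++) {
            /* Intersect the dominator sets of all predecessors. */
            uint32_t meet = all;
            for (int a = 0; a < g->n; a++)
                if (g->preds[b] & (UINT32_C(1) << a))
                    meet &= dom[a];
            uint32_t next = meet | (UINT32_C(1) << b);
            if (next != dom[b]) {
                dom[b] = next;
                changed = true;
            }
        }
    }
}

/* An edge tail -> head is a back edge if head dominates tail. */
bool demo_is_back_edge(const uint32_t dom[MAX_NODES], int tail, int head)
{
    return (dom[tail] >> head) & 1u;
}
```

For a graph with edges 0→1, 1→2, 2→1, and 1→3, the edge 2→1 is reported as a back edge because node 1 dominates node 2, which marks node 1 as a loop header.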

Intercept contains an effective dominator implementation. See

Usage:

void codegen_optimise(CodegenContext *ctx) {
  opt_inline_global_vars(ctx);
  opt_analyse_functions(ctx);

  /// Optimise each function individually.
  do {
    foreach_ptr (IRFunction*, f, ctx->functions) {
      if (f->is_extern) continue;

      DominatorInfo dom = {0};
      do {
        build_dominator_tree(f, &dom, true);
        opt_reorder_blocks(f, &dom);
      } while (
          opt_const_folding_and_strengh_reduction(f) ||
          opt_dce(f) ||
          opt_mem2reg(f) ||
          opt_jump_threading(f, &dom) ||
          opt_tail_call_elim(f)
      );
      free_dominator_info(&dom);
    }
  }

  /// Cross-function optimisations.
  while (opt_inline_global_vars(ctx) || opt_analyse_functions(ctx));
}

Similarly, blink comes with an approach to detect loops during code generation.

jit: Properly adjust THRESHOLD

On macOS/x86-64, I discovered the need to increase the THRESHOLD (in src/cache.c) from 32768 to 65536 in order to achieve the desired performance of SciMark2, aligning it with GNU/Linux (commit cb0a153). This observation emphasizes the importance of establishing robust guidelines for adjusting the threshold to ensure consistent performance.

Implement performance counters and timers

RISC-V ISAs provide a set of up to 32×64-bit performance counters and timers.

RV32I provides a number of 64-bit read-only user-level counters, which are mapped into the 12-bit CSR address space and accessed in 32-bit pieces using CSRRS instructions. In RV64I, the CSR instructions can manipulate 64-bit CSRs. In particular, the RDCYCLE, RDTIME, and RDINSTRET pseudo-instructions read the full 64 bits of the cycle, time, and instret counters. Hence, the RDCYCLEH, RDTIMEH, and RDINSTRETH instructions are RV32I-only.

  • RDCYCLE: The execution environment should provide a means to determine the current rate (cycles/second) at which the cycle counter is incrementing.
  • RDTIME: The execution environment should provide a means of determining the period of the real-time counter (seconds/tick). The environment should provide a means to determine the accuracy of the clock.
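
Because RV32 reads a 64-bit counter in two 32-bit pieces, the low half may wrap between the two reads. The usual remedy is to read the high half, the low half, and the high half again, retrying if the two high reads differ. The sketch below models that loop in portable C with a simulated counter; the accessor type, the `demo_` names, and the simulated counter that advances on every access are all assumptions introduced so the behavior can be exercised without RISC-V hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical accessor: returns one 32-bit half of a counter. */
typedef uint32_t (*demo_read32_fn)(void *ctx);

/* Read high, low, high again; retry when the high half changed. */
uint64_t demo_read_counter64(demo_read32_fn read_hi,
                             demo_read32_fn read_lo,
                             void *ctx)
{
    assert(read_hi && read_lo);
    uint32_t hi, lo, hi2;
    do {
        hi = read_hi(ctx);
        lo = read_lo(ctx);
        hi2 = read_hi(ctx);
    } while (hi != hi2); /* low half wrapped between the reads */
    return ((uint64_t) hi << 32) | lo;
}

/* Simulated 64-bit counter that advances by step on every access. */
typedef struct {
    uint64_t value;
    uint64_t step;
} demo_counter_t;

uint32_t demo_sim_hi(void *ctx)
{
    demo_counter_t *c = ctx;
    uint32_t v = (uint32_t) (c->value >> 32);
    c->value += c->step;
    return v;
}

uint32_t demo_sim_lo(void *ctx)
{
    demo_counter_t *c = ctx;
    uint32_t v = (uint32_t) c->value;
    c->value += c->step;
    return v;
}
```

Starting the simulated counter at 0xFFFFFFFF forces a wrap during the first attempt, so the loop retries once instead of returning a torn value.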

tests: Improve cycle clock counts

Reference code from src/cycleclock.h

// This should return the number of cycles since power-on.  Thread-safe.
inline BENCHMARK_ALWAYS_INLINE int64_t Now() {
  uint32_t cycles_lo, cycles_hi0, cycles_hi1;
  // This asm also includes the PowerPC overflow handling strategy, as above.
  // Implemented in assembly because Clang insisted on branching.
  asm volatile(
      "rdcycleh %0\n"
      "rdcycle %1\n"
      "rdcycleh %2\n"
      "sub %0, %0, %2\n"
      "seqz %0, %0\n"
      "sub %0, zero, %0\n"
      "and %1, %1, %0\n"
      : "=r"(cycles_hi0), "=r"(cycles_lo), "=r"(cycles_hi1));
  return (static_cast<uint64_t>(cycles_hi1) << 32) | cycles_lo;
}

CI: Check diverse combinations of build-time configurations

There are various combinations of build-time configurations, such as

  • (ENABLE_EXT_C = 0) AND (ENABLE_COMPUTED_GOTO = 0)
  • (ENABLE_EXT_C = 1) AND (ENABLE_COMPUTED_GOTO = 0)
  • (ENABLE_SDL = 0) AND (ENABLE_EXT_C) AND (ENABLE_COMPUTED_GOTO = 0)

Within the CI pipeline, we shall check these diverse configurations.

Improve memory read/write

After analyzing the data collected during the execution of the dhrystone benchmark, it becomes evident that the primary performance bottleneck lies in memory read and write operations. To address this issue and improve performance, we should consider modifying the implementation of these memory operations.

  21.15%  rv32emu  rv32emu           [.] memory_write
  17.21%  rv32emu  rv32emu           [.] do_sw
  12.82%  rv32emu  rv32emu           [.] do_lw
   6.89%  rv32emu  rv32emu           [.] memory_read_w
   6.68%  rv32emu  rv32emu           [.] on_mem_write_w

jit: format warnings during code generation

clang raises the following warnings:

src/jit_template.c:317:57: warning: flag ' ' results in undefined behavior with 'u' conversion specifier [-Wformat]
GEN("    rv->X[%u] = !udivisor ? udividend : udividend % udivisor;\n", ir->rd);
                                                       ~^~

For the modulo (%) operation, %% should be used in GEN rather than %.
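
The effect of the escape can be seen with plain snprintf, used here as a stand-in for the GEN macro, whose definition is not shown in this issue. In a printf-style format string, `%%` produces a single literal `%`, while `%u` consumes the next argument. The function name `demo_gen_rem` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Emit one line of generated C source that stores a remainder in rv->X[rd].
 * "%%" yields a literal '%'; "%u" is replaced by rd.
 */
int demo_gen_rem(char *buf, size_t len, unsigned rd)
{
    assert(buf);
    return snprintf(
        buf, len,
        "rv->X[%u] = !udivisor ? udividend : udividend %% udivisor;\n", rd);
}
```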

Rework IR as a graph of extended basic blocks

The intermediate representation (IR) function used in wip/jit branch is a graph of basic blocks. Extended basic block (EBB; or superblock) is a collection of BBs with one label at the beginning and internal labels, each of which is the target of only one internal jump and no external jumps. EBB may contain internal branch instructions and is closer to how machine code works.

LLVM uses phi instructions in its SSA representation. Cranelift passes arguments to EBBs instead. The two representations are equivalent, but the EBB arguments are better suited to handle EBBs that may contain multiple branches to the same destination block with different arguments. Passing arguments to an EBB looks a lot like passing arguments to a function call, and the register allocator treats them very similarly. Arguments are assigned to registers or stack locations.

The definition of Control flow graph (CFG):

  • A rooted directed graph G = (N,E), where N is given by the set of basic blocks + two special BBs: entry and exit.
  • An edge connects two basic blocks b1 and b2 if control can pass from b1 to b2.
  • There is an edge from the entry node to each initial basic block.
  • There is an edge from each final basic block (one with no successors) to the exit BB.

Extended basic block

  • a maximal sequence of instructions beginning with a leader that contains no join nodes other than its first node.
  • Has a single entry, but possibly multiple exit points.
  • Some optimizations are more effective on extended basic blocks.

We can identify loops by using dominators

  • a node A in the flowgraph dominates a node B if every path from entry node to B includes A.
  • This relation is antisymmetric, reflexive, and transitive.
  • back edge: An edge in the flow graph, whose head dominates its tail
  • A loop consists of all nodes dominated by its entry node (head of the back edge) and having exactly one back edge in it.

The goal of dominators and postdominators is to determine loops in the flowgraph.

Use case: ARMware, an ARMv4 / Compaq iPAQ emulator, has a built-in threaded code engine that caches an EBB (extended basic block) of ARM code to increase execution speed. Furthermore, ARMware has a built-in dynamic compiler that translates an EBB of ARM code into a block of x86 machine code, which increases runtime performance dramatically. The optimization techniques implemented in this dynamic compiler include:

  • Redundant condition code calculation elimination
  • Global grouping conditional execution instruction
  • Redundant jump elimination
  • Dead code elimination
  • Constant folding
  • Global Common Subexpression Elimination
  • Global redundant memory operation elimination
  • Algebraic canonicalization
  • Global SSA form based linear scan register allocation

Reference:

Implement input event specific system calls for SDL support

rv32emu comes with experimental SDL support, which can render frames via SDL2. However, rendering alone is unlikely to be useful to most visual applications, so there should be system calls associated with input events.

Expected output:

  1. Add syscall_poll_event and syscall_get_input, which listen to keyboard/mouse events abstracted by SDL2.
  2. Provide a neat and minimal test suite (RV32 ELF executables) for SDL support.

Incorrect performance counter

@ypaskell implemented a number of 64-bit read-only user-level counters, including CSR_CYCLE and CSR_CYCLEH (#33).

The procedure to validate:

make clean
rm -rf src/mini-gdbstub
git reset --hard 53a6f9463c4b344746cb1154cbfb995e44f30300
make
build/rv32emu build/perfcount.elf

Output:

cycle count: 1728
instret: 174
Sparkle state:
4DF96879 8C7C2C33
82236B4A 904F4DD7
D6A030E8 F03B09AA
C4C3BB34 F063DFF9
61F9CEFF 8EC21FFA
93DF370F 83ACF1E2

However, in recent commits, both CSR_CYCLE and CSR_CYCLEH fail to deliver the expected values.

$ build/rv32emu build/perfcount.elf 
cycle count: 0
instret: 0
Sparkle state:
4DF96879 8C7C2C33
82236B4A 904F4DD7
D6A030E8 F03B09AA
C4C3BB34 F063DFF9
61F9CEFF 8EC21FFA
93DF370F 83ACF1E2

Neither cycle count nor instret is incremented.

Migrate to RISC-V Compatibility Framework (RISCOF)

Recently, the RISC-V Architecture Test SIG refined the build system and introduced the RISC-V Compatibility Framework (RISCOF), which enables testing of a RISC-V target (hard or soft implementations) against a standard RISC-V golden reference model using a suite of RISC-V architectural assembly tests. This implies a dramatic incompatibility, and for the moment we stick to the old framework based on GNU make. Quote from riscv-arch-test:

The older 2.x version of the framework (based on Makefiles) can be found in a separate branch : old-framework-2.x. This branch is officially no longer supported and all changes must occur on the main branch.

In order to synchronize with latest RISC-V Architecture Tests, we shall migrate.

Introduce sound related system calls

For the purpose of engaging in video game emulation, sound is of utmost importance. Unfortunately, there is a deficiency in the emulator's support for sound-related system calls, which could be effectively addressed through the integration of SDL2.

Reference:

  • FPGRARS: Fast Pretty Good RISC-V Assembly Rendering System

Non-portable memory allocation

There is a non-portable issue in src/io.c:

/* set memory size to 2^32 bytes */
#define MEM_SIZE 0x100000000ULL

If we build with Emscripten (see also #75), a 32-bit target is set by default, and clang complains as follows:

src/io.c:27:35: warning: implicit conversion from 'unsigned long long' to 'size_t' (aka 'unsigned long') changes value from 4294967296 to 0 [-Wconstant-conversion]
   27 |     data_memory_base = mmap(NULL, MEM_SIZE, PROT_READ | PROT_WRITE,
      |                        ~~~~       ^~~~~~~~
src/io.c:21:18: note: expanded from macro 'MEM_SIZE'
   21 | #define MEM_SIZE 0x100000000ULL
      |                  ^~~~~~~~~~~~~~
src/io.c:43:27: warning: implicit conversion from 'unsigned long long' to 'size_t' (aka 'unsigned long') changes value from 4294967296 to 0 [-Wconstant-conversion]
   43 |     munmap(mem->mem_base, MEM_SIZE);
      |     ~~~~~~                ^~~~~~~~
src/io.c:21:18: note: expanded from macro 'MEM_SIZE'
   21 | #define MEM_SIZE 0x100000000ULL
      |                  ^~~~~~~~~~~~~~

rv32emu should be portable for both 32-bit and 64-bit environments. Therefore, we need to address the improper memory allocation issue that has been raised on 32-bit targets.
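
One way to surface the problem before the value is silently truncated is to validate the requested size against the host's size_t range, so the caller can fall back to a smaller mapping on 32-bit targets. The sketch below is only an illustration: `demo_checked_size` is a hypothetical helper, and the extra `host_size_max` parameter exists so the 32-bit case can be exercised on any host.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Store requested in *out only if it is representable as size_t.
 * host_size_max models SIZE_MAX of the target (an assumption for testing).
 * Returns false when the size does not fit, leaving *out untouched.
 */
bool demo_checked_size(unsigned long long requested,
                       unsigned long long host_size_max,
                       size_t *out)
{
    assert(out);
    if (requested > host_size_max || requested > SIZE_MAX)
        return false;
    *out = (size_t) requested;
    return true;
}
```

With a 32-bit limit of 0xFFFFFFFF, a request for 0x100000000 bytes is rejected instead of wrapping to 0, which is the conversion the compiler warning points at.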

SDL: Support window resizing

Some users may find the demos' windows to be too small, and unfortunately, the emulator does not currently respond to user requests for resizing; thus, window resizing or some kind of scaling should be supported.

Fast-path execution through memcpy/memset Injection

Special super-instructions, such as memcpy and memset provided by newlib, have the capability to substitute specific functions implemented in RISC-V machine code.

By utilizing these purpose-built super-instructions, which are specifically designed for efficient memory copying and setting operations, it is anticipated that the overall speed of emulation will be enhanced. This proposed solution entails replacing the original RISC-V machine code functions with the optimized super-instructions, leading to improved performance while maintaining full functionality.

See MEMZERO and MEMCOPY instructions proposal for reference.

Proposed changes:

  1. Implement super-instructions MEMZERO and MEMCOPY within the src/emulation.c file.
  2. Verify whether the input ELF file includes the symbols for memcpy or memset.
  3. If these symbols are present, employ the existing macro operation fusion mechanism to substitute the function calls to memcpy and/or memset with the aforementioned super-instructions.

Implement instruction usage histogram

With the ability to record and print histograms, we can observe how frequently each instruction is used. Sample output:

instruction usage histogram
~~~~~~~~~~~~~~~~~~~~~~~~~~~
    1. lw         16.37% [843055543] #######################################
    2. xor        13.85% [713031972] ################################
    3. add        13.69% [704643870] ################################
    4. slli       13.20% [679477645] ###############################
    5. srliw      10.75% [553648296] #########################
    6. andi        9.94% [511705332] #######################
    7. srli        3.05% [157286473] #######
    8. lbu         2.93% [150995155] ######
    9. addi        2.69% [138412722] ######
   10. addiw       2.44% [125829177] #####
   11. sd          1.79% [92275184 ] ####
   12. sb          1.47% [75497501 ] ###
   13. jal         1.14% [58720476 ] ##
   14. beq         1.06% [54526059 ] ##
   15. ld          0.98% [50332182 ] ##
   16. and         0.98% [50331715 ] ##
   17. slliw       0.98% [50331682 ] ##
   18. bne         0.90% [46137534 ] ##
   19. or          0.65% [33554442 ] #
   20. auipc       0.41% [20971539 ] 
   21. lui         0.24% [12583002 ] 
   22. mulw        0.16% [8388608  ] 
   23. lwu         0.16% [8388608  ] 
   24. jalr        0.08% [4194443  ] 
   25. sraiw       0.08% [4194314  ] 
   26. sw          0.00% [213      ] 
   27. bltu        0.00% [78       ] 
   28. bge         0.00% [39       ] 
   29. blt         0.00% [33       ] 
   30. bgeu        0.00% [33       ] 
   31. sub         0.00% [29       ] 
...
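
A minimal sketch of how such a histogram could be produced is shown below. The insn_stat_t type and the function names are hypothetical, and scaling each bar relative to the most frequent instruction is an assumption inferred from the sample, not a documented rule.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical per-instruction counter. */
typedef struct {
    const char *name;
    uint64_t count;
} insn_stat_t;

/* qsort comparator: highest count first. */
static int by_count_desc(const void *a, const void *b)
{
    uint64_t x = ((const insn_stat_t *) a)->count;
    uint64_t y = ((const insn_stat_t *) b)->count;
    return (x < y) - (x > y);
}

/* Bar length, scaled so that the most frequent entry gets `width` marks. */
static unsigned bar_len(uint64_t count, uint64_t max, unsigned width)
{
    return max ? (unsigned) (count * width / max) : 0;
}

/* Sorts stats in place and prints every entry with a non-zero count. */
static void print_histogram(insn_stat_t *stats, size_t n, FILE *out)
{
    assert(stats && out);
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += stats[i].count;
    qsort(stats, n, sizeof(*stats), by_count_desc);

    for (size_t i = 0; i < n && stats[i].count; i++) {
        fprintf(out, "%5zu. %-10s %5.2f%% [%-9llu] ", i + 1, stats[i].name,
                100.0 * (double) stats[i].count / (double) total,
                (unsigned long long) stats[i].count);
        for (unsigned b = bar_len(stats[i].count, stats[0].count, 40); b; b--)
            fputc('#', out);
        fputc('\n', out);
    }
}
```

In an interpreter, the counters would be incremented once per executed instruction, for example indexed by the decoded opcode.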

Reference:

  • rv8 : Generate instructions Histogram via "rv-bin histogram"

Open quake and doom failed

After merging commit #62, I found that Quake and Doom can no longer be opened properly.
Both of them crash immediately on launch. Therefore, I
suspect that commit #62 introduced a bug.

Handle argc/argv/envp properly

On Linux, after argc, argv, and envp, there is auxv, which serves as a key-value store for binary data. However, rv32emu currently fails to set argc, argv, and envp properly. They should be set up when the default stack pointer is initialized.

void rv_reset(riscv_t *rv, riscv_word_t pc)
{
    ...
    rv->X[rv_reg_sp] = DEFAULT_STACK_ADDR;
    ...
}
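
As a rough illustration of the layout described above, the sketch below writes argc, the argv pointers, an empty envp and a terminating auxv entry into a flat guest memory buffer and returns the resulting stack pointer. The function setup_stack, the helpers put32 and get32, and the MAX_ARGS limit are hypothetical; the ordering (argc, argv[], NULL, envp[], NULL, auxv pairs ending in AT_NULL) follows the usual Linux process start-up convention, with 32-bit little-endian words for RV32.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_ARGS 32 /* arbitrary limit for the sketch */

static void put32(uint8_t *mem, uint32_t addr, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        mem[addr + i] = (uint8_t) (v >> (8 * i)); /* little-endian */
}

static uint32_t get32(const uint8_t *mem, uint32_t addr)
{
    return (uint32_t) mem[addr] | ((uint32_t) mem[addr + 1] << 8) |
           ((uint32_t) mem[addr + 2] << 16) | ((uint32_t) mem[addr + 3] << 24);
}

/* Returns the initial guest sp, or 0 if the arguments do not fit. */
static uint32_t setup_stack(uint8_t *mem, uint32_t stack_top, int argc,
                            const char *const argv[])
{
    assert(mem && argv);
    uint32_t ptrs[MAX_ARGS];
    uint32_t sp = stack_top;
    if (argc < 0 || argc > MAX_ARGS)
        return 0;

    /* 1. Copy the argument strings to the top of the stack. */
    for (int i = argc - 1; i >= 0; i--) {
        uint32_t len = (uint32_t) strlen(argv[i]) + 1;
        if (len > sp)
            return 0;
        sp -= len;
        memcpy(mem + sp, argv[i], len);
        ptrs[i] = sp;
    }

    /* 2. Reserve argc + argv[] + NULL + envp NULL + one AT_NULL pair. */
    uint32_t words = 1 + (uint32_t) argc + 1 + 1 + 2;
    if (sp < words * 4 + 16)
        return 0;
    sp = (sp - words * 4) & ~15u; /* keep sp 16-byte aligned */

    /* 3. Fill the table upwards from sp. */
    uint32_t p = sp;
    put32(mem, p, (uint32_t) argc), p += 4;
    for (int i = 0; i < argc; i++)
        put32(mem, p, ptrs[i]), p += 4;
    put32(mem, p, 0), p += 4; /* argv terminator */
    put32(mem, p, 0), p += 4; /* empty envp */
    put32(mem, p, 0), p += 4; /* auxv: AT_NULL key */
    put32(mem, p, 0);         /* auxv: AT_NULL value */
    return sp;
}
```

The returned value would take the place of the fixed DEFAULT_STACK_ADDR assignment; passing a real environment and auxiliary vector would extend step 2 and step 3 in the same way.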

Integrate `embench-iot` for benchmarking

The Embench benchmark suite is designed to test the performance of deeply embedded systems.

  • The measurement of execution performance is designed to use "hot" caches. Thus each benchmark executes its entire code several times, before starting a timing run.
  • Execution runs are scaled to take approximately 4 seconds of CPU time. This is long enough to be measured accurately, yet all benchmarks, including cache warm-up, can still be run in a few minutes.
  • To facilitate execution on machines of different performance, the tests are scaled by the clock speed of the processor.

Emulator freeze when printing LDBL_MAX

ieee754_MRE.c

#include <float.h>
#include <stdio.h>

int main(void)
{
    printf("%La\n", (long double) LDBL_MAX);
    return 0;
}

Build MRE and emulate:

$ riscv32-unknown-elf-gcc -Wall -O2 -std=c99 -march=rv32i -mabi=ilp32 -o ieee754 ieee754_MRE.c  -lm
$ rv32emu ieee754

Lower instruction decoding and dispatch overhead

The wip/instruction-decode branch breaks RISC-V instruction decoding and emulation into separate stages, which makes it feasible to incorporate further IR optimizations and JIT code generation. However, additional effort is needed to make it practical:

  1. Executing RISC-V instructions by compiling the program a basic block at a time, thus avoiding unnecessary translation;
  2. Implementing an efficient way to look in a hash map for a code block matching the current program counter as wip/jit does;
  3. Reducing IR dispatch cost by means of computed-goto or tail-call elimination (as wasm3 does).

All of the above should appear in wip/instruction-decode branch before its merge into master branch.
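
To make the third item concrete, here is a small sketch of computed-goto dispatch over a decoded IR. It relies on the "labels as values" extension available in GCC and Clang, so it is not strict ISO C. The ir_t layout and the three opcodes are invented for the example and do not reflect the IR used in the branch.

```c
#include <assert.h>
#include <stdint.h>

enum { OP_ADDI, OP_ADD, OP_HALT };

/* Hypothetical decoded instruction. */
typedef struct {
    uint8_t op, rd, rs1, rs2;
    int32_t imm;
} ir_t;

/* Executes decoded instructions until OP_HALT is reached. */
static void run_block(const ir_t *ir, uint32_t X[32])
{
    assert(ir && X);
    /* One label address per opcode, indexed by ir->op. */
    static void *const labels[] = {&&do_addi, &&do_add, &&do_halt};
#define DISPATCH() goto *labels[ir->op]

    DISPATCH();
do_addi:
    if (ir->rd) /* x0 is hard-wired to zero */
        X[ir->rd] = X[ir->rs1] + (uint32_t) ir->imm;
    ir++;
    DISPATCH();
do_add:
    if (ir->rd)
        X[ir->rd] = X[ir->rs1] + X[ir->rs2];
    ir++;
    DISPATCH();
do_halt:
    return;
#undef DISPATCH
}
```

Each handler ends with its own indirect jump, so the host branch predictor sees one jump site per opcode instead of a single shared switch.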

Security: Uncontrolled data used in path expression

Reported by CodeQL.

src/main.c:183

    /* open the ELF file from the file system */
    elf_t *elf = elf_new();
    if (!elf_open(elf, opt_prog_name)) {

This argument to a file access function is derived from user input and then passed to elf_open(path), which calls open(__path).

Accessing paths controlled by users can allow an attacker to access unexpected resources. This can result in sensitive information being revealed or deleted, or an attacker being able to influence behavior by modifying unexpected files.

Paths that are naively constructed from data controlled by a user may contain unexpected special characters, such as "..". Such a path may potentially point to any directory on the filesystem.

Recommendation
Validate user input before using it to construct a filepath. Ideally, follow these rules:

  • Do not allow more than a single "." character.
  • Do not allow directory separators such as "/" or "\" (depending on the filesystem).
  • Do not rely on simply replacing problematic sequences such as "../". For example, after applying this filter to ".../...//" the resulting string would still be "../".
  • Ideally use a whitelist of known good patterns.
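
A literal reading of these rules can be sketched as a small validator. The function is_plain_filename and its character whitelist (letters, digits, '_', '-', and at most one '.') are assumptions chosen for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Accepts names made of [A-Za-z0-9_-] with at most one '.', and nothing
 * else. Directory separators are rejected because they are not on the
 * whitelist.
 */
static bool is_plain_filename(const char *name)
{
    assert(name != NULL);
    size_t dots = 0;
    if (*name == '\0')
        return false;
    for (const char *p = name; *p; p++) {
        if (*p == '.') {
            if (++dots > 1)
                return false;
        } else if (!isalnum((unsigned char) *p) && *p != '_' && *p != '-') {
            return false;
        }
    }
    return true;
}
```

For a command-line tool whose purpose is to open whatever ELF path the user names, such a filter may be stricter than desired (it rejects build/hello.elf, for instance), so whether to apply it is a project decision.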

Deliver an integrated program to illustrate SDL-oriented system calls

To illustrate how to use the SDL-oriented system calls, we can deliver an integrated program that uses the SDL window system and input events. smolnes is ideal for its small footprint and powerful features: it is an NES emulator in fewer than 500 lines of code (deobfuscated.c). We can redistribute open source NES ROM files such as falling-nes.

Expected output:

  1. Replace SDL functions in smolnes with rv32emu specific SDL oriented system calls.
  2. Verify the functionality of NES emulation with falling-nes and some NES games.
  3. By default, the emulator looks for rom.nes and falls back to falling.nes if the former is not found.
  4. Add some preliminary documentation on the usage of SDL-oriented system calls in the source code of modified smolnes.

Later, when we proceed with #81, we might start on a binary translator that converts 6502 opcodes into RV32 or a specific IR, which could make NES emulation quite efficient.

Unexpected output when running rv32emu

I generated the ELF file from this RISC-V source code, but I get different output when executing it in rv32emu and in Ripes.

How I generated the ELF file:

$ riscv-none-elf-gcc -march=rv32i -mabi=ilp32 -o my_asm.elf my_asm.s

The result in rv32emu:

$ build/rv32emu my_asm.elf
5
0
0
inferior exit code 0

The result in Ripes:

5
3
0
inferior exit code 0

which is also the result I expected.

--------------- edit ------------------
There may be a problem with the lw instruction.
These are my test data:

.data
arr1:    
        .byte 7, 1, 5, 3, 6, 4
arr2:    
        .byte 1, 1, 3, 5
arr3:    
        .byte 7, 5, 4, 3, 2

Here printArr simply prints each byte. As an unrolling technique, I use the load-word instruction lw at address s1 to load 4 bytes at once.

.text
main:  
        la     s1, arr1        # load arr1 address
        addi   s2, x0, 6       # length of arr1
        jal    ra, printArr    # print each byte one by one

        la     s1, arr2        # load arr2 address
        addi   s2, x0, 4       # length of arr2
        jal    ra, printArr    # print each byte one by one

        la     s1, arr3        # load arr3 address
        addi   s2, x0, 5       # length of arr3
        jal    ra, printArr    # print each byte one by one
end:
        addi   a7, x0, 93      # "exit" syscall is 93 in rv32emu
        addi   a0, x0, 0       # set return value to 0
        ecall                  # stop the program

Here is the output:

# arr 1
(load a word)
7
1
5
3
(load next word)
6
4
(end of arr 1)
# arr 2
(load a word)
1
1
0 (unexpected output occurs starting from here)
0
(end of arr 2)
# arr3
(load a word)
0
0
0
0
(load next word)
0
(end of arr3)
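
For readers following the report, the sketch below shows how the bytes of a little-endian word map to consecutive addresses, which is what a byte-by-byte printArr built on lw relies on. It also checks word alignment. It assumes, purely for illustration, that the three arrays are laid out back to back starting at offset 0, in which case arr2 starts at offset 6, which is not a multiple of 4. Whether misalignment is related to the discrepancy is not established here. All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assemble a 32-bit word from four bytes the way lw does on RV32
 * (little-endian: lowest address holds the least significant byte).
 */
static uint32_t load_word_le(const uint8_t *mem, uint32_t addr)
{
    return (uint32_t) mem[addr] | ((uint32_t) mem[addr + 1] << 8) |
           ((uint32_t) mem[addr + 2] << 16) | ((uint32_t) mem[addr + 3] << 24);
}

/* Byte i of a loaded word, where i = 0 is the byte at the lowest address. */
static uint8_t byte_of_word(uint32_t word, unsigned i)
{
    assert(i < 4);
    return (uint8_t) (word >> (8 * i));
}

/* lw is naturally aligned when the address is a multiple of 4. */
static int is_word_aligned(uint32_t addr)
{
    return (addr & 3u) == 0;
}
```

With the test data above packed contiguously, the first word of arr1 is 0x03050107, whose bytes unpack to 7, 1, 5, 3 in address order.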

Security: Suspicious pointer scaling

Reported by CodeQL.

src/emulate.c:149

    case CSR_CYCLE: /* Cycle counter for RDCYCLE instruction */
        return (uint32_t *) (&rv->csr_cycle) + 0;
    case CSR_CYCLEH: /* Upper 32 bits of cycle */
        return (uint32_t *) (&rv->csr_cycle) + 1;

This pointer might have type (size 8), but this pointer arithmetic is done with type uint32_t * (size 4).

Pointer arithmetic in C and C++ is automatically scaled according to the size of the data type. For example, if the type of p is T* and sizeof(T) == 4 then the expression p+1 adds 4 bytes to p. This can cause a buffer overflow condition if the programmer forgets that they are adding a multiple of sizeof(T), rather than a number of bytes.

This query finds pointer arithmetic expressions where it appears likely that the programmer has forgotten that the offset is automatically scaled.

Recommendation
Whenever possible, use the array subscript operator rather than pointer arithmetic. For example, replace *(p+k) with p[k].
Cast to the correct type before using pointer arithmetic. For example, if the type of p is int* but it really points to an array of type double[] then use the syntax (double*)p + k to get a pointer to the k'th element of the array.
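
In the flagged code the scaled arithmetic appears to be intentional: "+ 1" on a uint32_t pointer selects the second 32-bit half of a 64-bit counter. That half is the upper one only on a little-endian host, though. A value-based alternative that avoids both the warning and the endianness dependence might look like the sketch below; the function names are hypothetical, and using it would require the CSR accessor to work with values instead of returning a pointer.

```c
#include <assert.h>
#include <stdint.h>

/* Read one 32-bit half of a 64-bit counter: idx 0 = low, idx 1 = high. */
static inline uint32_t counter_half(uint64_t counter, unsigned idx)
{
    assert(idx < 2);
    return (uint32_t) (counter >> (32 * idx));
}

/* Return the counter with one 32-bit half replaced by val. */
static inline uint64_t counter_with_half(uint64_t counter, unsigned idx,
                                         uint32_t val)
{
    assert(idx < 2);
    uint64_t keep = idx ? UINT64_C(0x00000000FFFFFFFF)
                        : UINT64_C(0xFFFFFFFF00000000);
    return (counter & keep) | ((uint64_t) val << (32 * idx));
}
```

Here counter_half(c, 0) corresponds to a CSR_CYCLE read and counter_half(c, 1) to CSR_CYCLEH.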

src/emulate.c:147

    /* Machine Counter/Timers */
    case CSR_CYCLE: /* Cycle counter for RDCYCLE instruction */
        return (uint32_t *) (&rv->csr_cycle) + 0;

This pointer might have type (size 8), but this pointer arithmetic is done with type uint32_t * (size 4).

References
Common Weakness Enumeration: CWE-468.

Generate RISC-V instruction decoder from ISA descriptor

There is some relevant documentation included with the current RISC-V instructions decoding implementation. The maintenance and verification, however, are not straightforward. Instead, we may describe how RISC-V instructions are encoded in human readable form; a code generator will then convert this information into C code.
See make_decoder.py from arviss and HiSimu for reference.

Expected output:

  1. Create src/instructions.in which contains the following:
# format of a line in this file:
# <instruction name> <args> <opcode>
#
# <opcode> is given by specifying one or more range/value pairs:
# hi..lo=value or bit=value or arg=value (e.g. 6..2=0x45 10=1 rd=0)
#
# <args> is one of rd, rs1, rs2, rs3, imm20, imm12, imm12lo, imm12hi,
# shamtw, shamt, rm
# rv32i
beq     bimm12hi rs1 rs2 bimm12lo 14..12=0 6..2=0x18 1..0=3
bne     bimm12hi rs1 rs2 bimm12lo 14..12=1 6..2=0x18 1..0=3
blt     bimm12hi rs1 rs2 bimm12lo 14..12=4 6..2=0x18 1..0=3
bge     bimm12hi rs1 rs2 bimm12lo 14..12=5 6..2=0x18 1..0=3
bltu    bimm12hi rs1 rs2 bimm12lo 14..12=6 6..2=0x18 1..0=3
bgeu    bimm12hi rs1 rs2 bimm12lo 14..12=7 6..2=0x18 1..0=3
  2. Prepare scripts/gen-decoder.py (other scripting languages are acceptable), which converts the above into the corresponding C implementation.
  3. Modify the build system and src/decode.c to be aware of the above changes.
  4. Create an entry in the docs directory which describes the high-level idea and how to describe more extensions.
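
The core of such a generator is turning the range/value pairs of a line into a mask and a match value, so that an instruction word w matches when (w & mask) == match. The sketch below shows that step in C to stay in the project's language (the proposal itself suggests a Python script). The function names parse_field and parse_line are hypothetical, and the arg=value form is not handled.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Parse one "hi..lo=value" or "bit=value" token and merge it into
 * mask/match. Returns 1 on success, 0 if the token is something else
 * (for example an argument name such as rs1).
 */
static int parse_field(const char *tok, uint32_t *mask, uint32_t *match)
{
    unsigned hi = 0, lo = 0;
    int val = 0, end = 0;

    if (sscanf(tok, "%u..%u=%i%n", &hi, &lo, &val, &end) == 3 && end > 0 &&
        tok[end] == '\0') {
        /* range form, nothing more to do */
    } else {
        end = 0;
        if (sscanf(tok, "%u=%i%n", &hi, &val, &end) != 2 || end == 0 ||
            tok[end] != '\0')
            return 0;
        lo = hi; /* single-bit form */
    }
    if (hi > 31 || lo > hi || val < 0)
        return 0;

    uint32_t width = hi - lo + 1;
    uint32_t field = width == 32 ? 0xffffffffu : ((1u << width) - 1);
    if ((uint32_t) val & ~field)
        return 0; /* value does not fit in the bit range */
    *mask |= field << lo;
    *match |= (uint32_t) val << lo;
    return 1;
}

/* Parse a whole line; returns 1 and fills name/mask/match, or 0 for
 * comments, blank lines and over-long input.
 */
static int parse_line(const char *line, char name[16], uint32_t *mask,
                      uint32_t *match)
{
    assert(line && name && mask && match);
    char buf[128];
    if (strlen(line) >= sizeof(buf))
        return 0;
    strcpy(buf, line);

    char *tok = strtok(buf, " \t\n");
    if (!tok || tok[0] == '#' || strlen(tok) >= 16)
        return 0;
    strcpy(name, tok);
    *mask = *match = 0;
    while ((tok = strtok(NULL, " \t\n")))
        parse_field(tok, mask, match); /* argument names are skipped */
    return 1;
}
```

For the beq line above this yields mask 0x707f and match 0x63; a generator would then emit one comparison or table entry per instruction.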

Run emulator as gdbserver

rISA implements an interesting GDB mode, which runs the simulator as a gdbserver. That is, it allows us to connect a RISC-V program to a remote GDB via target remote or target extended-remote, without linking in the usual debugging stub. Hopefully, rv32emu can follow this experimental feature in rISA and provide a built-in gdbserver.

See also:

  • riscv-gdbserver: GDB Server for interacting with RISC-V models, boards and FPGAs
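
A gdbserver talks to GDB using the GDB Remote Serial Protocol, in which every packet has the form $payload#xx, where xx is the sum of the payload bytes modulo 256 written as two hex digits. The sketch below frames a packet in that way. The function names are hypothetical, and escaping of special characters and acknowledgement handling are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Checksum of a packet payload: byte sum modulo 256. */
static uint8_t rsp_checksum(const char *payload)
{
    uint8_t sum = 0;
    for (; *payload; payload++)
        sum = (uint8_t) (sum + (uint8_t) *payload);
    return sum;
}

/* Writes "$payload#xx" into out. Returns the packet length, or -1 if
 * the buffer is too small.
 */
static int rsp_frame(const char *payload, char *out, size_t out_size)
{
    assert(payload && out);
    int n = snprintf(out, out_size, "$%s#%02x", payload,
                     (unsigned) rsp_checksum(payload));
    return (n < 0 || (size_t) n >= out_size) ? -1 : n;
}
```

For example, the standard "OK" reply is framed as $OK#9a.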

Use portable JIT compilation for accelerating RISC-V emulation

rv8 demonstrates how RISC-V instruction emulation can benefit from JIT compilation and aggressive optimizations. However, it is dedicated to x86-64 and is hard to port to other host architectures, such as Apple M1 (Aarch64). SFUZZ is a high-performance fuzzer that uses RISC-V to x86 binary translation with modern fuzzing techniques. RVVM is another example, which implements a tracing JIT.

The goal of this task is to utilize an existing JIT framework as a new abstraction layer while accelerating RISC-V instruction execution. In particular, we would

  1. Avoid direct machine code generation. Instead, most operations are performed at the intermediate representation (IR) level.
  2. Perform common optimization techniques such as peephole optimization. ria-jit does excellent work on such optimizations. See src/gen/instr/patterns.c and the MEMZERO and MEMCOPY instructions proposal.
  3. Use a high-level but still efficient IR. MIR is an interesting implementation, which allows using a subset of C11 as IR. SFUZZ's note Code Generation is worth reading. ETISS (Extendable Translating Instruction Set Simulator) translates binary instructions into C code and appends the translated code to a block, which is compiled and executed at runtime. As its name says, it is extendable: it supports many levels of customization through plug-ins.
  4. Ensure a shorter startup time. This can be achieved by means of a lightweight JIT framework and AOT compilation.

The JIT compilation's high level operation can be summed up as follows:

  • Look in a hash map for a code block matching the current PC
  • if a block is found
    • execute this block
  • if a block is not found
    • allocate a new block
    • invoke the translator for this block
    • insert it into the hash map
    • execute this block

Every block will come to an end after a branch instruction has been translated since translation occurs at the basic block level. Then, there is room for further optimization passes performed on the generated code.
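
The lookup-or-translate loop above can be sketched with a small chained hash map keyed by the program counter. Everything here is a hypothetical illustration: block_t, block_map_t, the fixed bucket count and the placeholder translate_block stand in for whatever the real translator would provide.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MAP_CAP 1024u /* number of buckets, a power of two */

/* Hypothetical translated block. */
typedef struct block {
    uint32_t pc_start;  /* guest address of the first instruction */
    uint32_t n_insn;    /* filled in by the translator */
    struct block *next; /* collision chain */
} block_t;

typedef struct {
    block_t *buckets[MAP_CAP];
} block_map_t;

/* With the C extension, instructions are 2-byte aligned, so bit 0 of the
 * PC carries no information.
 */
static uint32_t hash_pc(uint32_t pc)
{
    return (pc >> 1) & (MAP_CAP - 1);
}

/* Placeholder: a real translator would decode up to the next branch. */
static void translate_block(block_t *blk)
{
    blk->n_insn = 1;
}

static block_t *block_find(const block_map_t *m, uint32_t pc)
{
    assert(m);
    for (block_t *b = m->buckets[hash_pc(pc)]; b; b = b->next)
        if (b->pc_start == pc)
            return b;
    return NULL;
}

/* Returns the cached block for pc, translating and inserting it on a
 * miss. Returns NULL only if allocation fails.
 */
static block_t *block_get(block_map_t *m, uint32_t pc)
{
    block_t *b = block_find(m, pc);
    if (b)
        return b;
    b = calloc(1, sizeof(*b));
    if (!b)
        return NULL;
    b->pc_start = pc;
    translate_block(b);
    uint32_t h = hash_pc(pc);
    b->next = m->buckets[h];
    m->buckets[h] = b;
    return b;
}
```

The emulation loop would call block_get with the current PC, execute the returned block, and repeat with the PC the block leaves behind.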

This technique gains speed for the reasons listed below:

  • No instruction fetch
  • No instruction decode
  • Immediate values are baked into translated instructions
    • Values of 0 can be optimized
  • register x0 can be optimized
    • No lookup required
    • Writes are discarded
  • Reduced emulation loop overhead
  • Blocks can be chained based on previous branch pattern for faster lookup

Fail to run SciMark2

SciMark2 was integrated and verified. See directory tests/scimark2 for its source files.
However, running SciMark2 has recently started to fail.

$ build/rv32emu build/scimark2.elf
**                                                              **
** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
** for details. (Results can be submitted to pozo@nist.gov)     **
**                                                              **
Segmentation fault

The procedure to execute:

$ make clean
$ git reset --hard 6db46fc222bb11c227a8c140691ece711f1a200e
$ rm -r src/mini-gdbstub
$ make

Reference output:

$ build/rv32emu build/scimark2.elf
**                                                              **
** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
** for details. (Results can be submitted to pozo@nist.gov)     **
**                                                              **
Using       2.00 seconds min time per kenel.
Composite Score:            4.14
FFT             Mflops:     0.55    (N=1024)
SOR             Mflops:     1.48    (100 x 100)
MonteCarlo:     Mflops:    16.15
Sparse matmult  Mflops:     1.29    (N=1000, nz=5000)
LU              Mflops:     1.26    (M=100, N=100)

Use of floating point library functions

rv32emu/src/emulate.c

Lines 18 to 27 in 2b11443

#if defined(__APPLE__)
static inline int isinff(float x)
{
    return __builtin_fabsf(x) == __builtin_inff();
}
static inline int isnanf(float x)
{
    return x != x;
}
#endif

We use some library functions in emulate.c.
However, calc_fclass() already implements similar functionality:

rv32emu/src/soft_float.h

Lines 28 to 59 in 2b11443

static inline uint32_t calc_fclass(uint32_t f)
{
    const uint32_t sign = f & FMASK_SIGN;
    const uint32_t expn = f & FMASK_EXPN;
    const uint32_t frac = f & FMASK_FRAC;
    /* TODO: optimize with a binary decision tree */
    uint32_t out = 0;
    /* 0x001 rs1 is -INF */
    out |= (f == 0xff800000) ? 0x001 : 0;
    /* 0x002 rs1 is negative normal */
    out |= (expn && (expn != FMASK_EXPN) && sign) ? 0x002 : 0;
    /* 0x004 rs1 is negative subnormal */
    out |= (!expn && frac && sign) ? 0x004 : 0;
    /* 0x008 rs1 is -0 */
    out |= (f == 0x80000000) ? 0x008 : 0;
    /* 0x010 rs1 is +0 */
    out |= (f == 0x00000000) ? 0x010 : 0;
    /* 0x020 rs1 is positive subnormal */
    out |= (!expn && frac && !sign) ? 0x020 : 0;
    /* 0x040 rs1 is positive normal */
    out |= (expn && (expn != FMASK_EXPN) && !sign) ? 0x040 : 0;
    /* 0x080 rs1 is +INF */
    out |= (expn == FMASK_EXPN && !frac && !sign) ? 0x080 : 0;
    /* 0x100 rs1 is a signaling NaN */
    out |= (expn == FMASK_EXPN && frac && !(frac & FMASK_QNAN)) ? 0x100 : 0;
    /* 0x200 rs1 is a quiet NaN */
    out |= (expn == FMASK_EXPN && (frac & FMASK_QNAN)) ? 0x200 : 0;
    return out;
}

Which one should we stick to?
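
If the bit-pattern approach were chosen, the NaN and infinity tests could be written in the same style as calc_fclass(), as in the sketch below. The mask values are the standard IEEE 754 binary32 field masks, and the function names are hypothetical, so this only illustrates one of the two options being asked about.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* IEEE 754 binary32 field masks. */
#define F32_SIGN 0x80000000u
#define F32_EXPN 0x7F800000u
#define F32_FRAC 0x007FFFFFu

/* Reinterpret a float as its raw 32-bit pattern without aliasing issues. */
static inline uint32_t float_bits(float x)
{
    uint32_t u;
    assert(sizeof(float) == sizeof(uint32_t));
    memcpy(&u, &x, sizeof(u));
    return u;
}

/* NaN: exponent all ones and a non-zero fraction (quiet or signaling). */
static inline bool bits_is_nan(uint32_t f)
{
    return (f & F32_EXPN) == F32_EXPN && (f & F32_FRAC) != 0;
}

/* Infinity: exponent all ones and a zero fraction, either sign. */
static inline bool bits_is_inf(uint32_t f)
{
    return (f & ~F32_SIGN) == F32_EXPN;
}
```

Working on the raw register value avoids depending on which of isinff/isnanf the host C library happens to provide.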

Should x < height be changed to x < width?

In the following code in syscall_sdl.c, should x < height be changed to x < width?

     for (size_t y = 0; y < height; ++y) {
        for (size_t x = 0; x < height; ++x) {
            const uint8_t c = p[x];
            const uint8_t *lut = j + (c * 3);
            d[x] = (lut[0] << 16) | (lut[1] << 8) | lut[2];
        }
        p += width, d += width;
    }

Change it to:

     for (size_t y = 0; y < height; ++y) {
        for (size_t x = 0; x < width; ++x) {
            const uint8_t c = p[x];
            const uint8_t *lut = j + (c * 3);
            d[x] = (lut[0] << 16) | (lut[1] << 8) | lut[2];
        }
        p += width, d += width;
    }
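
To show why the loop bound matters, the conversion can be isolated into a standalone function, sketched below with a hypothetical name and signature. With a non-square image, bounding the inner loop by height would either leave pixels unconverted (width > height) or read and write past the end of each row (height > width).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Convert an 8-bit paletted image to 0x00RRGGBB pixels.
 * palette holds 3 bytes (R, G, B) per entry.
 */
static void palette_to_rgb(const uint8_t *src, const uint8_t *palette,
                           uint32_t *dst, size_t width, size_t height)
{
    assert(src && palette && dst);
    for (size_t y = 0; y < height; ++y) {
        for (size_t x = 0; x < width; ++x) { /* one row is width pixels */
            const uint8_t *lut = palette + (size_t) src[x] * 3;
            dst[x] = ((uint32_t) lut[0] << 16) | ((uint32_t) lut[1] << 8) |
                     (uint32_t) lut[2];
        }
        src += width, dst += width;
    }
}
```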

Fail to build when ENABLE_COMPUTED_GOTO is turned off

By default, RV32C support is enabled. When the configuration ENABLE_COMPUTED_GOTO is set to 0, it fails to build:

$ make ENABLE_COMPUTED_GOTO=0
  CC	map.o
  CC	riscv.o
...
riscv.c:1417:19: error: lvalue required as left operand of assignment
 1417 |             index = (inst & INST_6_2) >> 2;
      |                   ^
riscv.c:1420:39: error: array subscript is not an integer
 1420 |             TABLE_TYPE op = jump_table[index];
      |                                       ^
...

Investigate interpreter dispatch methods

It would still make sense to consolidate the existing interpreter as the foundation of tiered compilation before we actually develop a JIT compiler (#81). See A look at the internals of 'Tiered JIT Compilation' in .NET Core for context. Although #95 uses tail-call optimization (TCO) to reduce interpreter dispatch cost, we still need to investigate several interpreter dispatch techniques before deciding how to move forward with further performance improvements and code maintenance.

The author of wasm3 provides an interesting project interp, which implements the following methods:

Preliminary experiments on Intel Xeon CPU E5-2650 v4 @ 2.20GHz with bench.

[ Calls Loop ]

time                 2.782 s    (1.765 s .. 3.482 s)
                     0.985 R²   (0.949 R² .. 1.000 R²)
mean                 2.743 s    (2.623 s .. 2.903 s)
std dev              167.7 ms   (43.46 ms .. 225.4 ms)
variance introduced by outliers: 19% (moderately inflated)

[ Switching ]

time                 2.430 s    (2.135 s .. 2.684 s)
                     0.998 R²   (0.994 R² .. 1.000 R²)
mean                 2.550 s    (2.461 s .. 2.682 s)
std dev              135.7 ms   (23.52 ms .. 176.6 ms)
variance introduced by outliers: 19% (moderately inflated)

[ Direct Threaded Code ]

time                 2.058 s    (1.242 s .. 2.725 s)
                     0.974 R²   (0.964 R² .. 1.000 R²)
mean                 1.756 s    (1.571 s .. 1.920 s)
std dev              191.4 ms   (108.1 ms .. 268.4 ms)
variance introduced by outliers: 23% (moderately inflated)

[ Token (Indirect) Threaded Code ]

time                 1.912 s    (1.376 s .. 3.088 s)
                     0.957 R²   (0.931 R² .. 1.000 R²)
mean                 1.564 s    (1.456 s .. 1.762 s)
std dev              193.0 ms   (12.64 ms .. 237.9 ms)
variance introduced by outliers: 23% (moderately inflated)

[ Tail Calls ]

time                 1.414 s    (1.027 s .. 1.736 s)
                     0.987 R²   (0.985 R² .. 1.000 R²)
mean                 1.131 s    (1.020 s .. 1.239 s)
std dev              139.4 ms   (2.226 ms .. 168.8 ms)
variance introduced by outliers: 23% (moderately inflated)

[ machine code Inlining ]

time                 344.6 ms   (57.24 ms .. 478.0 ms)
                     0.923 R²   (NaN R² .. 1.000 R²)
mean                 383.3 ms   (342.6 ms .. 412.6 ms)
std dev              42.76 ms   (23.80 ms .. 53.86 ms)
variance introduced by outliers: 23% (moderately inflated)

After #95 is merged, we are concerned about

  • efficient interpreting.
  • the flexibility to switch between JIT compilation and interpretation.
  • less impact on the current codebase.
