nalim

Nalim is a library for linking Java methods to native functions using JVMCI (JVM compiler interface).

Unlike other Java frameworks for native library access, nalim does not use JNI and therefore does not incur JNI-related overhead.

When calling a native function with nalim:

  • a thread does not switch from in_Java to in_native state and back;
  • no memory barrier is involved;
  • no JNI handles are created;
  • exception checks and safepoint checks are omitted;
  • the native function can access primitive arrays directly in the heap.

As a result, native calls are faster than with JNI, especially when the target function is short. In this sense, nalim is similar to JNI Critical Natives, but relies on a standard supported interface. JNI Critical Natives were deprecated in JDK 16 and have been obsolete since JDK 18, so nalim can serve as a replacement.

Examples

1. Basic usage

import one.nalim.*;

public class Libc {

    @Link
    public static native int getuid();

    @Link
    public static native int getgid();

    static {
        Linker.linkClass(Libc.class);
    }
}

// Usage
System.out.println("My user id = " + Libc.getuid());

2. Linking by a different name

import one.nalim.*;

public class Mem {

    @Link(name = "malloc")
    public static native long allocate(long size);

    @Link(name = "free")
    public static native void release(long ptr);

    static {
        Linker.linkClass(Mem.class);
    }
}

3. Working with arrays

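import one.nalim.*;

@Library("crypto")
public class LibCrypto {

    public static byte[] sha256(byte[] data) {
        byte[] digest = new byte[32];  // SHA-256 produces a 32-byte digest
        SHA256(data, data.length, digest);
        return digest;
    }

    @Link
    private static native void SHA256(byte[] data, int len, byte[] digest);

    // Without this, SHA256 is never linked and calling it fails
    static {
        Linker.linkClass(LibCrypto.class);
    }
}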
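When testing a binding like this, it can help to compare the native result with a pure-Java reference. A minimal sketch, assuming the JDK's built-in SHA-256 provider is an acceptable reference; the class name `Sha256Reference` is hypothetical and not part of nalim. Like the native `SHA256`, it produces a 32-byte digest.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public final class Sha256Reference {

    // Pure-Java SHA-256 via the JDK provider, usable as a cross-check for LibCrypto.sha256
    static byte[] sha256(byte[] data) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(data);
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform implementation is required to support SHA-256
            throw new IllegalStateException(e);
        }
    }

    // Lowercase hex encoding, handy for comparing digests in test output
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }
}
```

In a test, `Arrays.equals(LibCrypto.sha256(data), Sha256Reference.sha256(data))` would be expected to hold for any input.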

4. Accessing object fields

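import one.nalim.*;

public class Time {
    public long sec;
    public long nsec;

    public static Time current() {
        Time time = new Time();
        clock_gettime(0, time);  // 0 = CLOCK_REALTIME on Linux and macOS
        return time;
    }

    // The native side receives the address of the sec field, used as a struct timespec*
    @Link
    private static native void clock_gettime(int clk_id, @FieldOffset("sec") Time time);

    static {
        Linker.linkClass(Time.class);
    }
}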
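Because `@FieldOffset("sec")` passes the address of `sec`, `clock_gettime` writes a `struct timespec` (seconds, then nanoseconds, both 64-bit on LP64 platforms) starting there. That only works if `nsec` sits directly after `sec` in the object. HotSpot normally lays out same-size fields in declaration order, but the Java language does not guarantee it. A minimal sketch of a startup check, assuming `sun.misc.Unsafe` is acceptable in your environment; `LayoutCheck`, `TimeLike` and `fieldsAreContiguous` are hypothetical names, not nalim API.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

public final class LayoutCheck {

    private static final Unsafe UNSAFE = loadUnsafe();

    private static Unsafe loadUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Same field layout as Time in the example, so the sketch runs without nalim
    static final class TimeLike {
        public long sec;
        public long nsec;
    }

    // True if the named instance fields follow each other in memory, exactly `stride` bytes apart
    static boolean fieldsAreContiguous(Class<?> cls, int stride, String... names) {
        long prev = -1;
        for (String name : names) {
            long offset;
            try {
                offset = UNSAFE.objectFieldOffset(cls.getDeclaredField(name));
            } catch (NoSuchFieldException e) {
                throw new IllegalArgumentException("no field " + name + " in " + cls.getName(), e);
            }
            if (prev >= 0 && offset != prev + stride) {
                return false;
            }
            prev = offset;
        }
        return true;
    }
}
```

In the real class, `fieldsAreContiguous(Time.class, 8, "sec", "nsec")` could be checked in the static initializer before `Linker.linkClass`.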

5. Inlining raw machine code

import one.nalim.*;

public class Cpu {

    // rdtsc
    // shl    $0x20,%rdx
    // or     %rdx,%rax
    // ret
    @Code("0f31 48c1e220 4809d0 c3")
    public static native long rdtsc();

    static {
        Linker.linkClass(Cpu.class);
    }
}
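The `@Code` string is the machine code of the method body written as hex bytes; the spaces only separate instructions for readability, matching the disassembly in the comments. A minimal sketch of decoding such a string, assuming whitespace is simply ignored (how nalim parses it internally is not shown here, and `CodeHex` is a hypothetical helper). It can be handy for checking a hand-written string before linking.

```java
public final class CodeHex {

    // Decodes e.g. "0f31 48c1e220 4809d0 c3" into raw bytes; whitespace is ignored
    static byte[] decode(String hex) {
        String s = hex.replaceAll("\\s+", "");
        if (s.length() % 2 != 0) {
            throw new IllegalArgumentException("odd number of hex digits: " + s.length());
        }
        byte[] out = new byte[s.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(s.charAt(2 * i), 16);
            int lo = Character.digit(s.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("not a hex digit near position " + 2 * i);
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }
}
```

For the `rdtsc` example this yields 10 bytes: `0f 31`, `48 c1 e2 20`, `48 09 d0`, `c3`.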

Running

1. As an agent

java -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI \
     -javaagent:nalim.jar -cp <classpath> MainClass

This is the simplest way to add nalim to your application, as the agent exports all required JDK internal packages for you.

The agent optionally accepts a list of classes whose native methods will be automatically linked at startup:

-javaagent:nalim.jar=com.example.MyLib,com.example.OtherLib
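Both things the agent does here, exporting the JDK-internal JVMCI packages and linking the listed classes, can be expressed with the standard `java.lang.instrument` API. The sketch below is hypothetical and is not nalim's actual agent source: `JvmciAgentSketch` is an illustrative name, the package list mirrors the `--add-exports` options of the classpath setup, and the linking step is left as a comment because it would call nalim's `Linker.linkClass`. An agent JAR would name such a class in the `Premain-Class` manifest attribute.

```java
import java.lang.instrument.Instrumentation;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

public final class JvmciAgentSketch {

    static final List<String> JVMCI_PACKAGES = List.of(
            "jdk.vm.ci.code", "jdk.vm.ci.code.site", "jdk.vm.ci.hotspot",
            "jdk.vm.ci.meta", "jdk.vm.ci.runtime");

    // Entry point named by Premain-Class in the agent JAR manifest
    public static void premain(String agentArgs, Instrumentation inst) throws ClassNotFoundException {
        exportJvmci(inst, JvmciAgentSketch.class.getModule());
        for (String name : parseClassNames(agentArgs)) {
            Class<?> cls = Class.forName(name, false, ClassLoader.getSystemClassLoader());
            // Here nalim would link the class's native methods, e.g. Linker.linkClass(cls)
        }
    }

    // Same effect as --add-exports jdk.internal.vm.ci/<pkg>=<target> for each package
    static void exportJvmci(Instrumentation inst, Module target) {
        Optional<Module> jvmci = ModuleLayer.boot().findModule("jdk.internal.vm.ci");
        if (jvmci.isEmpty()) {
            throw new IllegalStateException("jdk.internal.vm.ci not found (is -XX:+EnableJVMCI set?)");
        }
        Map<String, Set<Module>> exports = new HashMap<>();
        for (String pkg : JVMCI_PACKAGES) {
            exports.put(pkg, Set.of(target));
        }
        inst.redefineModule(jvmci.get(), Set.of(), exports, Map.of(), Set.of(), Map.of());
    }

    // "com.example.MyLib,com.example.OtherLib" -> [com.example.MyLib, com.example.OtherLib]
    static List<String> parseClassNames(String agentArgs) {
        List<String> names = new ArrayList<>();
        if (agentArgs == null) {
            return names;
        }
        for (String part : agentArgs.split(",")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }
}
```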

2. On the classpath

If you do not add nalim as an agent, you have to add all required --add-exports options manually.

java -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI                \
     --add-exports jdk.internal.vm.ci/jdk.vm.ci.code=ALL-UNNAMED      \
     --add-exports jdk.internal.vm.ci/jdk.vm.ci.code.site=ALL-UNNAMED \
     --add-exports jdk.internal.vm.ci/jdk.vm.ci.hotspot=ALL-UNNAMED   \
     --add-exports jdk.internal.vm.ci/jdk.vm.ci.meta=ALL-UNNAMED      \
     --add-exports jdk.internal.vm.ci/jdk.vm.ci.runtime=ALL-UNNAMED   \
     -cp nalim.jar:app.jar MainClass 
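A missing `--add-exports` option is easy to overlook. A minimal sketch of checking for it up front with the standard `Module.isExported(String, Module)` method; `JvmciExportsCheck` is a hypothetical helper, not part of nalim, and the package list is the one from the command line above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

public final class JvmciExportsCheck {

    static final List<String> PACKAGES = List.of(
            "jdk.vm.ci.code", "jdk.vm.ci.code.site", "jdk.vm.ci.hotspot",
            "jdk.vm.ci.meta", "jdk.vm.ci.runtime");

    // Returns the JVMCI packages that are NOT exported to `target`
    static List<String> missingExports(Module target) {
        Optional<Module> jvmci = ModuleLayer.boot().findModule("jdk.internal.vm.ci");
        if (jvmci.isEmpty()) {
            return PACKAGES;  // module not in the boot layer at all
        }
        List<String> missing = new ArrayList<>();
        for (String pkg : PACKAGES) {
            if (!jvmci.get().isExported(pkg, target)) {
                missing.add(pkg);
            }
        }
        return missing;
    }
}
```

Calling `missingExports` with your main class's module at startup and failing fast with the returned package names would give a clearer error than a later access failure.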

Performance

A JMH benchmark comparing regular JNI calls with nalim calls is available in the nalim repository; the first table also includes a Panama variant.

The following results were obtained on an Intel Core i7-1280P CPU with JDK 19.0.1.

Simple native method

static native int add(int a, int b);

Benchmark            Mode  Cnt  Score   Error  Units
JniBench.add_jni     avgt   10  6.718 ± 0.298  ns/op
JniBench.add_nalim   avgt   10  0.821 ± 0.032  ns/op
JniBench.add_panama  avgt   10  6.673 ± 0.307  ns/op

Array processing

static native long max(long[] array, int length);

Benchmark           (length)  Mode  Cnt    Score   Error  Units
JniBench.max_jni          10  avgt   10   24.642 ± 0.741  ns/op
JniBench.max_jni         100  avgt   10   54.626 ± 1.843  ns/op
JniBench.max_jni        1000  avgt   10  433.813 ± 0.864  ns/op
JniBench.max_nalim        10  avgt   10    3.540 ± 0.218  ns/op
JniBench.max_nalim       100  avgt   10   37.211 ± 0.308  ns/op
JniBench.max_nalim      1000  avgt   10  418.057 ± 0.529  ns/op
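A quick reading of the two tables, using only the Score column: for the trivial `add` the nalim call is roughly eight times faster, while in the array benchmark the absolute gap stays between about 16 and 21 ns as the total time grows with the array length. That gap can be read as a rough estimate of the fixed per-call JNI overhead, which explains why the relative benefit is largest for short functions.

```latex
\begin{aligned}
\text{add:}\quad & \frac{6.718}{0.821} \approx 8.2 \\
\text{max},\ n=10:\quad & 24.642 - 3.540 = 21.102\ \text{ns}, \qquad \frac{24.642}{3.540} \approx 6.96 \\
\text{max},\ n=100:\quad & 54.626 - 37.211 = 17.415\ \text{ns}, \qquad \frac{54.626}{37.211} \approx 1.47 \\
\text{max},\ n=1000:\quad & 433.813 - 418.057 = 15.756\ \text{ns}, \qquad \frac{433.813}{418.057} \approx 1.04
\end{aligned}
```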

Supported platforms

  • Linux: amd64 aarch64
  • macOS: amd64 aarch64
  • Windows: amd64

Limitations

A native function called with nalim has certain limitations compared to a regular JNI function.

  1. It must be static.
  2. It does not have access to JNIEnv and therefore cannot call JNI functions; in particular, it cannot throw exceptions.
  3. Only primitive types, primitive arrays and plain objects with primitive fields can be passed as arguments.
  4. A function must return as soon as possible, since it blocks the JVM from reaching a safepoint.
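Because a nalim-linked function cannot throw (limitation 2), errors have to be reported through return values, and a thin Java wrapper can turn them into exceptions. A minimal sketch, assuming a raw binding like `Mem.allocate` from example 2 (bound to `malloc`, which returns NULL on failure). So that the sketch runs without nalim, the native call is passed in as a `LongUnaryOperator`; `CheckedAlloc` is a hypothetical name.

```java
import java.util.function.LongUnaryOperator;

public final class CheckedAlloc {

    // Stand-in for a nalim-linked native such as Mem.allocate (malloc)
    private final LongUnaryOperator rawAllocate;

    CheckedAlloc(LongUnaryOperator rawAllocate) {
        this.rawAllocate = rawAllocate;
    }

    // The native side can only signal failure by returning NULL (0);
    // the exception is raised here, on the Java side
    long allocate(long size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive: " + size);
        }
        long ptr = rawAllocate.applyAsLong(size);
        if (ptr == 0) {
            throw new OutOfMemoryError("malloc(" + size + ") returned NULL");
        }
        return ptr;
    }
}
```

With nalim, the wrapper would be constructed as `new CheckedAlloc(Mem::allocate)`.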

Due to a JVMCI limitation in HotSpot, nalim works only with the Parallel, Serial and G1 garbage collectors, and also with ZGC since JDK 21.
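Since linking depends on the garbage collector in use, an application could check this at startup and fall back to another binding otherwise. A minimal sketch using `com.sun.management.HotSpotDiagnosticMXBean` to read HotSpot's GC selection flags; the helper name `GcCheck` is hypothetical, and the supported set simply mirrors the sentence above.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public final class GcCheck {

    // Reads a boolean VM flag; treats a flag unknown to this JVM build as false
    static boolean flag(HotSpotDiagnosticMXBean hs, String name) {
        try {
            return Boolean.parseBoolean(hs.getVMOption(name).getValue());
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    // Serial, Parallel and G1 are accepted; ZGC only on JDK 21 or later
    static boolean gcSupportedByNalim() {
        HotSpotDiagnosticMXBean hs = ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        if (flag(hs, "UseSerialGC") || flag(hs, "UseParallelGC") || flag(hs, "UseG1GC")) {
            return true;
        }
        return flag(hs, "UseZGC") && Runtime.version().feature() >= 21;
    }
}
```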

Contributors

apangin, luhenry, tomeryogev

Issues

Runtime error on Java 21

The same code works with Java 20:

Caused by: java.lang.ExceptionInInitializerError: Exception java.lang.IllegalArgumentException: nmethod entry barrier is missing [in thread "microbench.SimilarityBench.a_dotProductNalim-jmh-worker-5"]
	at jdk.internal.vm.ci/jdk.vm.ci.hotspot.CompilerToVM.installCode0(Native Method)
	at jdk.internal.vm.ci/jdk.vm.ci.hotspot.CompilerToVM.installCode(CompilerToVM.java:549)
	at jdk.internal.vm.ci/jdk.vm.ci.hotspot.HotSpotCodeCacheProvider.installCode(HotSpotCodeCacheProvider.java:139)
	at jdk.internal.vm.ci/jdk.vm.ci.code.CodeCacheProvider.setDefaultCode(CodeCacheProvider.java:67)
	at one.nalim.Linker.installCode(Linker.java:174)
	at one.nalim.Linker.linkMethod(Linker.java:121)
	at one.nalim.Linker.linkMethod(Linker.java:93)
	at one.nalim.Linker.linkClass(Linker.java:77)
	at io.github.jbellis.jvector.vector.Jvector.<clinit>(Jvector.java:14)
	... 12 more

Separate GPL and non-GPL part?

The entire agent and public API of nalim are licensed under GPLv2, which makes it impossible for most organizations to use it as a compile-time dependency.

It'd be nice if we could separate nalim into two components:

  • nalim-api - The public API licensed under more permissive license.
    • e.g. @Library, @Link, @Code, etc
  • nalim-agent - All other stuff that must be licensed under GPLv2 due to JVMCI usage.
    • e.g. Linker, Agent, etc.

A user who calls Linker.linkClass() will still have the licensing problem, but the user can avoid it by specifying the list of the classes in the agent options instead of calling Linker.linkClass(). (Please correct me if this still causes any licensing issues.)

We could also improve the nalim agent to link the classes automatically, most likely by 1) intercepting the class-loading process or 2) looking for a certain file in a JAR.

Code annotations require supporting code to use correctly

Currently @Code annotations do not implement any argument processing or any means of handling multiple architectures, unlike linked functions. This effectively forces all functions to go through a Java wrapper function at runtime to process arguments and select the appropriate implementation.

For example, even something as simple as querying CPUID output requires something like this:

@Code("4D01C8 4989D9 89D0 0FA2 418900 41895804 41894808 4189500C 4C89CB C3")
private static native void cpuid_amd64_win(int func, int[] out, long out_base_offset);
@Code("488D3C11 89F0 4889DE 0FA2 8907 895F04 894F08 89570C 4889F3 C3")
private static native void cpuid_amd64_linux(int func, int[] out, long out_base_offset);

private static final boolean is_windows = System.getProperty("os.name").toLowerCase().contains("windows");

public static void cpuid(int func, int[] out) {
  if (is_windows) {
    cpuid_amd64_win(func, out, UNSAFE.ARRAY_INT_BASE_OFFSET);
  } else {
    cpuid_amd64_linux(func, out, UNSAFE.ARRAY_INT_BASE_OFFSET);
  }
}

Instead, it seems reasonable to allow specifying multiple machine code sequences that the linker can choose from at runtime when parsing a class. This would make the feature much simpler to use.

@Code(
  amd64_win = "4989D9 89D0 0FA2 418900 41895804 41894808 4189500C 4C89CB C3",
  amd64_linux = "4889D7 89F0 4889DE 0FA2 8907 895F04 894F08 89570C 4889F3 C3"
)
public static native void cpuid(int func, int[] out);
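The proposal could be prototyped with a separate annotation and a selection step at link time. A minimal sketch, assuming a hypothetical `ArchCode` annotation (nalim's actual `@Code` takes a single string) and placeholder machine code (`c3` is `ret`, `90` is `nop`) instead of the cpuid sequences above. The `os.arch` value is normalized because macOS reports `x86_64` where Linux and Windows report `amd64`.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Locale;

public final class ArchCodeSelector {

    // Hypothetical per-platform variant of @Code, as proposed in the issue
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface ArchCode {
        String amd64_win() default "";
        String amd64_linux() default "";
    }

    // Placeholder bodies: "c3" is ret, "90 c3" is nop; ret
    @ArchCode(amd64_win = "c3", amd64_linux = "90 c3")
    static native void example();

    // Returns the hex string for the given platform, or null if none was declared
    static String select(ArchCode code, String osName, String osArch) {
        String os = osName.toLowerCase(Locale.ROOT);
        String arch = osArch.toLowerCase(Locale.ROOT);
        boolean amd64 = arch.equals("amd64") || arch.equals("x86_64");
        String hex = "";
        if (amd64 && os.contains("windows")) {
            hex = code.amd64_win();
        } else if (amd64 && os.contains("linux")) {
            hex = code.amd64_linux();
        }
        return hex.isEmpty() ? null : hex;
    }

    // What a linker could do while scanning a class: read the annotation, pick a variant
    static String selectForCurrentPlatform(Method m) {
        ArchCode code = m.getAnnotation(ArchCode.class);
        if (code == null) {
            return null;
        }
        return select(code, System.getProperty("os.name"), System.getProperty("os.arch"));
    }

    static ArchCode exampleCode() {
        try {
            return ArchCodeSelector.class.getDeclaredMethod("example").getAnnotation(ArchCode.class);
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }
}
```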

Arbitrary objects as parameters are possible, it seems

First of all: Fantastic idea and library!

You write under "Limitations":

  1. Only primitive types and primitive arrays can be passed as arguments.

I wonder where this limitation is coming from, since when passing an arbitrary object, you will simply be given the actual object virtual address in the respective register of the calling convention.
When combined with sun.misc.Unsafe to find/validate the field offsets, you can actually do crazy stuff like using SSE/AVX to accelerate operations on float/double fields in your class, when they are consecutive.

So, when being careful here and first validating the field offsets that you expect and using native code via the @Code() annotation, we can access and manipulate the object fields, can't we?

I've used your library to accelerate matrix multiplications from 10.5 ns to just 4.0 ns per operation for a class which uses 16 consecutive float fields: https://github.com/JOML-CI/joml-nalim

Comparison against Panama?

I know that their JNI replacement isn't out yet, but including it in the benchmark seems reasonable. It may also push them into providing a replacement for Critical Natives :)
