RFC: Untangle native libraries and the JVM: SVML, SLEEF, and libsimdsort

Maurizio Cimadamore maurizio.cimadamore at oracle.com
Mon Dec 9 11:42:20 UTC 2024


Great work Vlad!

The simdsort part seems like a more "classic" FFM binding, where you have a 
method handle per entry point. That seems to fit the design of FFM 
rather well. In the second case (SVML/SLEEF), usage of FFM is limited to 
building a "table of entry points" (i.e., we're just using SymbolLookup + 
MemorySegment here; the invocation part is intrinsified as part of the 
new VectorSupport methods).
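
To make the contrast concrete, here is a minimal sketch of both styles. It 
assumes JDK 22+ and a 64-bit platform where size_t maps to JAVA_LONG, and it 
uses libc's strlen as a stand-in for a library entry point (an assumption 
for illustration; neither PR binds libc). Style 1 creates one downcall 
method handle per entry point; style 2 only resolves symbol addresses into 
a table and leaves invocation to something else (for SVML/SLEEF, the 
intrinsified VectorSupport methods). FfmBindingStyles and entryPointTable 
are hypothetical names.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.SymbolLookup;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.MethodHandle;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class FfmBindingStyles {
    private static final Linker LINKER = Linker.nativeLinker();
    // libc symbols stand in for the native library's entry points (assumption).
    private static final SymbolLookup LIBC = LINKER.defaultLookup();

    // Style 1: one downcall method handle per entry point.
    // size_t strlen(const char*) -> long on a 64-bit platform (assumption).
    private static final MethodHandle STRLEN = LINKER.downcallHandle(
            LIBC.find("strlen").orElseThrow(),
            FunctionDescriptor.of(ValueLayout.JAVA_LONG, ValueLayout.ADDRESS));

    static long strlen(String s) {
        try (Arena arena = Arena.ofConfined()) {
            // Copies s into native memory as a NUL-terminated UTF-8 string.
            MemorySegment cString = arena.allocateFrom(s);
            return (long) STRLEN.invokeExact(cString);
        } catch (Throwable t) {
            throw new AssertionError("strlen downcall failed", t);
        }
    }

    // Style 2: only resolve addresses into a table; calling them is someone else's job.
    // Symbols the lookup cannot find are simply left out of the table.
    static Map<String, MemorySegment> entryPointTable(SymbolLookup lookup, List<String> names) {
        Map<String, MemorySegment> table = new LinkedHashMap<>();
        for (String name : names) {
            lookup.find(name).ifPresent(addr -> table.put(name, addr));
        }
        return table;
    }

    public static void main(String[] args) {
        System.out.println(strlen("SLEEF")); // prints 5
        Map<String, MemorySegment> table =
                entryPointTable(LIBC, List.of("strlen", "no_such_symbol_xyz"));
        System.out.println(table.keySet()); // prints [strlen]
    }
}
```

Keeping the handle in a static final field lets the JIT treat it as a 
constant, which ties into the warm-up discussion below. On JDK 22, running 
this from the class path prints a restricted-method warning unless 
--enable-native-access is given.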

If it helps, it might be possible to define a custom (JDK-internal) 
family of value layouts for vector types. We could then enhance the 
Linker classification to support such layouts, so you could call into 
native functions with vector parameters and return types more directly 
through the Linker API. I'm not sure it would give you the same 
performance, but it's also an approach worth exploring.

Re. support for custom calling conventions to call into HotSpot stubs 
from Java: this might be possible. Our story for supporting calling 
conventions other than the system calling convention is that there 
should be a dedicated linker instance per calling convention. So, if the 
JVM defines its own calling convention for its stubs, there should 
probably be a custom Linker implementation used to call into such stubs, 
one that reuses the machinery in the Linker implementation (e.g. 
Bindings) to classify the incoming function descriptors and determine 
the shuffle sequence for a given call. This should all be doable (at 
least inside the JDK); it's just a matter of "writing more code".

I agree with Paul that, as we move more stuff to use Panama, we will 
need to look more at the avenues available to us to claim back some of 
the additional warm-up cost introduced by the use of var/method handles. 
This is probably part of a bigger exploration of warm-up and FFM.

Cheers
Maurizio



On 06/12/2024 23:18, Vladimir Ivanov wrote:
> Recently, a trend emerged to use native libraries to back intrinsics 
> in the HotSpot JVM. SVML stubs for the Vector API paved the way, and 
> they were soon followed by the SLEEF and simdsort libraries.
>
> After examining their support, I must confess that it doesn't look 
> pretty. It introduces significant accidental complexity on the JVM 
> side. HotSpot has to be taught about every entry point in each library 
> in an ad-hoc manner. It's inherently unsafe, error-prone to implement 
> and hard to maintain: the JVM makes a lot of assumptions about an entry 
> point based solely on its symbolic name, and each library has its own 
> naming conventions. Overall, the current approach doesn't scale well.
>
> Fortunately, the new FFI API (java.lang.foreign) was finalized in 
> JDK 22. It provides enough functionality to interact with native 
> libraries from Java in a performant manner.
>
> I did an exercise to migrate all three libraries away from intrinsics, 
> and the results look promising:
>
>   simdsort: https://github.com/openjdk/jdk/pull/22621
>
>   SVML/SLEEF: https://github.com/openjdk/jdk/pull/22619
>
> As of now, java.lang.foreign lacks support for vector calling 
> conventions, so the actual calls into SVML/SLEEF are still backed by 
> intrinsics. Even so, the migration enables a major cleanup on the JVM 
> side.
>
> Also, I coded up library headers and used jextract to produce an 
> initial sketch of the library API in Java, and it worked really well. 
> Eventually, it can be incorporated into the JDK build process to ensure 
> consistency between the native and Java parts of the library API.
>
> Performance-wise, it is on par with the current (intrinsic-based) 
> implementation.
>
> One open question relates to CPU dispatching.
>
> Each library exposes multiple functions with different requirements 
> for CPU ISA extension support (e.g., no AVX vs. AVX2 vs. AVX512, NEON 
> vs. SVE). Right now, this is the JVM's responsibility, but once the JVM 
> is out of the loop, the library itself should make the decision. I 
> experimented with two approaches: (1) perform CPU dispatching while 
> linking the library from Java code (as illustrated in the 
> aforementioned PRs); or (2) call into the native library to query it 
> for the right entry point [1] [2] [3]. In both cases, it depends on an 
> additional API to sense the JVM/hardware capabilities (exposed on 
> jdk.internal.misc.VM for now).
>
> Let me know if you have any questions/suggestions/concerns. Thanks!
>
> I plan to eventually start publishing PRs to upstream this work.
>
> Best regards,
> Vladimir Ivanov
>
> [1] https://github.com/openjdk/jdk/commit/b6e6f2e20772e86fbf9088bcef01391461c17f11
>
> [2] https://github.com/iwanowww/jdk/blob/09234832b6419e54c4fc182e77f6214b36afa4c5/src/java.base/share/classes/java/util/SIMDSortLibrary.java
>
> [3] https://github.com/iwanowww/jdk/blob/09234832b6419e54c4fc182e77f6214b36afa4c5/src/java.base/linux/native/libsimdsort/simdsort.c
>
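
Approach (1) from the quoted message, CPU dispatching while linking the 
library from Java, essentially amounts to picking the most capable entry 
point that the CPU supports and the library exports. Below is a minimal 
sketch, assuming a hypothetical Isa enum in place of the capability query 
on jdk.internal.misc.VM and hypothetical symbol names (sort_avx512, 
sort_avx2, sort_generic); the real libraries use their own naming 
conventions. A lambda stands in for the library's SymbolLookup so the 
selection logic runs without the native library.

```java
import java.lang.foreign.MemorySegment;
import java.lang.foreign.SymbolLookup;
import java.util.EnumSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

final class EntryPointDispatch {
    // Hypothetical ISA extensions; stands in for the capability query on jdk.internal.misc.VM.
    enum Isa { AVX2, AVX512 }

    // A candidate entry point and the ISA extensions it requires (symbol names are hypothetical).
    record Candidate(String symbol, Set<Isa> requires) {}

    // Returns the address of the first candidate, in preference order, whose required
    // extensions are all supported by the CPU and whose symbol the library exports.
    static Optional<MemorySegment> select(SymbolLookup lib, Set<Isa> cpu, List<Candidate> preferred) {
        for (Candidate c : preferred) {
            if (cpu.containsAll(c.requires())) {
                Optional<MemorySegment> addr = lib.find(c.symbol());
                if (addr.isPresent()) {
                    return addr;
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<Candidate> sortCandidates = List.of(
                new Candidate("sort_avx512", EnumSet.of(Isa.AVX512)),
                new Candidate("sort_avx2", EnumSet.of(Isa.AVX2)),
                new Candidate("sort_generic", EnumSet.noneOf(Isa.class)));

        // Fake lookup standing in for SymbolLookup.libraryLookup(...) on the real library.
        SymbolLookup fakeLib = name -> switch (name) {
            case "sort_avx512" -> Optional.of(MemorySegment.ofAddress(0x3000));
            case "sort_avx2" -> Optional.of(MemorySegment.ofAddress(0x2000));
            case "sort_generic" -> Optional.of(MemorySegment.ofAddress(0x1000));
            default -> Optional.empty();
        };

        // A CPU with AVX2 but no AVX-512 gets the AVX2 entry point.
        MemorySegment chosen = select(fakeLib, EnumSet.of(Isa.AVX2), sortCandidates).orElseThrow();
        System.out.println(Long.toHexString(chosen.address())); // prints 2000
    }
}
```

The selected address would then be passed to 
Linker.nativeLinker().downcallHandle(address, descriptor) to obtain the 
method handle. Under approach (2), the same preference logic would live in 
the native library, and Java would only link the pointer it returns.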


More information about the hotspot-compiler-dev mailing list