RFC: Untangle native libraries and the JVM: SVML, SLEEF, and libsimdsort

Vladimir Ivanov vladimir.x.ivanov at oracle.com
Fri Dec 6 23:18:17 UTC 2024


Recently, a trend has emerged of using native libraries to back
intrinsics in the HotSpot JVM. SVML stubs for the Vector API paved the
way, soon followed by the SLEEF and simdsort libraries.

After examining their support, I must confess that it doesn't look
pretty. It introduces significant accidental complexity on the JVM side.
HotSpot has to be taught about every entry point in each library in an
ad-hoc manner. It's inherently unsafe, error-prone to implement, and hard
to maintain: the JVM makes a lot of assumptions about an entry point
based solely on its symbolic name, and each library has its own naming
conventions. Overall, the current approach doesn't scale well.

Fortunately, the new FFI API (java.lang.foreign) was finalized in JDK 22.
It provides enough functionality to interact with native libraries from
Java in a performant manner.
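
For readers unfamiliar with java.lang.foreign, here is a minimal sketch
of a downcall into a native function. It is not taken from the PRs
below; it binds the C library's strlen purely as an illustration, and it
assumes JDK 22+ on a 64-bit platform where size_t maps to a Java long.
The class name NativeStrlen is made up for this example.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.MethodHandle;

// Illustrative only: NativeStrlen is a hypothetical class, not part of the JDK.
public class NativeStrlen {
    // The symbol is looked up and bound from Java; the JVM itself needs
    // no built-in knowledge of this entry point.
    private static final MethodHandle STRLEN;

    static {
        Linker linker = Linker.nativeLinker();
        MemorySegment symbol = linker.defaultLookup().find("strlen").orElseThrow();
        // size_t strlen(const char*), assuming a 64-bit size_t.
        STRLEN = linker.downcallHandle(symbol,
                FunctionDescriptor.of(ValueLayout.JAVA_LONG, ValueLayout.ADDRESS));
    }

    static long strlen(String s) {
        // The confined arena frees the native copy of the string on close.
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment cString = arena.allocateFrom(s); // NUL-terminated UTF-8
            return (long) STRLEN.invokeExact(cString);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(strlen("hotspot"));
    }
}
```

The signature is declared in Java through a FunctionDescriptor rather
than inferred from the symbol name, which is the property that matters
for the cleanup discussed here.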

I did an exercise to migrate all 3 libraries away from intrinsics and 
the results look promising:

   simdsort: https://github.com/openjdk/jdk/pull/22621

   SVML/SLEEF: https://github.com/openjdk/jdk/pull/22619

As of now, java.lang.foreign lacks vector calling convention support, so
the actual calls into SVML/SLEEF are still backed by intrinsics. But it
still enables a major cleanup on the JVM side.

Also, I wrote library headers and used jextract to produce an initial
sketch of the library API in Java, and it worked really well. Eventually,
it can be incorporated into the JDK build process to ensure consistency
between the native and Java parts of the library API.

Performance-wise, it is on par with the current (intrinsic-based)
implementation.

One open question relates to CPU dispatching.

Each library exposes multiple functions with different requirements
for CPU ISA extension support (e.g., no AVX vs AVX2 vs AVX512, NEON vs
SVE). Right now, dispatching is the JVM's responsibility, but once the
JVM gets out of the loop, the library itself should make the decision. I
experimented with 2 approaches: (1) perform CPU dispatching from Java
code when linking the library (as illustrated in the aforementioned
PRs); or (2) call into the native library to query it for the right
entry point [1] [2] [3]. In both cases, it depends on an additional API
to sense the JVM/hardware capabilities (exposed on jdk.internal.misc.VM
for now).
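
To make approach (1) concrete, here is a rough sketch of Java-side
dispatching. It is not the code from the PRs: the symbol names
(sort_avx512, sort_avx2), the feature strings, and the class
EntryPointDispatch are all hypothetical, and the set of CPU features is
passed in as a parameter rather than queried from the JVM.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Illustrative only: all names below are hypothetical.
public class EntryPointDispatch {
    // One native variant: the symbol to link and the CPU feature it requires.
    record Variant(String symbol, String requiredFeature) {}

    // Ordered from most to least capable.
    static final List<Variant> SORT_VARIANTS = List.of(
            new Variant("sort_avx512", "avx512"),
            new Variant("sort_avx2", "avx2"));

    // Returns the symbol of the first variant the CPU supports, or empty
    // if none applies and the caller should fall back to pure Java code.
    static Optional<String> selectSymbol(List<Variant> variants, Set<String> cpuFeatures) {
        return variants.stream()
                .filter(v -> cpuFeatures.contains(v.requiredFeature()))
                .map(Variant::symbol)
                .findFirst();
    }

    public static void main(String[] args) {
        System.out.println(selectSymbol(SORT_VARIANTS, Set.of("avx2")));
    }
}
```

The selected name would then be passed to a symbol lookup and linked as
a regular downcall. With approach (2), the equivalent of selectSymbol
lives in the native library instead.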

Let me know if you have any questions/suggestions/concerns. Thanks!

I plan to eventually start publishing PRs to upstream this work.

Best regards,
Vladimir Ivanov

[1] 
https://github.com/openjdk/jdk/commit/b6e6f2e20772e86fbf9088bcef01391461c17f11

[2] 
https://github.com/iwanowww/jdk/blob/09234832b6419e54c4fc182e77f6214b36afa4c5/src/java.base/share/classes/java/util/SIMDSortLibrary.java

[3] 
https://github.com/iwanowww/jdk/blob/09234832b6419e54c4fc182e77f6214b36afa4c5/src/java.base/linux/native/libsimdsort/simdsort.c



More information about the hotspot-compiler-dev mailing list