RFC: Untangle native libraries and the JVM: SVML, SLEEF, and libsimdsort

Paul Sandoz paul.sandoz at oracle.com
Fri Dec 6 23:59:23 UTC 2024


Hi Vladimir,

Excellent work, very happy to see more of this moved to Java leveraging Panama features. The Java code looks very organized.

I am wondering if this technique can be applied to stubs dynamically generated by HotSpot via some sort of special library lookup e.g., for crypto.

Do you have a sense of the differences in static memory footprint and startup cost? Things I imagine Leyden could help with.

Regarding CPU dispatching, my preference would be to do it in Java. Less native logic. This may also be useful to help determine whether we can/should expose capabilities in the Vector API regarding what is optimally supported or not. I presume it also does not preclude some sort of jlink plugin that strips unused methods from the native libraries, something that may be trickier if done in the native library itself?

Paul.


> On Dec 6, 2024, at 3:18 PM, Vladimir Ivanov <vladimir.x.ivanov at oracle.com> wrote:
> 
> Recently, a trend emerged to use native libraries to back intrinsics in the HotSpot JVM. The SVML stubs for the Vector API paved the way, and they were soon followed by the SLEEF and libsimdsort libraries.
> 
> After examining their support, I must confess that it doesn't look pretty. It introduces significant accidental complexity on the JVM side. HotSpot has to be taught about every entry point in each library in an ad-hoc manner. It's inherently unsafe, error-prone to implement, and hard to maintain: the JVM makes a lot of assumptions about an entry point based solely on its symbolic name, and each library has its own naming conventions. Overall, the current approach doesn't scale well.
> 
> Fortunately, the new FFI API (java.lang.foreign) was finalized in JDK 22. It provides enough functionality to interact with native libraries from Java in a performant manner.
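> 
> To give a flavor of what that looks like, here is a minimal downcall sketch (illustrative only: the library and symbol names below are invented, not the actual libsimdsort entry points). Outside the JDK, libraryLookup is a restricted method and needs --enable-native-access.
> 
>   import java.lang.foreign.*;
>   import java.lang.invoke.MethodHandle;
> 
>   class SimdSortBinding {
>       // Locate the native library and bind one of its exported functions.
>       // "simdsort" and "simd_sort_double" are placeholder names for this sketch.
>       static final MethodHandle SORT_DOUBLE = Linker.nativeLinker().downcallHandle(
>           SymbolLookup.libraryLookup("simdsort", Arena.global())
>                       .find("simd_sort_double").orElseThrow(),
>           FunctionDescriptor.ofVoid(ValueLayout.ADDRESS,      // double* data
>                                     ValueLayout.JAVA_LONG));  // size_t length
> 
>       static void sort(MemorySegment data, long length) throws Throwable {
>           SORT_DOUBLE.invokeExact(data, length);
>       }
>   }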
> 
> I did an exercise to migrate all 3 libraries away from intrinsics and the results look promising:
> 
>  simdsort: https://github.com/openjdk/jdk/pull/22621
> 
>  SVML/SLEEF: https://github.com/openjdk/jdk/pull/22619
> 
> As of now, java.lang.foreign lacks vector calling convention support, so the actual calls into SVML/SLEEF are still backed by intrinsics. But it still enables a major cleanup on the JVM side.
> 
> Also, I coded the library headers and used jextract to produce an initial sketch of the library API in Java, and it worked really well. Eventually, this can be incorporated into the JDK build process to ensure consistency between the native and Java parts of the library API.
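> 
> As a rough illustration of the generated bindings: for a header declaring, say, void simd_sort_double(double*, size_t), jextract emits a header class with one static wrapper per function. The class and function names below are made up for this sketch; they are not the actual output used in the PRs.
> 
>   import java.lang.foreign.Arena;
>   import java.lang.foreign.MemorySegment;
>   import java.lang.foreign.ValueLayout;
> 
>   class JextractUsageSketch {
>       static void sort(double[] values) {
>           try (Arena arena = Arena.ofConfined()) {
>               // Copy the Java array into native memory, call the generated binding,
>               // and copy the sorted data back.
>               MemorySegment data = arena.allocateFrom(ValueLayout.JAVA_DOUBLE, values);
>               simdsort_h.simd_sort_double(data, values.length);  // hypothetical generated class
>               MemorySegment.copy(data, ValueLayout.JAVA_DOUBLE, 0, values, 0, values.length);
>           }
>       }
>   }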
> 
> Performance-wise, it is on par with the current (intrinsic-based) implementation.
> 
> One open question relates to CPU dispatching.
> 
> Each library exposes multiple functions with different requirements on CPU ISA extension support (e.g., no AVX vs. AVX2 vs. AVX512, NEON vs. SVE). Right now, it's the JVM's responsibility, but once the JVM gets out of the loop, the library itself should make the decision. I experimented with two approaches: (1) perform CPU dispatching in Java code when linking the library (as illustrated in the aforementioned PRs); or (2) call into the native library to query it for the right entry point [1] [2] [3]. In both cases, it depends on an additional API to sense the JVM/hardware capabilities (exposed on jdk.internal.misc.VM for now).
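> 
> For illustration, approach (1) boils down to something like the sketch below. The capability flags and symbol names are placeholders (the real capability sensing sits on jdk.internal.misc.VM in the prototype, and the real symbols are in the PRs).
> 
>   import java.lang.foreign.*;
>   import java.lang.invoke.MethodHandle;
> 
>   class DispatchSketch {
>       // Pick the most specific entry point the CPU supports, then link it once.
>       static MethodHandle linkSortDouble(SymbolLookup lib, boolean avx512, boolean avx2) {
>           String symbol = avx512 ? "avx512_sort_double"   // placeholder symbol names
>                         : avx2   ? "avx2_sort_double"
>                         :          "scalar_sort_double";
>           return Linker.nativeLinker().downcallHandle(
>               lib.find(symbol).orElseThrow(),
>               FunctionDescriptor.ofVoid(ValueLayout.ADDRESS, ValueLayout.JAVA_LONG));
>       }
>   }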
> 
> Let me know if you have any questions/suggestions/concerns. Thanks!
> 
> I plan to eventually start publishing PRs to upstream this work.
> 
> Best regards,
> Vladimir Ivanov
> 
> [1] https://github.com/openjdk/jdk/commit/b6e6f2e20772e86fbf9088bcef01391461c17f11
> 
> [2] https://github.com/iwanowww/jdk/blob/09234832b6419e54c4fc182e77f6214b36afa4c5/src/java.base/share/classes/java/util/SIMDSortLibrary.java
> 
> [3] https://github.com/iwanowww/jdk/blob/09234832b6419e54c4fc182e77f6214b36afa4c5/src/java.base/linux/native/libsimdsort/simdsort.c
> 


