RFC: Untangle native libraries and the JVM: SVML, SLEEF, and libsimdsort
Maurizio Cimadamore
maurizio.cimadamore at oracle.com
Mon Dec 9 11:42:20 UTC 2024
Great work, Vlad!
The simdsort part seems a more "classic" FFM binding - where you have a
method handle per entry point. That seems to fit the design of FFM
rather well. In the second case (SVML/SLEEF) usage of FFM is limited to
building a "table of entry points" (e.g. we're just using SymbolLookup +
MemorySegment here -- the invocation part is intrinsified as part of the
new VectorSupport methods).
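To make the contrast concrete, here is a minimal sketch of the two styles, assuming JDK 22 or later. It is not taken from either PR: the class name FfmStyles and both helper methods are hypothetical, and the C library's strlen/memcpy (found through the default lookup) merely stand in for real library entry points.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.SymbolLookup;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.MethodHandle;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class FfmStyles {
    private static final Linker LINKER = Linker.nativeLinker();
    // Assumption: the default lookup stands in for a lookup over a real library.
    private static final SymbolLookup LOOKUP = LINKER.defaultLookup();

    // Style 1 ("classic" binding): one downcall method handle per entry point.
    private static final MethodHandle STRLEN = LINKER.downcallHandle(
            LOOKUP.find("strlen").orElseThrow(),
            FunctionDescriptor.of(ValueLayout.JAVA_LONG, ValueLayout.ADDRESS));

    static long strlen(String s) {
        try (Arena arena = Arena.ofConfined()) {
            // Copies the string into native memory as a NUL-terminated C string.
            MemorySegment cString = arena.allocateFrom(s);
            return (long) STRLEN.invokeExact(cString);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    // Style 2 ("table of entry points"): only resolve addresses; no method
    // handles are created, and whoever consumes the table performs the calls.
    static Map<String, MemorySegment> entryPointTable(List<String> names) {
        Map<String, MemorySegment> table = new LinkedHashMap<>();
        for (String name : names) {
            // Symbols that cannot be resolved are simply left out of the table.
            LOOKUP.find(name).ifPresent(addr -> table.put(name, addr));
        }
        return table;
    }

    public static void main(String[] args) {
        System.out.println("strlen = " + strlen("hello"));
        entryPointTable(List.of("strlen", "memcpy", "no_such_symbol"))
                .forEach((name, addr) ->
                        System.out.println(name + " @ 0x" + Long.toHexString(addr.address())));
    }
}
```

In the second style the Java side never sees a function descriptor, which is why it sidesteps the missing vector calling convention support.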
If it helps, it might be possible to define a custom (JDK internal)
family of value layouts for vector types. Then we could enhance the
Linker classification to support such layouts. This means you could call
into native functions with vector parameters and return types using the
Linker API more directly. Not sure if it will give you the same
performance, but it's also an approach worth exploring.
Re. support for custom calling conventions to call into hotspot stubs
from Java, this might be possible - our story for supporting calling
conventions other than the system calling convention is that there
should be a dedicated linker instance per calling convention. So, if the
JVM defines its own calling convention for its stubs there should
probably be a custom Linker implementation that is used to call into
such stubs - which uses the machinery in the Linker implementation (e.g.
Bindings) to classify the incoming function descriptors and determine
the shuffle sequence for a given particular call. This should all be
doable (at least inside the JDK) - it's just a matter of "writing more code".
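As a rough illustration of what "classifying a function descriptor" means, here is a toy sketch. It is not the JDK's internal Bindings machinery: the class ToyClassifier, the register names and the 8-byte stack slots all belong to a made-up calling convention, and only the public FunctionDescriptor and ValueLayout APIs (JDK 22+) are used.

```java
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.MemoryLayout;
import java.lang.foreign.ValueLayout;
import java.util.ArrayList;
import java.util.List;

final class ToyClassifier {
    // Hypothetical register files of an imaginary stub calling convention.
    private static final String[] INT_REGS = {"r0", "r1", "r2", "r3"};
    private static final String[] FP_REGS = {"f0", "f1", "f2", "f3"};

    // Assigns every argument to a register or a stack slot, in declaration
    // order. The resulting list plays the role of a (very simplified)
    // shuffle sequence for one call shape.
    static List<String> classify(FunctionDescriptor descriptor) {
        List<String> moves = new ArrayList<>();
        int nextInt = 0;
        int nextFp = 0;
        int stackOffset = 0;
        for (MemoryLayout layout : descriptor.argumentLayouts()) {
            if (!(layout instanceof ValueLayout value)) {
                throw new IllegalArgumentException("only value layouts are handled: " + layout);
            }
            Class<?> carrier = value.carrier();
            boolean isFp = carrier == float.class || carrier == double.class;
            String location;
            if (isFp && nextFp < FP_REGS.length) {
                location = FP_REGS[nextFp++];
            } else if (!isFp && nextInt < INT_REGS.length) {
                location = INT_REGS[nextInt++];
            } else {
                // Registers of the needed kind are exhausted: spill to the stack.
                location = "stack+" + stackOffset;
                stackOffset += 8;
            }
            moves.add(carrier.getSimpleName() + " -> " + location);
        }
        return moves;
    }

    public static void main(String[] args) {
        FunctionDescriptor fd = FunctionDescriptor.ofVoid(
                ValueLayout.JAVA_INT, ValueLayout.JAVA_DOUBLE, ValueLayout.ADDRESS);
        classify(fd).forEach(System.out::println);
    }
}
```

A dedicated Linker for a JVM-defined convention would do the same kind of work, just with the real register assignment rules of that convention.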
I agree with Paul that, as we move more stuff to use Panama, we will
need to look more at the avenues available to us to claim back some of
the additional warmup cost introduced by the use of var/method handles.
This is probably part of a bigger exploration on warmup and FFM.
Cheers
Maurizio
On 06/12/2024 23:18, Vladimir Ivanov wrote:
> Recently, a trend emerged to use native libraries to back intrinsics
> in the HotSpot JVM. SVML stubs for the Vector API paved the road, soon
> followed by the SLEEF and simdsort libraries.
>
> After examining their support, I must confess that it doesn't look
> pretty. It introduces significant accidental complexity on the JVM side.
> HotSpot has to be taught about every entry point in each library in an
> ad-hoc manner. It's inherently unsafe, error-prone to implement and
> hard to maintain: JVM makes a lot of assumptions about an entry point
> based solely on its symbolic name and each library has its own naming
> conventions. Overall, the current approach doesn't scale well.
>
> Fortunately, the new FFI API (java.lang.foreign) was finalized in JDK 22.
> It provides enough functionality to interact with native libraries from
> Java in a performant manner.
>
> I did an exercise to migrate all 3 libraries away from intrinsics and
> the results look promising:
>
> simdsort: https://github.com/openjdk/jdk/pull/22621
>
> SVML/SLEEF: https://github.com/openjdk/jdk/pull/22619
>
> As of now, java.lang.foreign lacks vector calling convention support,
> so the actual calls into SVML/SLEEF are still backed by intrinsics.
> But it still enables a major cleanup on the JVM side.
>
> Also, I coded library headers and used jextract to produce an initial
> library API sketch in Java, and it worked really well. Eventually, it
> can be incorporated into the JDK build process to ensure consistency
> between the native and Java parts of the library API.
>
> Performance-wise, it is on par with the current (intrinsic-based)
> implementation.
>
> One open question relates to CPU dispatching.
>
> Each library exposes multiple functions with different requirements
> for CPU ISA extension support (e.g., no AVX vs AVX2 vs AVX512, NEON
> vs SVE). Right now, dispatching is the JVM's responsibility, but once
> the JVM is out of the loop, the library itself should make the decision.
> I experimented with 2 approaches: (1) perform CPU dispatching and
> library linking from Java code (as illustrated in the aforementioned
> PRs); or (2) call into the native library to query it for the right
> entry point [1] [2] [3]. In both cases, it depends on an additional API
> to sense the JVM/hardware capabilities (exposed on jdk.internal.misc.VM
> for now).
>
> Let me know if you have any questions/suggestions/concerns. Thanks!
>
> I plan to eventually start publishing PRs to upstream this work.
>
> Best regards,
> Vladimir Ivanov
>
> [1]
> https://github.com/openjdk/jdk/commit/b6e6f2e20772e86fbf9088bcef01391461c17f11
>
> [2]
> https://github.com/iwanowww/jdk/blob/09234832b6419e54c4fc182e77f6214b36afa4c5/src/java.base/share/classes/java/util/SIMDSortLibrary.java
>
> [3]
> https://github.com/iwanowww/jdk/blob/09234832b6419e54c4fc182e77f6214b36afa4c5/src/java.base/linux/native/libsimdsort/simdsort.c
>
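The first dispatching approach from the quoted message (choosing the entry point on the Java side) could look roughly like the sketch below. Everything specific in it is an assumption made for illustration: the class CpuDispatch, the feature names, the symbol suffixes and the "sort" base name are hypothetical and do not reflect the actual naming in the libraries or the PRs. The feature set is passed in as a plain Set because the capability-sensing API mentioned above is JDK-internal.

```java
import java.lang.foreign.MemorySegment;
import java.lang.foreign.SymbolLookup;
import java.util.List;
import java.util.Optional;
import java.util.Set;

final class CpuDispatch {
    // A variant pairs the CPU feature it needs with a symbol-name suffix.
    // An empty feature string marks the baseline variant.
    record Variant(String requiredFeature, String suffix) {}

    // Hypothetical variants, ordered from most to least capable.
    static final List<Variant> VARIANTS = List.of(
            new Variant("avx512", "_avx512"),
            new Variant("avx2", "_avx2"),
            new Variant("", ""));

    // Returns the name of the best variant that the CPU supports and that
    // the library actually exports, or empty if no variant qualifies.
    static Optional<String> selectSymbol(String base, Set<String> cpuFeatures, SymbolLookup library) {
        for (Variant v : VARIANTS) {
            boolean supported = v.requiredFeature().isEmpty()
                    || cpuFeatures.contains(v.requiredFeature());
            String name = base + v.suffix();
            if (supported && library.find(name).isPresent()) {
                return Optional.of(name);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // A fake library that exports only an AVX2 variant and a baseline one.
        Set<String> exported = Set.of("sort_avx2", "sort");
        SymbolLookup fakeLibrary = name -> exported.contains(name)
                ? Optional.of(MemorySegment.ofAddress(1L))
                : Optional.<MemorySegment>empty();

        System.out.println(selectSymbol("sort", Set.of("avx512", "avx2"), fakeLibrary));
        System.out.println(selectSymbol("sort", Set.of(), fakeLibrary));
    }
}
```

The second approach would move this loop into the native library and leave Java with a single query call.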
More information about the hotspot-compiler-dev
mailing list