RFR: 8353786: Migrate Vector API math library support to FFM API [v10]
Jorn Vernee
jvernee at openjdk.org
Fri Apr 18 17:09:51 UTC 2025
On Thu, 17 Apr 2025 18:03:47 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:
>> Migrate Vector API math library (SVML and SLEEF) linkage from native code (in JVM) to Java FFM API.
>>
>> Since FFM API doesn't support vector calling conventions yet, migration affects only symbol lookup for now. But it still enables significant simplifications on JVM side.
>>
>> The patch consists of the following parts:
>> * on-demand symbol lookup in Java code replaces eager lookup from native code during JVM startup;
>> * 2 new VM intrinsics for vector calls (support unary and binary shapes) (code separated from unary/binary vector operations);
>> * new internal interface to query supported CPU ISA extensions (`jdk.incubator.vector.CPUFeatures`) used for CPU dispatching.
>>
>> `java.lang.foreign` API is used to perform symbol lookup in vector math library, then the address is cached and fed into corresponding JVM intrinsic, so C2 can turn it into a direct vector call in generated code.
>>
>> Once `java.lang.foreign` supports vectors & vector calling conventions, VM intrinsics can go away.
>>
>> Performance is on par with original implementation (tested with microbenchmarks on linux-x64 and macosx-aarch64).
>>
>> Testing: hs-tier1 - hs-tier6, microbenchmarks (on linux-x64 and macosx-aarch64)
>>
>> Thanks!
>
> Vladimir Ivanov has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 24 additional commits since the last revision:
>
> - Merge remote-tracking branch 'origin/master' into vector.math.01.java
> - RVV and SVE adjustments
> - fix broken merge
> - Merge branch 'master' into vector.math.01.java
> - Fix debugName handling
> - Merge branch 'master' into vector.math.01.java
> - RVV and SVE adjustments
> - Merge branch 'master' into vector.math.01.java
> - Fix windows-aarch64 build failure
> - features_string -> cpu_info_string
> - ... and 14 more: https://git.openjdk.org/jdk/compare/7b477dbf...88eacc48
Very interesting! Looks mostly good to me. Left a few inline notes.
src/hotspot/share/prims/vectorSupport.cpp line 622:
> 620:
> 621: ThreadToNativeFromVM ttn(thread);
> 622: return env->NewStringUTF(features_string);
Isn't there a way to do this without the extra transition?
How about:
oop result = java_lang_String::create_oop_from_str((char*) bytes, CHECK_NULL);
return (jstring) JNIHandles::make_local(THREAD, result);
src/java.base/share/classes/jdk/internal/vm/vector/VectorSupport.java line 379:
> 377: V libraryBinaryOp(long addr, Class<? extends V> vClass, Class<E> eClass, int length, String debugName,
> 378: V v1, V v2,
> 379: BinaryOperation<V,?> defaultImpl) {
I notice that the bound of `V` differs between `libraryUnaryOp`, which uses `Vectory<E>` and this method, which uses `VectorPayload`. Not sure if this is intentional?
src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMathLibrary.java line 272:
> 270: MemorySegment addr = LOOKUP.findOrThrow(symbol);
> 271: debug("%s %s => 0x%016x\n", op, symbol, addr.address());
> 272: T impl = implSupplier.apply(opc); // TODO: should call the very same native implementation eventually (once FFM API supports vectors)
FWIW, one current barrier I see to implementing the vector calling convention in the linker, is that the FFM linker (currently) transmits register values to the downcall stub use Java primitive types. So, in order to support vector calling conventions, we would need to add some kind of 'primitive' that can hold the entire vector value, and preferably gets passed in the right register.
However, I think in the case of these math libraries in particular, speed of the fallback implementation is not that much of an issue, since there is also an intrinsic. So alternatively, we could split a vector value up into smaller integral types (`int`, `long`) -> pass them to the downcall stub in that form -> and then reconstruct the full vector value in its target register. (I used the same trick when I was experimenting with FP80 support, which also requires splitting up the 80 bit value up into 2 `long`s).
src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMathLibrary.java line 305:
> 303: @ForceInline
> 304: /*package-private*/ static
> 305: <E, V extends Vector<E>>
Here you're using `Vector` instead of `VectorPayload` for the binary op, so there seems to be a discrepancy with `VectorSupport`.
-------------
Marked as reviewed by jvernee (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/24462#pullrequestreview-2778903954
PR Review Comment: https://git.openjdk.org/jdk/pull/24462#discussion_r2050830705
PR Review Comment: https://git.openjdk.org/jdk/pull/24462#discussion_r2050818654
PR Review Comment: https://git.openjdk.org/jdk/pull/24462#discussion_r2050875854
PR Review Comment: https://git.openjdk.org/jdk/pull/24462#discussion_r2050869087
More information about the hotspot-dev
mailing list