RFR: 8367292: VectorAPI: Optimize VectorMask.fromLong/toLong() for SVE [v4]
Emanuel Peter
epeter at openjdk.org
Tue Oct 28 09:57:17 UTC 2025
On Tue, 28 Oct 2025 05:52:38 GMT, Xiaohong Gong <xgong at openjdk.org> wrote:
>> The current implementations of `VectorMask.fromLong()` and `toLong()` on AArch64 SVE are inefficient. SVE does not support native predicate instructions for these operations. Instead, they are implemented with vector instructions, but the output/input of `fromLong/toLong` are defined as masks with predicate registers on SVE architectures.
>>
>> For `toLong()`, the current implementation generates a vector mask stored in a vector register with bool type first, then converts the vector to predicate layout. For `fromLong()`, the opposite conversion is needed at the start of codegen.
>>
>> These conversions are expensive and are implemented in the IR backend codegen, which is inefficient. The performance impact is significant on SVE architectures.
>>
>> This patch optimizes the implementation by leveraging two existing C2 IRs (`VectorLoadMask/VectorStoreMask`) that can handle the conversion efficiently. By splitting this work at the mid-end IR level, we align with the current IR pattern used on architectures without predicate features (like AArch64 Neon) and enable sharing of existing common IR optimizations.
>>
>> It also modifies the Vector API jtreg tests to improve test coverage. Here are the details:
>>
>> 1) Fix the smoke tests of `fromLong/toLong` to make sure these APIs are actually tested. They were not well tested before, because in the original test the C2 IRs for `fromLong` and `toLong` were optimized out completely by the compiler due to the following IR identity:
>>
>> VectorMaskToLong (VectorLongToMask l) => l
>>
>> Besides, an additional warmup loop is necessary to guarantee the APIs are compiled by C2.
>>
>> 2) Refine the existing IR tests to verify the expected IR patterns after this patch. The tests now also require the exact CPU feature on AArch64 for these ops: `fromLong` requires "svebitperm" instead of "sve2".
>>
>> Performance shows significant improvement on NVIDIA's Grace CPU.
>>
>> Here is the performance data with `-XX:UseSVE=2`:
>>
>> Benchmark                                    bits  inputs  Mode   Unit    Before       After        Gain
>> MaskQueryOperationsBenchmark.testToLongByte  128   1       thrpt  ops/ms  322151.976   1318576.736  4.09
>> MaskQueryOperationsBenchmark.testToLongByte  128   2       thrpt  ops/ms  322187.144   1315736.931  4.08
>> MaskQueryOperationsBenchmark.testToLongByte  128   3       thrpt  ops/ms  322213.330   1353272.882  4.19
>> MaskQueryOperationsBenchmark.testToLongInt   128   1       thrpt  ops/ms  1009426.292  1339834.833  1.32
>> MaskQueryOperations...
>
> Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision:
>
> Rename matcher helper function to "mask_op_prefers_predicate" and add
> more comments
@XiaohongGong Thanks for the updates. I left a few more comments.
And thanks for filing:
https://bugs.openjdk.org/browse/JDK-8370666
Are you planning on working on that, or do you know someone else who could?
I could try, but I'm less familiar with all the concepts, and would need a lot of help.
src/hotspot/cpu/aarch64/aarch64_vector.ad line 401:
> 399: }
> 400:
> 401: assert(vt->isa_vectmask(), "The mask type must be a TypeVectMask on SVE");
Suggestion:
assert(vt->isa_vectmask() != nullptr, "The mask type must be a TypeVectMask on SVE");
Hotspot style guide does not like implicit null/zero checks ;)
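To illustrate the difference, here is a minimal, self-contained sketch of the explicit check. The `TypeVect` and `TypeVectMask` structs below are simplified, hypothetical stand-ins and not HotSpot's real classes; the helper name `is_predicate_type` is also made up for this example. The only point is that the pointer returned by `isa_vectmask()` is compared against `nullptr` rather than being used as an implicit boolean.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration only; HotSpot's real TypeVect and
// TypeVectMask are far richer than this.
struct TypeVectMask;

struct TypeVect {
  virtual ~TypeVect() = default;
  // Returns this object as a TypeVectMask, or nullptr if it is not one.
  virtual const TypeVectMask* isa_vectmask() const { return nullptr; }
};

struct TypeVectMask : TypeVect {
  const TypeVectMask* isa_vectmask() const override { return this; }
};

// Hypothetical helper: the pointer is compared explicitly against nullptr
// instead of relying on the implicit pointer-to-bool conversion.
inline bool is_predicate_type(const TypeVect* vt) {
  return vt->isa_vectmask() != nullptr;
}
```

With these stand-ins, `is_predicate_type` returns true for a `TypeVectMask` and false for a plain `TypeVect`. The same explicit comparison works inside an `assert(...)` condition.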
src/hotspot/share/opto/matcher.hpp line 339:
> 337: // saved with a predicate type (i.e. TypeVectMask) or not. Return true if it
> 338: // requires a predicate type. And return false if it requires a vector type.
> 339: static bool mask_op_prefers_predicate(int opcode, const TypeVect* vt);
You need to decide if it is a `prefers` or a `requires` concept.
You had some really good explanations in the discussion linked below, and I think it would be great if you used some of that in this comment.
https://github.com/openjdk/jdk/pull/27481#discussion_r2464360599
-------------
PR Review: https://git.openjdk.org/jdk/pull/27481#pullrequestreview-3387731863
PR Review Comment: https://git.openjdk.org/jdk/pull/27481#discussion_r2468775304
PR Review Comment: https://git.openjdk.org/jdk/pull/27481#discussion_r2468794050