RFR: 8370863: VectorAPI: Optimize the VectorMaskCast chain in specific patterns [v11]
Emanuel Peter
epeter at openjdk.org
Wed Feb 25 13:14:15 UTC 2026
On Wed, 25 Feb 2026 11:15:33 GMT, Eric Fang <erfang at openjdk.org> wrote:
>> `VectorMaskCastNode` is used to cast a vector mask from one type to another type. The cast may be generated by calling the vector API `cast` or generated by the compiler. For example, some vector mask operations like `trueCount` require the input mask to be integer types, so for floating point type masks, the compiler will cast the mask to the corresponding integer type mask automatically before doing the mask operation. This kind of cast is very common.
>>
>> If the vector element size is unchanged, the `VectorMaskCastNode` doesn't generate code; otherwise, code is generated to extend or narrow the mask. This IR node is not free whether or not it generates code, because it may block some optimizations. For example:
>> 1. `(VectorStoreMask (VectorMaskCast (VectorLoadMask x)))`, where the middle `VectorMaskCast` prevents the optimization `(VectorStoreMask (VectorLoadMask x)) => (x)`.
>> 2. `(VectorMaskToLong (VectorMaskCast (VectorLongToMask x)))`, which blocks the optimization `(VectorMaskToLong (VectorLongToMask x)) => (x)`.
>>
>> In these IR patterns, the value of the input `x` is not changed, so we can safely do the optimization. But if the input value is changed, we can't eliminate the cast.
>>
>> The general idea of this PR is to introduce an `uncast_mask` helper function, which can be used to uncast a chain of `VectorMaskCastNode`, like the existing `Node::uncast(bool)` function. The function returns the first node that is not a `VectorMaskCastNode`.
>>
>> The intended use case is when the IR pattern to be optimized may contain one or more consecutive `VectorMaskCastNode` and this does not affect the correctness of the optimization. Then this function can be called to eliminate the `VectorMaskCastNode` chain.
>>
>> Current optimizations related to `VectorMaskCastNode` include:
>> 1. `(VectorMaskCast (VectorMaskCast x)) => (x)`, see JDK-8356760.
>> 2. `(XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))`, see JDK-8354242.
>>
>> This PR does the following optimizations:
>> 1. Extends the optimization pattern `(VectorMaskCast (VectorMaskCast x)) => (x)` as `(VectorMaskCast (VectorMaskCast ... (VectorMaskCast x))) => (x)`. The optimization is correct as long as the types of the head and tail `VectorMaskCastNode` are consistent.
>> 2. Supports a new optimization pattern `(VectorStoreMask (VectorMaskCast ... (VectorLoadMask x))) => (x)`. Since the value before and after the pattern is a boolean vect...
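To make the quoted idea concrete, here is a minimal toy model in Java of what an `uncast_mask`-style helper does. It is only a sketch under stated assumptions: the real helper lives in C2's C++ code, and the `Node`, `Op`, `uncastMask` and `storeMaskIdentity` names below are hypothetical stand-ins, not HotSpot APIs. The model also omits the element-type and vector-length checks that the real optimization needs.

```java
public class UncastMaskSketch {
    // Hypothetical opcodes standing in for the C2 IR nodes named in the PR description.
    enum Op { INPUT, VECTOR_LOAD_MASK, VECTOR_MASK_CAST, VECTOR_STORE_MASK }

    // Toy IR node with a single data input (null for INPUT).
    static final class Node {
        final Op op;
        final Node in;

        Node(Op op, Node in) {
            this.op = op;
            this.in = in;
        }
    }

    // Skip a chain of consecutive mask casts and return the first node
    // that is not a VECTOR_MASK_CAST.
    static Node uncastMask(Node n) {
        while (n.op == Op.VECTOR_MASK_CAST) {
            n = n.in;
        }
        return n;
    }

    // Models (VectorStoreMask (VectorMaskCast ... (VectorLoadMask x))) => (x).
    // Returns the node unchanged when the pattern does not match.
    // Assumption: type and vector-length consistency checks are left out.
    static Node storeMaskIdentity(Node n) {
        if (n.op != Op.VECTOR_STORE_MASK) {
            return n;
        }
        Node in = uncastMask(n.in);
        if (in.op == Op.VECTOR_LOAD_MASK) {
            return in.in;
        }
        return n;
    }

    public static void main(String[] args) {
        Node x = new Node(Op.INPUT, null);
        Node chain = new Node(Op.VECTOR_LOAD_MASK, x);
        for (int i = 0; i < 3; i++) {
            chain = new Node(Op.VECTOR_MASK_CAST, chain);
        }
        Node store = new Node(Op.VECTOR_STORE_MASK, chain);
        System.out.println(storeMaskIdentity(store) == x); // prints "true"
    }
}
```

In this model, a passing IR test would correspond to `storeMaskIdentity` returning `x`, so that no load-mask, store-mask, or mask-cast node survives.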
>
> Eric Fang has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 16 commits:
>
> - Improve the code comment and tests
> - Merge branch 'master' into JDK-8370863-mask-cast-opt
> - Refine the JTReg tests
> - Add clearer comments to VectorMaskCastIdentityTest.java
> - Update copyright year to 2026
> - Merge branch 'master' into JDK-8370863-mask-cast-opt
> - Convert the check condition for vector length into an assertion
>
> Also refined the tests.
> - Refine code comments
> - Merge branch 'master' into JDK-8370863-mask-cast-opt
> - Merge branch 'master' into JDK-8370863-mask-cast-opt
> - ... and 6 more: https://git.openjdk.org/jdk/compare/c0c1775a...dcd64ad1
I'm getting some failures with your new test:
Failed IR Rules (1) of Methods (1)
----------------------------------
1) Method "compiler.vectorapi.VectorStoreMaskIdentityTest::testVectorMaskStoreIdentityFloat256" - [Failed IR rules: 1]:
* @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={"sve", "true", "avx2", "true"}, counts={"_#V#LOAD_VECTOR_Z#_", "_@4", ">= 3", "_#VECTOR_LOAD_MASK#_", "= 0", "_#VECTOR_STORE_MASK#_", "= 0", "_#VECTOR_MASK_CAST#_", "= 0"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={"MaxVectorSize", "> 16"}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})"
> Phase "PrintIdeal":
- counts: Graph contains wrong number of nodes:
* Constraint 2: "(\\d+(\\s){2}(VectorLoadMask.*)+(\\s){2}===.*)"
- Failed comparison: [found] 1 = 0 [given]
- Matched node:
* 277 VectorLoadMask === _ 106 [[ 275 ]] #vectormask<F,4> !jvms: VectorMask::fromArray @ bci:47 (line 209) VectorStoreMaskIdentityTest::testTwoCastsKernel @ bci:5 (line 75) VectorStoreMaskIdentityTest::testVectorMaskStoreIdentityFloat256 @ bci:29 (line 271)
* Constraint 3: "(\\d+(\\s){2}(VectorStoreMask.*)+(\\s){2}===.*)"
- Failed comparison: [found] 1 = 0 [given]
- Matched node:
* 254 VectorStoreMask === _ 271 272 [[ 216 ]] #vectors<Z,4> !jvms: AbstractMask::intoArray @ bci:50 (line 75) VectorStoreMaskIdentityTest::testTwoCastsKernel @ bci:20 (line 77) VectorStoreMaskIdentityTest::testVectorMaskStoreIdentityFloat256 @ bci:29 (line 271)
* Constraint 4: "(\\d+(\\s){2}(VectorMaskCast.*)+(\\s){2}===.*)"
- Failed comparison: [found] 2 = 0 [given]
- Matched nodes (2):
* 271 VectorMaskCast === _ 275 [[ 254 ]] #vectormask<J,4> !jvms: ShortVector64$ShortMask64::cast @ bci:54 (line 665) VectorStoreMaskIdentityTest::testTwoCastsKernel @ bci:13 (line 77) VectorStoreMaskIdentityTest::testVectorMaskStoreIdentityFloat256 @ bci:29 (line 271)
* 275 VectorMaskCast === _ 277 [[ 271 ]] #vectormask<S,4> !orig=[5222],[2038] !jvms: FloatVector128$FloatMask128::cast @ bci:54 (line 654) VectorStoreMaskIdentityTest::testTwoCastsKernel @ bci:9 (line 76) VectorStoreMaskIdentityTest::testVectorMaskStoreIdentityFloat256 @ bci:29 (line 271)
This was run on an x64 machine with extra flags:
`-ea -esa -XX:CompileThreshold=100 -XX:+UnlockExperimentalVMOptions -server -XX:-TieredCompilation`
And with the same flags, I get a failure on an aarch64 machine:
Failed IR Rules (1) of Methods (1)
----------------------------------
1) Method "compiler.vectorapi.VectorStoreMaskIdentityTest::testVectorMaskStoreIdentityLong" - [Failed IR rules: 1]:
* @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={"asimd", "true"}, counts={"_#V#LOAD_VECTOR_Z#_", "_@2", ">= 3", "_#VECTOR_LOAD_MASK#_", "= 0", "_#VECTOR_STORE_MASK#_", "= 0", "_#VECTOR_MASK_CAST#_", "= 0"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={"MaxVectorSize", ">= 16"}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})"
> Phase "PrintIdeal":
- counts: Graph contains wrong number of nodes:
* Constraint 2: "(\\d+(\\s){2}(VectorLoadMask.*)+(\\s){2}===.*)"
- Failed comparison: [found] 1 = 0 [given]
- Matched node:
* 278 VectorLoadMask === _ 101 [[ 277 ]] #vectorx<J,2> !jvms: VectorMask::fromArray @ bci:47 (line 209) VectorStoreMaskIdentityTest::testThreeCastsKernel @ bci:5 (line 85) VectorStoreMaskIdentityTest::testVectorMaskStoreIdentityLong @ bci:55 (line 220)
* Constraint 3: "(\\d+(\\s){2}(VectorStoreMask.*)+(\\s){2}===.*)"
- Failed comparison: [found] 1 = 0 [given]
- Matched node:
* 238 VectorStoreMask === _ 263 264 [[ 194 ]] #vectord<Z,2> !jvms: AbstractMask::intoArray @ bci:50 (line 75) VectorStoreMaskIdentityTest::testThreeCastsKernel @ bci:24 (line 88) VectorStoreMaskIdentityTest::testVectorMaskStoreIdentityLong @ bci:55 (line 220)
* Constraint 4: "(\\d+(\\s){2}(VectorMaskCast.*)+(\\s){2}===.*)"
- Failed comparison: [found] 3 = 0 [given]
- Matched nodes (3):
* 263 VectorMaskCast === _ 275 [[ 238 ]] #vectord<I,2> !jvms: DoubleVector128$DoubleMask128::cast @ bci:54 (line 650) VectorStoreMaskIdentityTest::testThreeCastsKernel @ bci:17 (line 88) VectorStoreMaskIdentityTest::testVectorMaskStoreIdentityLong @ bci:55 (line 220)
* 275 VectorMaskCast === _ 277 [[ 263 ]] #vectorx<D,2> !orig=[3350] !jvms: IntVector64$IntMask64::cast @ bci:54 (line 661) VectorStoreMaskIdentityTest::testThreeCastsKernel @ bci:13 (line 87) VectorStoreMaskIdentityTest::testVectorMaskStoreIdentityLong @ bci:55 (line 220)
* 277 VectorMaskCast === _ 278 [[ 275 ]] #vectord<I,2> !jvms: LongVector128$LongMask128::cast @ bci:54 (line 651) VectorStoreMaskIdentityTest::testThreeCastsKernel @ bci:9 (line 86) VectorStoreMaskIdentityTest::testVectorMaskStoreIdentityLong @ bci:55 (line 220)
And with other flags, also on aarch64, I get:
`-XX:-TieredCompilation -XX:VerifyIterativeGVN=1110`
Failed IR Rules (1) of Methods (1)
----------------------------------
1) Method "compiler.vectorapi.VectorStoreMaskIdentityTest::testVectorMaskStoreIdentityInt" - [Failed IR rules: 1]:
* @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={"asimd", "true", "avx", "true"}, counts={"_#V#LOAD_VECTOR_Z#_", "_@4", ">= 3", "_#VECTOR_LOAD_MASK#_", "= 0", "_#VECTOR_STORE_MASK#_", "= 0", "_#VECTOR_MASK_CAST#_", "= 0"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={"MaxVectorSize", ">= 16"}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})"
> Phase "PrintIdeal":
- counts: Graph contains wrong number of nodes:
* Constraint 2: "(\\d+(\\s){2}(VectorLoadMask.*)+(\\s){2}===.*)"
- Failed comparison: [found] 1 = 0 [given]
- Matched node:
* 278 VectorLoadMask === _ 118 [[ 277 ]] #vectorx<I,4> !jvms: VectorMask::fromArray @ bci:47 (line 209) VectorStoreMaskIdentityTest::testThreeCastsKernel @ bci:5 (line 85) VectorStoreMaskIdentityTest::testVectorMaskStoreIdentityInt @ bci:55 (line 184)
* Constraint 3: "(\\d+(\\s){2}(VectorStoreMask.*)+(\\s){2}===.*)"
- Failed comparison: [found] 1 = 0 [given]
- Matched node:
* 220 VectorStoreMask === _ 256 257 [[ 168 ]] #vectord<Z,4> !jvms: AbstractMask::intoArray @ bci:50 (line 75) VectorStoreMaskIdentityTest::testThreeCastsKernel @ bci:24 (line 88) VectorStoreMaskIdentityTest::testVectorMaskStoreIdentityInt @ bci:55 (line 184)
* Constraint 4: "(\\d+(\\s){2}(VectorMaskCast.*)+(\\s){2}===.*)"
- Failed comparison: [found] 3 = 0 [given]
- Matched nodes (3):
* 256 VectorMaskCast === _ 274 [[ 220 ]] #vectord<S,4> !jvms: FloatVector128$FloatMask128::cast @ bci:54 (line 654) VectorStoreMaskIdentityTest::testThreeCastsKernel @ bci:17 (line 88) VectorStoreMaskIdentityTest::testVectorMaskStoreIdentityInt @ bci:55 (line 184)
* 274 VectorMaskCast === _ 277 [[ 256 ]] #vectorx<F,4> !orig=5368,[4300] !jvms: ShortVector64$ShortMask64::cast @ bci:54 (line 665) VectorStoreMaskIdentityTest::testThreeCastsKernel @ bci:13 (line 87) VectorStoreMaskIdentityTest::testVectorMaskStoreIdentityInt @ bci:55 (line 184)
* 277 VectorMaskCast === _ 278 [[ 274 ]] #vectord<S,4> !orig=[5231] !jvms: IntVector128$IntMask128::cast @ bci:54 (line 665) VectorStoreMaskIdentityTest::testThreeCastsKernel @ bci:9 (line 86) VectorStoreMaskIdentityTest::testVectorMaskStoreIdentityInt @ bci:55 (line 184)
-------------
PR Comment: https://git.openjdk.org/jdk/pull/28313#issuecomment-3959167455