RFR: 8370863: VectorAPI: Optimize the VectorMaskCast chain in specific patterns [v2]
Galder Zamarreño
galder at openjdk.org
Fri Nov 28 09:23:58 UTC 2025
On Thu, 20 Nov 2025 04:01:09 GMT, Eric Fang <erfang at openjdk.org> wrote:
>> `VectorMaskCastNode` is used to cast a vector mask from one type to another type. The cast may be generated by calling the vector API `cast` or generated by the compiler. For example, some vector mask operations like `trueCount` require the input mask to be integer types, so for floating point type masks, the compiler will cast the mask to the corresponding integer type mask automatically before doing the mask operation. This kind of cast is very common.
>>
>> If the vector element size is not changed, the `VectorMaskCastNode` don't generate code, otherwise code will be generated to extend or narrow the mask. This IR node is not free no matter it generates code or not because it may block some optimizations. For example:
>> 1. `(VectorStoremask (VectorMaskCast (VectorLoadMask x)))` The middle `VectorMaskCast` prevented the following optimization: `(VectorStoremask (VectorLoadMask x)) => (x)`
>> 2. `(VectorMaskToLong (VectorMaskCast (VectorLongToMask x)))`, which blocks the optimization `(VectorMaskToLong (VectorLongToMask x)) => (x)`.
>>
>> In these IR patterns, the value of the input `x` is not changed, so we can safely do the optimization. But if the input value is changed, we can't eliminate the cast.
>>
>> The general idea of this PR is introducing an `uncast_mask` helper function, which can be used to uncast a chain of `VectorMaskCastNode`, like the existing `Node::uncast(bool)` function. The funtion returns the first non `VectorMaskCastNode`.
>>
>> The intended use case is when the IR pattern to be optimized may contain one or more consecutive `VectorMaskCastNode` and this does not affect the correctness of the optimization. Then this function can be called to eliminate the `VectorMaskCastNode` chain.
>>
>> Current optimizations related to `VectorMaskCastNode` include:
>> 1. `(VectorMaskCast (VectorMaskCast x)) => (x)`, see JDK-8356760.
>> 2. `(XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))`, see JDK-8354242.
>>
>> This PR does the following optimizations:
>> 1. Extends the optimization pattern `(VectorMaskCast (VectorMaskCast x)) => (x)` as `(VectorMaskCast (VectorMaskCast ... (VectorMaskCast x))) => (x)`. Because as long as types of the head and tail `VectorMaskCastNode` are consistent, the optimization is correct.
>> 2. Supports a new optimization pattern `(VectorStoreMask (VectorMaskCast ... (VectorLoadMask x))) => (x)`. Since the value before and after the pattern is a boolean vect...
>
> Eric Fang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
>
> - Don't read and write the same memory in the JMH benchmarks
> - Merge branch 'master' into JDK-8370863-mask-cast-opt
> - 8370863: VectorAPI: Optimize the VectorMaskCast chain in specific patterns
>
> `VectorMaskCastNode` is used to cast a vector mask from one type to
> another type. The cast may be generated by calling the vector API `cast`
> or generated by the compiler. For example, some vector mask operations
> like `trueCount` require the input mask to be integer types, so for
> floating point type masks, the compiler will cast the mask to the
> corresponding integer type mask automatically before doing the mask
> operation. This kind of cast is very common.
>
> If the vector element size is not changed, the `VectorMaskCastNode`
> don't generate code, otherwise code will be generated to extend or narrow
> the mask. This IR node is not free no matter it generates code or not
> because it may block some optimizations. For example:
> 1. `(VectorStoremask (VectorMaskCast (VectorLoadMask x)))`
> The middle `VectorMaskCast` prevented the following optimization:
> `(VectorStoremask (VectorLoadMask x)) => (x)`
> 2. `(VectorMaskToLong (VectorMaskCast (VectorLongToMask x)))`, which
> blocks the optimization `(VectorMaskToLong (VectorLongToMask x)) => (x)`.
>
> In these IR patterns, the value of the input `x` is not changed, so we
> can safely do the optimization. But if the input value is changed, we
> can't eliminate the cast.
>
> The general idea of this PR is introducing an `uncast_mask` helper
> function, which can be used to uncast a chain of `VectorMaskCastNode`,
> like the existing `Node::uncast(bool)` function. The funtion returns
> the first non `VectorMaskCastNode`.
>
> The intended use case is when the IR pattern to be optimized may
> contain one or more consecutive `VectorMaskCastNode` and this does not
> affect the correctness of the optimization. Then this function can be
> called to eliminate the `VectorMaskCastNode` chain.
>
> Current optimizations related to `VectorMaskCastNode` include:
> 1. `(VectorMaskCast (VectorMaskCast x)) => (x)`, see JDK-8356760.
> 2. `(XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1))
> => (VectorMaskCast (...
Nice improvement @erifan, just some small comments from me
src/hotspot/share/opto/vectornode.cpp line 1056:
> 1054: // x remains to be a bool vector with no changes.
> 1055: // This function can be used to eliminate the VectorMaskCast in such patterns.
> 1056: Node* VectorNode::uncast_mask(Node* n) {
Could this be a static method instead?
test/hotspot/jtreg/compiler/vectorapi/VectorMaskCastIdentityTest.java line 57:
> 55: applyIfCPUFeatureAnd = {"asimd", "true", "sve", "false"})
> 56: public static int testTwoCastToDifferentType() {
> 57: // The types before and after the two casts are not the same, so the cast cannot be eliminated.
Outdated comment. Also please expand assertion comments
test/hotspot/jtreg/compiler/vectorapi/VectorMaskCastIdentityTest.java line 79:
> 77: applyIfCPUFeatureAnd = {"avx2", "true", "avx512", "false"})
> 78: public static int testTwoCastToDifferentType2() {
> 79: // The types before and after the two casts are not the same, so the cast cannot be eliminated.
Could you expand the documentation on the IR assertions? It's not immediately clear why with AVX-512 the cast remains but with AVX-2 it's removed. Also, this comment is outdated.
test/hotspot/jtreg/compiler/vectorapi/VectorMaskToLongTest.java line 240:
> 238:
> 239: @Test
> 240: @IR(counts = { IRNode.VECTOR_LONG_TO_MASK, "= 0",
Could you add some assertion comments here as well to understand what causes the differences with different architectures?
test/hotspot/jtreg/compiler/vectorapi/VectorMaskToLongTest.java line 260:
> 258:
> 259: @Test
> 260: @IR(counts = { IRNode.VECTOR_LONG_TO_MASK, "= 0",
Same here
-------------
Changes requested by galder (Author).
PR Review: https://git.openjdk.org/jdk/pull/28313#pullrequestreview-3518051437
PR Review Comment: https://git.openjdk.org/jdk/pull/28313#discussion_r2570915091
PR Review Comment: https://git.openjdk.org/jdk/pull/28313#discussion_r2570925650
PR Review Comment: https://git.openjdk.org/jdk/pull/28313#discussion_r2570924373
PR Review Comment: https://git.openjdk.org/jdk/pull/28313#discussion_r2570932229
PR Review Comment: https://git.openjdk.org/jdk/pull/28313#discussion_r2570932750
More information about the hotspot-compiler-dev
mailing list