RFR: 8370863: VectorAPI: Optimize the VectorMaskCast chain in specific patterns [v9]
Xiaohong Gong
xgong at openjdk.org
Mon Feb 2 08:24:31 UTC 2026
On Wed, 28 Jan 2026 01:49:52 GMT, Eric Fang <erfang at openjdk.org> wrote:
>> `VectorMaskCastNode` is used to cast a vector mask from one type to another. The cast may be generated by calling the vector API `cast` or generated by the compiler. For example, some vector mask operations like `trueCount` require the input mask to be of an integer type, so for floating-point masks the compiler automatically casts the mask to the corresponding integer-type mask before doing the mask operation. This kind of cast is very common.
>>
>> If the vector element size is not changed, the `VectorMaskCastNode` doesn't generate code; otherwise code will be generated to extend or narrow the mask. This IR node is not free whether or not it generates code, because it may block some optimizations. For example:
>> 1. `(VectorStoreMask (VectorMaskCast (VectorLoadMask x)))`, where the middle `VectorMaskCast` prevents the optimization `(VectorStoreMask (VectorLoadMask x)) => (x)`.
>> 2. `(VectorMaskToLong (VectorMaskCast (VectorLongToMask x)))`, which blocks the optimization `(VectorMaskToLong (VectorLongToMask x)) => (x)`.
>>
>> In these IR patterns, the value of the input `x` is not changed, so we can safely do the optimization. But if the input value is changed, we can't eliminate the cast.
>>
>> The general idea of this PR is to introduce an `uncast_mask` helper function, which can be used to uncast a chain of `VectorMaskCastNode`, like the existing `Node::uncast(bool)` function. The function returns the first node that is not a `VectorMaskCastNode`.
>>
>> The intended use case is when the IR pattern to be optimized may contain one or more consecutive `VectorMaskCastNode` and this does not affect the correctness of the optimization. Then this function can be called to eliminate the `VectorMaskCastNode` chain.
>>
>> Current optimizations related to `VectorMaskCastNode` include:
>> 1. `(VectorMaskCast (VectorMaskCast x)) => (x)`, see JDK-8356760.
>> 2. `(XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))`, see JDK-8354242.
>>
>> This PR does the following optimizations:
>> 1. Extends the optimization pattern `(VectorMaskCast (VectorMaskCast x)) => (x)` to `(VectorMaskCast (VectorMaskCast ... (VectorMaskCast x))) => (x)`. The optimization is correct as long as the types of the head and tail `VectorMaskCastNode` are consistent.
>> 2. Supports a new optimization pattern `(VectorStoreMask (VectorMaskCast ... (VectorLoadMask x))) => (x)`. Since the value before and after the pattern is a boolean vect...
>
> Eric Fang has updated the pull request incrementally with one additional commit since the last revision:
>
> Add clearer comments to VectorMaskCastIdentityTest.java
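For readers following along, the chain-skipping idea from the description can be modelled outside HotSpot. The following is only a minimal sketch in Java over a toy single-input IR: the class `UncastMaskSketch`, the `Op` enum, the `Node` class and the method names are all hypothetical and are not the PR's C++ code. It also leaves out the type checks that the real transformations need.

```java
public class UncastMaskSketch {
    // Hypothetical opcodes for a toy IR; they only mirror the node names used in the PR description.
    enum Op { INPUT, VECTOR_LOAD_MASK, VECTOR_MASK_CAST, VECTOR_STORE_MASK }

    // Toy node with a single input. HotSpot's real Node class is far richer.
    static final class Node {
        final Op op;
        final Node in;

        Node(Op op, Node in) {
            this.op = op;
            this.in = in;
        }
    }

    // Walk past consecutive VECTOR_MASK_CAST nodes and return the first node that is not a cast.
    static Node uncastMask(Node n) {
        while (n != null && n.op == Op.VECTOR_MASK_CAST) {
            n = n.in;
        }
        return n;
    }

    // Toy version of: (VectorStoreMask (VectorMaskCast ... (VectorLoadMask x))) => x
    // Returns the node unchanged when the pattern does not match.
    static Node storeMaskIdentity(Node n) {
        if (n.op != Op.VECTOR_STORE_MASK) {
            return n;
        }
        Node src = uncastMask(n.in);
        if (src != null && src.op == Op.VECTOR_LOAD_MASK) {
            return src.in;
        }
        return n;
    }

    public static void main(String[] args) {
        Node x = new Node(Op.INPUT, null);
        Node chain = new Node(Op.VECTOR_STORE_MASK,
                new Node(Op.VECTOR_MASK_CAST,
                        new Node(Op.VECTOR_MASK_CAST,
                                new Node(Op.VECTOR_LOAD_MASK, x))));
        // The two casts are skipped and the store/load pair folds back to x.
        System.out.println(storeMaskIdentity(chain) == x); // prints "true"
    }
}
```

In this model a store-mask whose input chain does not end in a load-mask is returned unchanged, which corresponds to the "input value is changed" case where the cast cannot be eliminated.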
test/hotspot/jtreg/compiler/vectorapi/VectorStoreMaskIdentityTest.java line 288:
> 286: @IR(counts = { IRNode.VECTOR_LOAD_MASK, "= 0",
> 287: IRNode.VECTOR_STORE_MASK, "= 0",
> 288: IRNode.VECTOR_MASK_CAST, "= 0" },
Can we also check `IRNode.LOAD_VECTOR` to make sure these APIs are intrinsified successfully and the nodes are eliminated by `VectorStoreMask::Identity()`? If these APIs are not intrinsified for some reason, the IR nodes above do not exist either, so the `"= 0"` checks would pass trivially.
test/hotspot/jtreg/compiler/vectorapi/VectorStoreMaskIdentityTest.java line 289:
> 287: IRNode.VECTOR_STORE_MASK, "= 0",
> 288: IRNode.VECTOR_MASK_CAST, "= 0" },
> 289: applyIfCPUFeatureOr = { "asimd", "true", "avx2", "true" },
This might not affect the result, because the test additionally has an `applyIf = { "MaxVectorSize", "> 16" }` check. But it would be more accurate to use:
Suggestion:
applyIfCPUFeatureOr = { "sve", "true", "avx2", "true" },
because the maximum vector size for NEON is 16 bytes.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/28313#discussion_r2753114650
PR Review Comment: https://git.openjdk.org/jdk/pull/28313#discussion_r2753117731