RFR: 8370863: VectorAPI: Optimize the VectorMaskCast chain in specific patterns [v2]

Fri Dec 5 08:13:24 UTC 2025

On Fri, 28 Nov 2025 09:09:28 GMT, Galder Zamarreño <galder at openjdk.org> wrote:

>> Eric Fang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
>> 
>>  - Don't read and write the same memory in the JMH benchmarks
>>  - Merge branch 'master' into JDK-8370863-mask-cast-opt
>>  - 8370863: VectorAPI: Optimize the VectorMaskCast chain in specific patterns
>>    
>>    `VectorMaskCastNode` is used to cast a vector mask from one type to
>>    another type. The cast may be generated by calling the vector API `cast`
>>    or generated by the compiler. For example, some vector mask operations
>>    like `trueCount` require the input mask to be of an integer type, so for
>>    floating point type masks, the compiler will cast the mask to the
>>    corresponding integer type mask automatically before doing the mask
>>    operation. This kind of cast is very common.
>>    
>>    If the vector element size does not change, the `VectorMaskCastNode`
>>    generates no code; otherwise, code is generated to extend or narrow
>>    the mask. This IR node is not free even when it generates no code,
>>    because it may block some optimizations. For example:
>>    1. `(VectorStoreMask (VectorMaskCast (VectorLoadMask x)))`
>>    The middle `VectorMaskCast` prevents the following optimization:
>>    `(VectorStoreMask (VectorLoadMask x)) => (x)`
>>    2. `(VectorMaskToLong (VectorMaskCast (VectorLongToMask x)))`, which
>>    blocks the optimization `(VectorMaskToLong (VectorLongToMask x)) => (x)`.
>>    
>>    In these IR patterns, the value of the input `x` is not changed, so we
>>    can safely do the optimization. But if the input value is changed, we
>>    can't eliminate the cast.
>>    
>>    The general idea of this PR is to introduce an `uncast_mask` helper
>>    function, which can be used to strip a chain of `VectorMaskCastNode`s,
>>    similar to the existing `Node::uncast(bool)` function. The function
>>    returns the first node in the chain that is not a `VectorMaskCastNode`.
>>    
>>    The intended use case is an IR pattern that may contain one or more
>>    consecutive `VectorMaskCastNode`s whose presence does not affect the
>>    correctness of the optimization. In that case, this function can be
>>    called to look through the `VectorMaskCastNode` chain so that it can be
>>    eliminated.
>>    
>>    Current optimizations related to `VectorMaskCastNode` include:
>>    1. `(VectorMaskCast (VectorMaskCast x)) => (x)`, see JDK-8356760.
>>    2. `(XorV (VectorMa...
>
> src/hotspot/share/opto/vectornode.cpp line 1056:
> 
>> 1054: //   x remains to be a bool vector with no changes.
>> 1055: // This function can be used to eliminate the VectorMaskCast in such patterns.
>> 1056: Node* VectorNode::uncast_mask(Node* n) {
> 
> Could this be a static method instead?

Yeah, it's already a static method. See https://github.com/openjdk/jdk/pull/28313/files#diff-ba9e2d10a50a01316946660ec9f68321eb864fd9c815616c10abbec39360efe5R141 

Or do you mean a static function limited to this file? If so, I'd prefer not to, since it may be used in other places. Thanks~

> test/hotspot/jtreg/compiler/vectorapi/VectorMaskCastIdentityTest.java line 57:
> 
>> 55:         applyIfCPUFeatureAnd = {"asimd", "true", "sve", "false"})
>> 56:     public static int testTwoCastToDifferentType() {
>> 57:         // The types before and after the two casts are not the same, so the cast cannot be eliminated.
> 
> Outdated comment. Also, please expand the assertion comments.

Done, thanks!

> test/hotspot/jtreg/compiler/vectorapi/VectorMaskCastIdentityTest.java line 79:
> 
>> 77:         applyIfCPUFeatureAnd = {"avx2", "true", "avx512", "false"})
>> 78:     public static int testTwoCastToDifferentType2() {
>> 79:         // The types before and after the two casts are not the same, so the cast cannot be eliminated.
> 
> Could you expand the documentation on the IR assertions? It's not immediately clear why with AVX-512 the cast remains but with AVX2 it's removed. Also, this comment is outdated.

This is because, on AVX2, the test is also affected by the following optimization:

`(VectorStoreMask (VectorMaskCast ... (VectorLoadMask x))) => x`

On AVX2, `trueCount()` requires the mask to be converted to a **boolean vector** first via `VectorStoreMask`. `VectorStoreMask` can then apply the above optimization, which eliminates all the `VectorMaskCast` nodes as a side effect.

On AVX-512, masks are held in dedicated mask registers (k registers), so no `VectorStoreMask` is generated for `trueCount()` and the `VectorMaskCast` nodes remain.

I've reorganised this file; please take another look, thanks~

> test/hotspot/jtreg/compiler/vectorapi/VectorMaskToLongTest.java line 240:
> 
>> 238: 
>> 239:     @Test
>> 240:     @IR(counts = { IRNode.VECTOR_LONG_TO_MASK, "= 0",
> 
> Could you add some assertion comments here as well, to explain what causes the differences between architectures?

Done

> test/hotspot/jtreg/compiler/vectorapi/VectorMaskToLongTest.java line 260:
> 
>> 258: 
>> 259:     @Test
>> 260:     @IR(counts = { IRNode.VECTOR_LONG_TO_MASK, "= 0",
> 
> Same here

Done

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28313#discussion_r2587209533
PR Review Comment: https://git.openjdk.org/jdk/pull/28313#discussion_r2587250313
PR Review Comment: https://git.openjdk.org/jdk/pull/28313#discussion_r2587250610
PR Review Comment: https://git.openjdk.org/jdk/pull/28313#discussion_r2587250972
PR Review Comment: https://git.openjdk.org/jdk/pull/28313#discussion_r2587251084