RFR: 8356760: VectorAPI: Optimize VectorMask.fromLong for all-true/all-false cases [v2]

Xiaohong Gong xgong at openjdk.org
Thu Jul 3 02:24:47 UTC 2025


On Thu, 3 Jul 2025 01:52:52 GMT, erifan <duke at openjdk.org> wrote:

>> If the input long value `l` of `VectorMask.fromLong(SPECIES, l)` would set or unset all lanes, `VectorMask.fromLong(SPECIES, l)` is equivalent to `maskAll(true)` or `maskAll(false)`. Since `maskAll` is relatively cheaper than `fromLong`, this patch does the conversion for these cases if `l` is a compile-time constant.
>> 
>> And this conversion also enables further optimizations that recognize maskAll patterns, see [1].
>> 
>> Some JTReg test cases are added to ensure the optimization is effective.
>> 
>> I tried several ways to write a JMH benchmark for this change alone, but none worked. Because the input of `VectorMask.fromLong(SPECIES, l)` must be a specific compile-time constant, the statement is hoisted out of the benchmark loop. Without a loop, other instructions become the hot spot, and no obvious performance change is observed. However, combined with the optimization in [1], we observe a performance improvement of about 7% on both aarch64 and x64.
>> 
>> The patch was tested on both aarch64 and x64; all tier1, tier2, and tier3 tests passed.
>> 
>> [1] https://github.com/openjdk/jdk/pull/24674
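
To make the all-true/all-false condition in the description above concrete, here is a minimal sketch in plain Java. It is not the C2 code from the patch: the class `MaskConstantSketch`, the enum `Kind`, and the method `classify` are hypothetical names used only for illustration, and the sketch assumes that bits of `l` at positions at or above the lane count are ignored.

```java
// Hypothetical illustration only; not the HotSpot implementation.
class MaskConstantSketch {
    enum Kind { ALL_FALSE, ALL_TRUE, MIXED }

    // Classifies a constant bitmask for a species with vlen lanes (1..64).
    // Assumption: only the low vlen bits of l select lanes.
    static Kind classify(long l, int vlen) {
        if (vlen < 1 || vlen > 64) {
            throw new IllegalArgumentException("vlen must be in 1..64");
        }
        // Build a value with the low vlen bits set; 1L << 64 would wrap, so handle 64 separately.
        long laneBits = (vlen == 64) ? -1L : (1L << vlen) - 1;
        long used = l & laneBits;
        if (used == 0) {
            return Kind.ALL_FALSE;   // candidate for maskAll(false)
        }
        if (used == laneBits) {
            return Kind.ALL_TRUE;    // candidate for maskAll(true)
        }
        return Kind.MIXED;           // must stay a general fromLong
    }

    public static void main(String[] args) {
        System.out.println(classify(-1L, 8));    // ALL_TRUE
        System.out.println(classify(0x100L, 8)); // ALL_FALSE (bit 8 is outside the 8 lanes)
        System.out.println(classify(0x0FL, 8));  // MIXED
    }
}
```

Only the first two outcomes correspond to the cases the description says can be converted, and only when `l` is known at compile time.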
>
> erifan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
> 
>  - Address some review comments
>    
>    Add support for the following patterns:
>      toLong(maskAll(true))  => (-1ULL >> (64 - vlen))
>      toLong(maskAll(false)) => 0
>    
>    And add more test cases.
>  - Merge branch 'master' into JDK-8356760
>  - 8356760: VectorAPI: Optimize VectorMask.fromLong for all-true/all-false cases
>    
>    If the input long value `l` of `VectorMask.fromLong(SPECIES, l)` would
>    set or unset all lanes, `VectorMask.fromLong(SPECIES, l)` is equivalent
>    to `maskAll(true)` or `maskAll(false)`. Since `maskAll` is relatively
>    cheaper than `fromLong`, this patch does the conversion for these cases
>    if `l` is a compile-time constant.
>    
>    And this conversion also enables further optimizations that recognize
>    maskAll patterns, see [1].
>    
>    Some JTReg test cases are added to ensure the optimization is effective.
>    
>    I tried several ways to write a JMH benchmark for this change alone,
>    but none worked. Because the input of `VectorMask.fromLong(SPECIES, l)`
>    must be a specific compile-time constant, the statement is hoisted out
>    of the benchmark loop. Without a loop, other instructions become the
>    hot spot, and no obvious performance change is observed. However,
>    combined with the optimization in [1], we observe a performance
>    improvement of about 7% on both aarch64 and x64.
>    
>    The patch was tested on both aarch64 and x64; all tier1, tier2, and
>    tier3 tests passed.
>    
>    [1] https://github.com/openjdk/jdk/pull/24674
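
The two `toLong` patterns listed in the first commit above can be checked with ordinary long arithmetic. The following is a minimal sketch, not the compiler code from the patch: `ToLongFoldSketch` and `toLongOfMaskAll` are hypothetical names, and Java's unsigned shift `>>>` on `-1L` stands in for the `-1ULL >> (64 - vlen)` expression in the commit message.

```java
// Hypothetical illustration only; not the HotSpot implementation.
class ToLongFoldSketch {
    // Returns the constant that toLong(maskAll(allSet)) would fold to
    // for a mask with vlen lanes (1..64), per the patterns in the commit message.
    static long toLongOfMaskAll(boolean allSet, int vlen) {
        if (vlen < 1 || vlen > 64) {
            throw new IllegalArgumentException("vlen must be in 1..64");
        }
        // The shift count 64 - vlen is in 0..63, so the shift is well defined.
        return allSet ? (-1L >>> (64 - vlen)) : 0L;
    }

    public static void main(String[] args) {
        System.out.println(Long.toHexString(toLongOfMaskAll(true, 8)));   // ff
        System.out.println(Long.toHexString(toLongOfMaskAll(true, 64)));  // ffffffffffffffff
        System.out.println(toLongOfMaskAll(false, 16));                   // 0
    }
}
```

For an all-true mask the result has exactly the low `vlen` bits set, which matches one bit per lane.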

Looks much better to me. Thanks for the update!

-------------

Marked as reviewed by xgong (Committer).

PR Review: https://git.openjdk.org/jdk/pull/25793#pullrequestreview-2981322138


More information about the hotspot-compiler-dev mailing list