RFR: 8356760: VectorAPI: Optimize VectorMask.fromLong for all-true/all-false cases [v2]

Thu Jul 3 02:00:51 UTC 2025

On Thu, 26 Jun 2025 07:49:28 GMT, Xiaohong Gong <xgong at openjdk.org> wrote:

>> erifan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
>> 
>>  - Address some review comments
>>    
>>    Add support for the following patterns:
>>      toLong(maskAll(true))  => (-1ULL >> (64 - vlen))
>>      toLong(maskAll(false)) => 0
>>    
>>    And add more test cases.
>>  - Merge branch 'master' into JDK-8356760
>>  - 8356760: VectorAPI: Optimize VectorMask.fromLong for all-true/all-false cases
>>    
>>    If the input long value `l` of `VectorMask.fromLong(SPECIES, l)` would
>>    set or unset all lanes, `VectorMask.fromLong(SPECIES, l)` is equivalent
>>    to `maskAll(true)` or `maskAll(false)`. But the cost of `maskAll` is
>>    relatively smaller than that of `fromLong`. This patch does the conversion
>>    for these cases if `l` is a compile time constant.
>>    
>>    And this conversion also enables further optimizations that recognize
>>    maskAll patterns, see [1].
>>    
>>    Some JTReg test cases are added to ensure the optimization is effective.
>>    
>>    I tried many different ways to write a JMH benchmark, but failed. Since
>>    the input of `VectorMask.fromLong(SPECIES, l)` needs to be a specific
>>    compile-time constant, the statement will be hoisted out of the loop.
>>    If we don't use a loop, the hotspot will become other instructions, and
>>    no obvious performance change was observed. However, combined with the
>>    optimization of [1], we can observe a performance improvement of about
>>    7% on both aarch64 and x64.
>>    
>>    The patch was tested on both aarch64 and x64, and all tier1, tier2, and
>>    tier3 tests passed.
>>    
>>    [1] https://github.com/openjdk/jdk/pull/24674
>
> src/hotspot/share/opto/vectorIntrinsics.cpp line 706:
> 
>> 704:         opc = Op_Replicate;
>> 705:         elem_bt = converted_elem_bt;
>> 706:         bits = gvn().longcon(bits_type->get_con() == 0L ? 0L : -1L);
> 
> Code style. Suggest:
> 
> if (opc == Op_VectorLongToMask &&
>     is_maskall_type(bits_type, num_elem) &&
>     arch_supports_vector(Op_Replicate, num_elem, converted_elem_bt, checkFlags, true /*has_scalar_args*/)) {
>   opc = Op_Replicate;
>   elem_bt = converted_elem_bt;
>   bits = gvn().longcon(bits_type->get_con() == 0L ? 0L : -1L);
> } else if (

Done

> So if bits = 0xf0, and the vlen = 4, what is the expected mask?

This is not possible because the input value has been processed in `VectorMask::fromLong`.  See https://github.com/openjdk/jdk/blob/74822ce12acaf9816aa49b75ab5817ced3710776/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMask.java#L242

But for safety, I have added a double check on the lowest bit.
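
A minimal sketch of that reasoning in plain Java, assuming `VectorMask.fromLong` keeps only the low `vlen` bits of its input before the intrinsic sees it. The class `MaskBits` and its methods are hypothetical names for illustration only; they are not part of the Vector API or of this patch.

```java
// Hypothetical helper, not JDK code: models the assumed bit handling only.
final class MaskBits {
    private MaskBits() {}

    // Bit pattern with the low vlen bits set, i.e. (-1 >>> (64 - vlen)).
    static long allTrueBits(int vlen) {
        if (vlen < 1 || vlen > Long.SIZE) {
            throw new IllegalArgumentException("vlen must be in [1, 64]: " + vlen);
        }
        return -1L >>> (Long.SIZE - vlen);
    }

    // Assumption: fromLong discards every bit above the lane count.
    static long normalize(long bits, int vlen) {
        return bits & allTrueBits(vlen);
    }

    // Classifies the normalized input the way the optimization cares about.
    static String classify(long bits, int vlen) {
        long n = normalize(bits, vlen);
        if (n == 0L) {
            return "all-false";
        }
        if (n == allTrueBits(vlen)) {
            return "all-true";
        }
        return "mixed";
    }

    public static void main(String[] args) {
        System.out.println(classify(0xf0L, 4)); // all-false: low 4 bits are 0
        System.out.println(classify(0x0fL, 4)); // all-true
        System.out.println(classify(0x05L, 4)); // mixed
    }
}
```

Under this assumption, `bits = 0xf0` with `vlen = 4` is reduced to 0 before it reaches the intrinsic, so it would be treated as the all-false case.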

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25793#discussion_r2181360080
PR Review Comment: https://git.openjdk.org/jdk/pull/25793#discussion_r2181377150