RFR: 8356760: VectorAPI: Optimize VectorMask.fromLong for all-true/all-false cases [v3]

Jatin Bhateja jbhateja at openjdk.org
Fri Jul 4 06:21:40 UTC 2025


On Thu, 3 Jul 2025 07:10:22 GMT, erifan <duke at openjdk.org> wrote:

>> If the input long value `l` of `VectorMask.fromLong(SPECIES, l)` would set or unset all lanes, `VectorMask.fromLong(SPECIES, l)` is equivalent to `maskAll(true)` or `maskAll(false)`. But the cost of `maskAll` is relatively lower than that of `fromLong`. This patch does the conversion for these cases if `l` is a compile-time constant.
>> 
>> And this conversion also enables further optimizations that recognize maskAll patterns, see [1].
>> 
>> Some JTReg test cases are added to ensure the optimization is effective.
>> 
>> I tried many different ways to write a JMH benchmark, but failed. Since the input of `VectorMask.fromLong(SPECIES, l)` needs to be a specific compile-time constant, the statement is hoisted out of the loop. Without a loop, the hot spot shifts to other instructions, and no obvious performance change was observed. However, combined with the optimization of [1], we can observe a performance improvement of about 7% on both aarch64 and x64.
>> 
>> The patch was tested on both aarch64 and x64; all tier1, tier2, and tier3 tests passed.
>> 
>> [1] https://github.com/openjdk/jdk/pull/24674
>
> erifan has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Simplify the test code

Can you kindly include a micro with this patch? 

```
public static final VectorSpecies<Float> FSP = FloatVector.SPECIES_512;

// The clamp always yields -1, so the argument of fromLong is a constant.
public static long micro1(long a) {
    long mask = Math.min(-1, Math.max(-1, a));
    return VectorMask.fromLong(FSP, mask).toLong();
}

// Reference: the all-true mask built directly.
public static long micro2() {
    return FSP.maskAll(true).toLong();
}
```

Your patch now removes the L2M (VectorLongToMask) and M2L (VectorMaskToLong) IR nodes, along with the intermediate VectorMaskCast, from micro1.
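
To spell out why micro1 reaches the constant path even though `a` is a run-time argument, here is a minimal plain-Java sketch of the clamp on its own. `ClampDemo` and `clampedMask` are hypothetical names used only for illustration; they are not part of the patch.

```java
// Hypothetical helper class for illustration; not part of the patch or the JDK.
final class ClampDemo {
    // Same expression as in micro1. Math.max(-1, a) is never below -1 and
    // Math.min(-1, x) is never above -1, so the result is -1 for every input.
    static long clampedMask(long a) {
        return Math.min(-1, Math.max(-1, a));
    }

    public static void main(String[] args) {
        long[] samples = {Long.MIN_VALUE, -2L, -1L, 0L, 1L, Long.MAX_VALUE};
        for (long a : samples) {
            System.out.println(a + " -> " + clampedMask(a));
        }
    }
}
```

Since the result does not depend on `a`, the compiler can treat the mask as a constant, which matches the ConL input seen in the dumps below.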


Baseline:-
SPR2>java --add-modules=jdk.incubator.vector -Xbatch  -XX:CompileCommand=PrintIdealPhase,test_mask_all::micro1,BEFORE_MATCHING -XX:-TieredCompilation -cp . test_mask_all 0
AFTER: BEFORE_MATCHING
  65  ConL  === 0  [[ 377 ]]  #long:65535
 369  Return  === 5 6 7 8 9 returns 399  [[ 0 ]]
 377  VectorLongToMask  === _ 65  [[ 398 ]]  #vectormask<F,16> !jvms: VectorMask::fromLong @ bci:39 (line 243) test_mask_all::micro1 @ bci:18 (line 9)
 398  VectorMaskCast  === _ 377  [[ 399 ]]  #vectormask<I,16> !jvms: Float512Vector$Float512Mask::toLong @ bci:35 (line 765) test_mask_all::micro1 @ bci:21 (line 9)
 399  VectorMaskToLong  === _ 398  [[ 369 ]]  #long !jvms: Float512Vector$Float512Mask::toLong @ bci:35 (line 765) test_mask_all::micro1 @ bci:21 (line 9)
[time] 5 ms  [res] 1310700000000

With patch:-
XX:CompileCommand=PrintIdealPhase,test_mask_all::micro1,BEFORE_MATCHING -XX:-TieredCompilation -cp . test_mask_all 0
CompileCommand: PrintIdealPhase test_mask_all.micro1 const char* PrintIdealPhase = 'BEFORE_MATCHING'
WARNING: Using incubator modules: jdk.incubator.vector
AFTER: BEFORE_MATCHING
  65  ConL  === 0  [[ 369 ]]  #long:65535
 369  Return  === 5 6 7 8 9 returns 65  [[ 0 ]]
[time] 3 ms  [res] 1310700000000
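
For reference, the constant 65535 in both dumps is what -1 looks like once only the 16 lanes of a 512-bit float species are kept. Below is a rough plain-Java model of that lane masking and of the all-true/all-false classification the patch description talks about. It is a sketch under stated assumptions: `LaneMaskModel` and its methods are hypothetical names, and this is not the C2 implementation.

```java
// Hypothetical model for illustration; this is not how C2 implements the check.
final class LaneMaskModel {
    // Keep only the low vlen bits of l, one bit per lane (1 <= vlen <= 64).
    static long laneBits(long l, int vlen) {
        return l & (-1L >>> (64 - vlen));
    }

    // True if every one of the vlen lanes would be set by l.
    static boolean setsAllLanes(long l, int vlen) {
        return laneBits(l, vlen) == laneBits(-1L, vlen);
    }

    // True if every one of the vlen lanes would be unset by l.
    static boolean clearsAllLanes(long l, int vlen) {
        return laneBits(l, vlen) == 0L;
    }

    public static void main(String[] args) {
        int vlen = 16; // lane count of a 512-bit float species
        System.out.println(laneBits(-1L, vlen));           // 65535
        System.out.println(setsAllLanes(-1L, vlen));       // true
        System.out.println(clearsAllLanes(0x10000L, vlen)); // true: only bit 16 is set
    }
}
```

In this model, an input of -1 with 16 lanes is an all-true mask whose `toLong()` value is 65535, which is the constant that the patched graph returns directly.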

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25793#issuecomment-3034669174

