RFR: 8356760: VectorAPI: Optimize VectorMask.fromLong for all-true/all-false cases [v2]

erifan duke at openjdk.org
Thu Jul 3 01:52:52 UTC 2025


> If the input long value `l` of `VectorMask.fromLong(SPECIES, l)` would set or unset all lanes, `VectorMask.fromLong(SPECIES, l)` is equivalent to `maskAll(true)` or `maskAll(false)`. But the cost of `maskAll` is relatively smaller than that of `fromLong`. This patch does the conversion for these cases if `l` is a compile-time constant.
> 
> This conversion also enables further optimizations that recognize `maskAll` patterns, see [1].
> 
> Some JTReg test cases are added to ensure the optimization is effective.
> 
> I tried many different ways to write a JMH benchmark, but failed. Since the input of `VectorMask.fromLong(SPECIES, l)` needs to be a specific compile-time constant, the statement is hoisted out of the loop. If we don't use a loop, the hot spot shifts to other instructions, and no obvious performance change is observed. However, combined with the optimization of [1], we can observe a performance improvement of about 7% on both aarch64 and x64.
> 
> The patch was tested on both aarch64 and x64; all tier1, tier2, and tier3 tests passed.
> 
> [1] https://github.com/openjdk/jdk/pull/24674
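
The condition described above can be modeled with plain `long` arithmetic. The following is only a sketch, not the C2 code from the patch: the class and method names are hypothetical, and it assumes that `fromLong` ignores bits at or above the lane count `vlen` and that `1 <= vlen <= 64`.

```java
// Sketch only: models the "all lanes set / all lanes unset" check on a
// constant long. Names are hypothetical; this is not the patch's C2 code.
class FromLongConstantSketch {
    enum Kind { ALL_TRUE, ALL_FALSE, MIXED }

    // Bits 0..vlen-1 set. Assumes 1 <= vlen <= 64.
    static long laneBits(int vlen) {
        return -1L >>> (64 - vlen);
    }

    // Classifies the constant by looking only at the low vlen bits
    // (assumption: higher bits do not affect the resulting mask).
    static Kind classify(long l, int vlen) {
        long lanes = laneBits(vlen);
        long used = l & lanes;
        if (used == lanes) return Kind.ALL_TRUE;   // could become maskAll(true)
        if (used == 0L)    return Kind.ALL_FALSE;  // could become maskAll(false)
        return Kind.MIXED;                         // keep fromLong
    }

    public static void main(String[] args) {
        System.out.println(classify(-1L, 8));    // every lane set
        System.out.println(classify(0xF00L, 8)); // low 8 bits are all zero
        System.out.println(classify(0x0FL, 8));  // only 4 of 8 lanes set
    }
}
```

Only the first two kinds are candidates for the conversion; a mixed constant still needs `fromLong`.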

erifan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:

 - Address some review comments
   
   Add support for the following patterns:
     toLong(maskAll(true))  => (-1ULL >> (64 - vlen))
     toLong(maskAll(false)) => 0
   
   And add more test cases.
 - Merge branch 'master' into JDK-8356760
 - 8356760: VectorAPI: Optimize VectorMask.fromLong for all-true/all-false cases
   
   If the input long value `l` of `VectorMask.fromLong(SPECIES, l)` would
   set or unset all lanes, `VectorMask.fromLong(SPECIES, l)` is equivalent
   to `maskAll(true)` or `maskAll(false)`. But the cost of `maskAll` is
   relatively smaller than that of `fromLong`. This patch does the conversion
   for these cases if `l` is a compile-time constant.
   
   And this conversion also enables further optimizations that recognize
   maskAll patterns, see [1].
   
   Some JTReg test cases are added to ensure the optimization is effective.
   
   I tried many different ways to write a JMH benchmark, but failed. Since
   the input of `VectorMask.fromLong(SPECIES, l)` needs to be a specific
   compile-time constant, the statement will be hoisted out of the loop.
   If we don't use a loop, the hotspot will become other instructions, and
   no obvious performance change was observed. However, combined with the
   optimization of [1], we can observe a performance improvement of about
   7% on both aarch64 and x64.
   
   The patch was tested on both aarch64 and x64; all tier1, tier2, and
   tier3 tests passed.
   
   [1] https://github.com/openjdk/jdk/pull/24674
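
The `toLong(maskAll(...))` patterns listed in the first commit can be checked with a few lines of arithmetic. This is a sketch under stated assumptions, not code from the patch: the class and method names are hypothetical, and `vlen` is assumed to be in the range 1..64. Note that the C-style `-1ULL >> n` is a logical shift, so the Java equivalent uses `>>>` rather than `>>`.

```java
// Sketch only: the constant a toLong(maskAll(value)) would fold to for a
// mask with vlen lanes. Names are hypothetical, not taken from the patch.
class ToLongMaskAllSketch {
    // Assumes 1 <= vlen <= 64. Uses the unsigned shift >>> so that the
    // sign bit is not replicated into the upper bits.
    static long toLongOfMaskAll(boolean value, int vlen) {
        return value ? (-1L >>> (64 - vlen)) : 0L;
    }

    public static void main(String[] args) {
        for (int vlen : new int[] {4, 16, 64}) {
            long bits = toLongOfMaskAll(true, vlen);
            // The number of set bits equals the number of lanes.
            System.out.println(vlen + " lanes -> 0x" + Long.toHexString(bits)
                    + " (" + Long.bitCount(bits) + " bits set)");
        }
    }
}
```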

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/25793/files
  - new: https://git.openjdk.org/jdk/pull/25793/files/38664b06..791e0ab7

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=25793&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25793&range=00-01

  Stats: 24487 lines in 940 files changed: 11237 ins; 8323 del; 4927 mod
  Patch: https://git.openjdk.org/jdk/pull/25793.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/25793/head:pull/25793

PR: https://git.openjdk.org/jdk/pull/25793


More information about the hotspot-compiler-dev mailing list