RFR: 8264409: AArch64: generate better code for Vector API allTrue

Ningsheng Jian njian at openjdk.java.net
Thu Apr 1 08:05:44 UTC 2021


In the Vector API NEON implementation, we use a vector register to represent a vector mask, where an element value of -1 is a true mask and an element value of 0 is a false mask. The allTrue() API checks whether all the elements of the current mask are set.

Currently, the AArch64 NEON allTrue implementation looks like:

  andr  $tmp, T16B, $src1, $src2    # src2 is maskAllTrue
  notr  $tmp, T16B, $tmp
  addv  $tmp, T16B, $tmp
  umov  $dst, $tmp, B, 0
  cmp   $dst, 0
  cset  $dst

where $src2 is a preset all-true (-1) constant. We can replace it with the shorter code sequence below, which checks whether all bits are set:

  uminv $tmp, T16B, $src1
  umov  $dst, $tmp, B, 0
  cmp   $dst, 0xff
  cset  $dst
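
To illustrate why the two sequences agree on valid masks, here is a minimal scalar sketch in Java that models each one on a 16-byte array. It is only a model, not the actual C2 or .ad code: the class and method names are hypothetical, and it assumes that cset uses the EQ condition and that addv on byte lanes keeps only the low 8 bits of the sum.

```java
import java.util.Arrays;

// Hypothetical scalar model of the two NEON sequences; not the HotSpot code.
public class AllTrueSketch {

    // Models the original sequence: andr, notr, addv, umov, cmp 0, cset (EQ assumed).
    static boolean allTrueOld(byte[] mask) {
        int sum = 0;
        for (byte lane : mask) {
            int anded = lane & 0xff;        // andr with the all-true constant (identity)
            int inverted = ~anded & 0xff;   // notr: set lanes become 0, clear lanes 0xff
            sum = (sum + inverted) & 0xff;  // addv: byte-wide sum across lanes
        }
        return sum == 0;                    // cmp $dst, 0 then cset
    }

    // Models the new sequence: uminv, umov, cmp 0xff, cset (EQ assumed).
    static boolean allTrueNew(byte[] mask) {
        int min = 0xff;
        for (byte lane : mask) {
            min = Math.min(min, lane & 0xff); // uminv: unsigned minimum across lanes
        }
        return min == 0xff;                   // all bits set only if the minimum is 0xff
    }

    public static void main(String[] args) {
        byte[] allSet = new byte[16];
        Arrays.fill(allSet, (byte) -1);       // every lane true
        byte[] oneClear = allSet.clone();
        oneClear[5] = 0;                      // one lane false

        System.out.println(allTrueOld(allSet) + " " + allTrueNew(allSet));     // true true
        System.out.println(allTrueOld(oneClear) + " " + allTrueNew(oneClear)); // false false
    }
}
```

In this model the old sequence relies on lanes being exactly 0 or -1: with k false byte lanes (1 <= k <= 16) the truncated sum is 256 - k, which is never zero. The unsigned-minimum form does not depend on that property, since it tests every bit directly.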

With this codegen improvement, we see a performance uplift of about 8% to 70%, depending on the machine, on Alibaba's Vector API bigdata benchmarks [1][2].

Tested with tier1 and the Vector API jtreg tests.

[1] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/jdk/jdk/incubator/vector/benchmark/src/main/java/benchmark/bigdata/BooleanArrayCheck.java#L61
[2] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/jdk/jdk/incubator/vector/benchmark/src/main/java/benchmark/bigdata/ValueRangeCheckAndCastL2I.java#L93

-------------

Commit messages:
 - 8264409: AArch64: generate better code for Vector API allTrue

Changes: https://git.openjdk.java.net/jdk/pull/3302/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=3302&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8264409
  Stats: 409 lines in 5 files changed: 13 ins; 12 del; 384 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3302.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3302/head:pull/3302

PR: https://git.openjdk.java.net/jdk/pull/3302


More information about the hotspot-compiler-dev mailing list