RFR: 8281375: Accelerate bitCount operation for AVX2 and AVX512 target.
Jatin Bhateja
jbhateja at openjdk.java.net
Mon Feb 7 20:15:25 UTC 2022
Summary of changes:
- The patch extends the vectorized bitCount optimization added with [JDK-8278868](https://bugs.openjdk.java.net/browse/JDK-8278868) and emits an optimized JIT sequence for AVX2 and for AVX512 targets that do not support the avx512_vpopcntdq feature.
- Because the PopCountVI/PopCountVL nodes emit different instruction sequences depending on the target features, a rudimentary cost model has been added that influences the SLP unrolling factor and prevents generation of bloated main loops.
- The following are the performance results of an existing [JMH micro](https://github.com/jatin-bhateja/jdk/blob/master/test/micro/org/openjdk/bench/vm/compiler/VectorBitCount.java) on various x86 targets. Gain % is (Baseline - Withopt) / Baseline * 100, so a negative value indicates a slowdown.
Benchmark | SIZE | Baseline AVX2 (ns/op) | Withopt AVX2 (ns/op) | Gain % | Baseline AVX3 (ns/op) | Withopt AVX3 (ns/op) | Gain % | Baseline AVX3 with VPOPCNTDQ (ns/op) | Withopt AVX3 with VPOPCNTDQ (ns/op) | Gain %
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
VectorBitCount.WithSuperword.intBitCount | 1024 | 1089.799 | 420.156 | 61.44646857 | 1083.92 | 203.958 | 81.18329766 | 88.958 | 60.096 | 32.44452438
VectorBitCount.WithSuperword.longBitCount | 1024 | 417.458 | 413.859 | 0.862122657 | 417.203 | 214.949 | 48.4785584 | 105.954 | 117.019 | -10.4432112
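
For readers unfamiliar with the loop shape involved, here is a minimal Java sketch. It is not taken from the patch or from the linked benchmark: the class name `BitCountSketch` and both method names are hypothetical. `intBitCount` shows the kind of counted loop over `Integer.bitCount` that SLP auto-vectorization can target, and `nibblePopCount` is a scalar model of one common way to count bits without a dedicated popcount instruction, a 16-entry lookup per 4-bit nibble. Whether the emitted AVX2 sequence works this way is an assumption here, not something this mail states.

```java
public class BitCountSketch {
    // Bit counts of the 16 possible 4-bit values. A vector implementation could
    // keep such a table in a register and index it with a byte shuffle
    // (assumption, for illustration only).
    private static final byte[] NIBBLE_COUNTS =
        {0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4};

    // Hypothetical scalar model of a lookup-based population count:
    // sum the table entries for each of the eight nibbles of the value.
    static int nibblePopCount(int value) {
        int count = 0;
        for (int shift = 0; shift < 32; shift += 4) {
            count += NIBBLE_COUNTS[(value >>> shift) & 0xF];
        }
        return count;
    }

    // Simple counted loop with one bitCount per element; this is the shape
    // of loop that an auto-vectorizer can map onto vector popcount nodes.
    static void intBitCount(int[] in, int[] out) {
        for (int i = 0; i < in.length; i++) {
            out[i] = Integer.bitCount(in[i]);
        }
    }

    public static void main(String[] args) {
        int[] in = {0, 1, 0xFF, -1, 0x80000000};
        int[] out = new int[in.length];
        intBitCount(in, out);
        for (int i = 0; i < in.length; i++) {
            // The lookup-based model must agree with the intrinsic.
            System.out.println(in[i] + " -> " + out[i] + " / " + nibblePopCount(in[i]));
        }
    }
}
```

The lookup variant is only meant to illustrate why a target without avx512_vpopcntdq needs a longer instruction sequence per vector, which is the motivation given above for the cost model.
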
Please review and share your feedback.
Best Regards,
Jatin
-------------
Commit messages:
- 8281375: Accelerate bitCount operation for AVX2 and AVX512 target.
Changes: https://git.openjdk.java.net/jdk/pull/7373/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=7373&range=00
Issue: https://bugs.openjdk.java.net/browse/JDK-8281375
Stats: 283 lines in 16 files changed: 272 ins; 3 del; 8 mod
Patch: https://git.openjdk.java.net/jdk/pull/7373.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/7373/head:pull/7373
PR: https://git.openjdk.java.net/jdk/pull/7373