RFR: 8281375: Accelerate bitCount operation for AVX2 and AVX512 target.
Nils Eliasson
neliasso at openjdk.java.net
Tue Feb 8 09:05:11 UTC 2022
On Mon, 7 Feb 2022 20:08:35 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:
> Summary of changes:
>
> - Patch extends the existing vectorized bitCount optimization added with [JDK-8278868](https://bugs.openjdk.java.net/browse/JDK-8278868) and emits an optimized JIT sequence for AVX2 targets and for AVX512 targets that do not support the avx512_vpopcntdq feature.
> - Since the PopCountVI/PopCountVL nodes emit different instruction sequences depending on the target features, a rudimentary cost model has been added that influences the SLP unrolling factor to prevent generating bloated main loops.
> - The following are the performance results of an existing [JMH micro](https://github.com/jatin-bhateja/jdk/blob/master/test/micro/org/openjdk/bench/vm/compiler/VectorBitCount.java) over various X86 targets.
>
>
> Benchmark | SIZE | Baseline AVX2 (ns/op) | Withopt AVX2 (ns/op) | Gain % | Baseline AVX3 (ns/op) | Withopt AVX3 (ns/op) | Gain % | Baseline AVX3 (VPOPCNTDQ) (ns/op) | Withopt AVX3 (VPOPCNTDQ) (ns/op) | Gain %
> -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
> VectorBitCount.WithSuperword.intBitCount | 1024 | 1089.799 | 420.156 | 159.3796114 | 1083.92 | 203.958 | 431.442748 | 88.958 | 60.096 | 48.02649095
> VectorBitCount.WithSuperword.longBitCount | 1024 | 417.458 | 413.859 | 0.869619846 | 417.203 | 214.949 | 94.09394787 | 105.954 | 117.019 | -9.455729411
>
> Please review and share your feedback.
>
> Best Regards,
> Jatin
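For context, the kind of loop the quoted summary refers to can be sketched as follows. This is a minimal illustration, not the source of the linked JMH micro: the class and method names are hypothetical, and whether C2's superword pass actually vectorizes these loops depends on the JDK build and the target CPU features.

```java
// Hypothetical kernels, assumed to resemble what the benchmark measures.
final class BitCountLoops {
    // One Integer.bitCount per element; a candidate for a PopCountVI node
    // if the loop is auto-vectorized.
    static void intBitCount(int[] in, int[] out) {
        for (int i = 0; i < in.length; i++) {
            out[i] = Integer.bitCount(in[i]);
        }
    }

    // Same shape for long inputs; a candidate for a PopCountVL node.
    static void longBitCount(long[] in, int[] out) {
        for (int i = 0; i < in.length; i++) {
            out[i] = Long.bitCount(in[i]);
        }
    }
}
```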
Hi Jatin, nice work!
One quick comment - the _vector_popcount_lut is generated unconditionally - could you guard it so that it's only generated when it can be used? My preferred choice would be to have it generated lazily, but that's an enhancement all of its own.
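To illustrate the idea behind a popcount lookup table and lazy creation, here is a scalar Java sketch. It assumes the table holds the bit counts of the 16 possible 4-bit values, as in the well-known nibble-lookup popcount technique. It is an analogy only, not HotSpot's stub-generation code, and the names `NibbleLutPopCount` and `Holder` are hypothetical.

```java
final class NibbleLutPopCount {
    // Holder idiom: the JVM initializes this nested class, and so builds the
    // table, only when bitCount first reads Holder.LUT.
    private static final class Holder {
        static final byte[] LUT = new byte[16];
        static {
            // popcount(v) = popcount(v >> 1) + lowest bit of v
            for (int v = 1; v < 16; v++) {
                LUT[v] = (byte) (LUT[v >> 1] + (v & 1));
            }
        }
    }

    // Sums the table entries for the 8 nibbles of a 32-bit value.
    static int bitCount(int x) {
        int count = 0;
        for (int i = 0; i < 8; i++) {
            count += Holder.LUT[x & 0xF];
            x >>>= 4; // unsigned shift so negative inputs terminate correctly
        }
        return count;
    }
}
```

A vectorized version would typically keep the 16-entry table in a register and index it with a byte shuffle instead of a scalar loop.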
Best regards,
Nils
-------------
PR: https://git.openjdk.java.net/jdk/pull/7373