RFR: 8281375: Accelerate bitCount operation for AVX2 and AVX512 target.
Eric Liu
eliu at openjdk.java.net
Wed Feb 9 08:12:04 UTC 2022
On Mon, 7 Feb 2022 20:08:35 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:
> Summary of changes:
>
> - The patch extends the existing vectorized bitCount optimization added with [JDK-8278868](https://bugs.openjdk.java.net/browse/JDK-8278868) and emits an optimized JIT sequence for AVX2 targets and for AVX512 targets that do not support the avx512_vpopcntdq feature.
> - Since the PopCountVI/PopCountVL nodes emit different instruction sequences depending on the target features, a rudimentary cost model has been added. It influences the SLP unrolling factor to prevent generating bloated main loops.
> - The following are the performance results of an existing [JMH micro](https://github.com/jatin-bhateja/jdk/blob/master/test/micro/org/openjdk/bench/vm/compiler/VectorBitCount.java) on various x86 targets.
>
>
> Benchmark | SIZE | Baseline AVX2 (ns/op) | Withopt AVX2 (ns/op) | Gain % | Baseline AVX3 (ns/op) | Withopt AVX3 (ns/op) | Gain % | Baseline AVX3 with VPOPCNTDQ (ns/op) | Withopt AVX3 with VPOPCNTDQ (ns/op) | Gain %
> -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
> VectorBitCount.WithSuperword.intBitCount | 1024 | 1089.799 | 420.156 | 159.3796114 | 1083.92 | 203.958 | 431.442748 | 88.958 | 60.096 | 48.02649095
> VectorBitCount.WithSuperword.longBitCount | 1024 | 417.458 | 413.859 | 0.869619846 | 417.203 | 214.949 | 94.09394787 | 105.954 | 117.019 | -9.455729411
>
> Please review and share your feedback.
>
> Best Regards,
> Jatin
src/hotspot/cpu/x86/matcher_x86.hpp line 194:
> 192: }
> 193: }
> 194:
Could you explain what "cost" means here? Should it be the actual instruction count or the latency?
src/hotspot/share/opto/loopTransform.cpp line 975:
> 973: case Op_PopCountVI:
> 974: case Op_PopCountVL: {
> 975: const TypeVect * vt = n->bottom_type()->is_vect();
Style issue: `const TypeVect* vt`
src/hotspot/share/opto/loopTransform.cpp line 976:
> 974: case Op_PopCountVL: {
> 975: const TypeVect * vt = n->bottom_type()->is_vect();
> 976: body_size += Matcher::vector_op_cost(n->Opcode(), vt->element_basic_type(), vt->length());
For other platforms, I think it would be better for `vector_op_cost` to return 0 for now.
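To make the suggestion concrete, here is a minimal standalone sketch of the idea, not the actual HotSpot code. The `Target` struct, the `Opcode` enum, the extra cost of 8, and the base cost of 1 per node are all hypothetical and chosen only for illustration. It shows a cost hook that returns 0 by default, so that only targets emitting a longer popcount sequence inflate the estimated loop body size.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for illustration only; these are not HotSpot types.
enum Opcode { Op_AddVI, Op_PopCountVI, Op_PopCountVL };

struct Target {
  bool has_vpopcntdq;  // assumed flag: target has a native vector popcount
};

// Extra cost of a vector node beyond the baseline of one unit.
// The value 8 is a made-up placeholder, not a measured instruction count.
int vector_op_cost(const Target& target, Opcode op) {
  switch (op) {
    case Op_PopCountVI:
    case Op_PopCountVL:
      // A native instruction adds nothing; an emulated sequence costs more.
      return target.has_vpopcntdq ? 0 : 8;
    default:
      // Default for all other nodes (and, by analogy, other platforms).
      return 0;
  }
}

// Sum one unit per node plus any extra cost reported by the hook.
int estimate_body_size(const Target& target, const std::vector<Opcode>& body) {
  int body_size = 0;
  for (Opcode op : body) {
    body_size += 1 + vector_op_cost(target, op);
  }
  return body_size;
}
```

With a default of 0, the estimated body size is unchanged wherever no cost is defined, so the unrolling decision on those platforms would stay as it is today.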
-------------
PR: https://git.openjdk.java.net/jdk/pull/7373