RFR: 8282711: Accelerate Math.signum function for AVX and AVX512 target. [v5]

Wed Apr 13 01:01:20 UTC 2022

On Fri, 1 Apr 2022 07:51:11 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> - The patch auto-vectorizes the Math.signum operation for floating-point types.
>> - An efficient JIT sequence is generated for AVX512 and legacy x86 targets.
>> - Following is the performance data for the included JMH micro-benchmark.
>> 
>> System : Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz  (40C 2S Icelake Server) 
>> 
>> Benchmark | (SIZE) | Baseline AVX (ns/op) | Withopt AVX (ns/op) | Gain Ratio (Baseline / Withopt) | Baseline AVX512 (ns/op) | Withopt AVX512 (ns/op) | Gain Ratio (Baseline / Withopt)
>> -- | -- | -- | -- | -- | -- | -- | --
>> VectorSignum.doubleSignum | 256 | 174.357 | 68.374 | 2.550048264 | 173.679 | 31.013 | 5.600199916
>> VectorSignum.doubleSignum | 512 | 334.231 | 128.762 | 2.595727 | 334.625 | 59.377 | 5.635599643
>> VectorSignum.doubleSignum | 1024 | 655.679 | 251.566 | 2.606389576 | 655.267 | 116.736 | 5.613238418
>> VectorSignum.doubleSignum | 2048 | 1292.165 | 499.924 | 2.584722878 | 1301.7 | 228.064 | 5.707608391
>> VectorSignum.floatSignum | 256 | 176.064 | 39.864 | 4.416616496 | 174.639 | 25.372 | 6.883138893
>> VectorSignum.floatSignum | 512 | 337.565 | 71.027 | 4.752629282 | 331.506 | 36.64 | 9.047652838
>> VectorSignum.floatSignum | 1024 | 661.488 | 131.074 | 5.046675924 | 644.621 | 63.88 | 10.09112398
>> VectorSignum.floatSignum | 2048 | 1299.685 | 253.271 | 5.13159817 | 1279.658 | 118.995 | 10.75388042
>> 
>> 
>> Kindly review and share feedback.
>> 
>> Best Regards,
>> Jatin
>
> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits:
> 
>  - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8282711
>  - 8282711: Replacing vector length based predicate.
>  - 8282711: Making the changes more generic (removing AVX512DQ restriction), adding new IR level test.
>  - 8282711: Review comments resolved.
>  - 8282711: Accelerate Math.signum function for AVX and AVX512 target.

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4382:

> 4380:     vblendvps(dst, one, dst, src, vec_enc);
> 4381:     vcmpps(xtmp1, src, zero, Assembler::EQ_UQ, vec_enc);
> 4382:     vblendvps(dst, dst, src, xtmp1, vec_enc);

Some comments describing what we are trying to do here would be good.
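As a starting point for such comments, here is a scalar C++ model of what one float lane of the quoted sequence appears to compute. It assumes that `dst` already holds -1.0 in every lane before line 4380 (that setup is outside the quoted hunk). `bits_of`, `blendv` and `signum_lane` are hypothetical helpers, not HotSpot code; `blendv` follows the VBLENDVPS rule of taking the second source wherever the mask lane's sign bit is set.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper: raw IEEE-754 bits of a float lane.
static uint32_t bits_of(float x) {
  uint32_t u;
  std::memcpy(&u, &x, sizeof u);
  return u;
}

// Per-lane model of VBLENDVPS dst, a, b, mask: the lane comes from b when
// bit 31 (the sign bit) of the mask lane is set, otherwise from a.
static float blendv(float a, float b, uint32_t mask_bits) {
  return (mask_bits >> 31) ? b : a;
}

// Scalar model of one float lane of the quoted AVX sequence.
// Assumption: dst holds -1.0f in every lane before line 4380; that setup
// is outside the quoted hunk.
float signum_lane(float src) {
  const float zero = 0.0f;
  const float one = 1.0f;
  float dst = -1.0f;  // assumed prior contents of dst

  // 4380: vblendvps(dst, one, dst, src): lanes whose sign bit is set keep
  // -1.0f, all other lanes become +1.0f. -0.0f and negative NaNs take the
  // -1.0f path here and are fixed up below.
  dst = blendv(one, dst, bits_of(src));

  // 4381: vcmpps(..., EQ_UQ) yields an all-ones lane for +0.0f, -0.0f and
  // NaN (unordered compares count as true), and an all-zeros lane otherwise.
  uint32_t xtmp1 = (std::isnan(src) || src == zero) ? 0xFFFFFFFFu : 0u;

  // 4382: vblendvps(dst, dst, src, xtmp1): zeros and NaNs pass through
  // unchanged, matching Math.signum's special cases.
  dst = blendv(dst, src, xtmp1);
  return dst;
}
```

Under that assumption the model agrees with Math.signum(float): ±0.0 and NaN are returned unchanged, and every other input becomes 1.0 or -1.0 according to its sign bit.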

src/hotspot/cpu/x86/x86.ad line 6094:

> 6092: %}
> 6093: 
> 6094: instruct signumV_reg_avx(vec dst, vec src, vec zero, vec one, vec xtmp1, vec xtmp2, rFlagsReg cr) %{

xtmp2 is not used and could be removed from here.
Also, which instruction modifies rFlags?

src/hotspot/cpu/x86/x86.ad line 6099:

> 6097:   match(Set dst (SignumVD src (Binary zero one)));
> 6098:   effect(TEMP dst, TEMP xtmp1, TEMP xtmp2, KILL cr);
> 6099:   format %{ "vector_signum_avx $dst, $src\t! using $xtmp1, and $xtmp2 as TEMP" %}

The zero and one registers should be shown here as well.

src/hotspot/cpu/x86/x86.ad line 6109:

> 6107: %}
> 6108: 
> 6109: instruct signumV_reg_evex(vec dst, vec src, vec zero, vec one, kReg ktmp1, rFlagsReg cr) %{

Which instruction modifies rFlags? If none does, cr could be removed from here.

src/hotspot/cpu/x86/x86.ad line 6114:

> 6112:   match(Set dst (SignumVD src (Binary zero one)));
> 6113:   effect(TEMP dst, TEMP ktmp1, KILL cr);
> 6114:   format %{ "vector_signum_evex $dst, $src\t! using $ktmp1 as TEMP" %}

The zero and one registers should be shown here as well.

-------------

PR: https://git.openjdk.java.net/jdk/pull/7717