RFR: 8282711: Accelerate Math.signum function for AVX and AVX512 target. [v5]
Sandhya Viswanathan
sviswanathan at openjdk.java.net
Wed Apr 13 01:01:20 UTC 2022
On Fri, 1 Apr 2022 07:51:11 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:
>> - The patch auto-vectorizes the Math.signum operation for floating-point types.
>> - An efficient JIT sequence is generated for AVX512 and legacy x86 targets.
>> - Performance data for the included JMH microbenchmark follows.
>>
>> System : Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (40C 2S Icelake Server)
>>
>> Benchmark | (SIZE) | Baseline AVX (ns/op) | Withopt AVX (ns/op) | Gain Ratio | Baseline AVX512 (ns/op) | Withopt AVX512 (ns/op) | Gain Ratio
>> -- | -- | -- | -- | -- | -- | -- | --
>> VectorSignum.doubleSignum | 256 | 174.357 | 68.374 | 2.550048264 | 173.679 | 31.013 | 5.600199916
>> VectorSignum.doubleSignum | 512 | 334.231 | 128.762 | 2.595727 | 334.625 | 59.377 | 5.635599643
>> VectorSignum.doubleSignum | 1024 | 655.679 | 251.566 | 2.606389576 | 655.267 | 116.736 | 5.613238418
>> VectorSignum.doubleSignum | 2048 | 1292.165 | 499.924 | 2.584722878 | 1301.7 | 228.064 | 5.707608391
>> VectorSignum.floatSignum | 256 | 176.064 | 39.864 | 4.416616496 | 174.639 | 25.372 | 6.883138893
>> VectorSignum.floatSignum | 512 | 337.565 | 71.027 | 4.752629282 | 331.506 | 36.64 | 9.047652838
>> VectorSignum.floatSignum | 1024 | 661.488 | 131.074 | 5.046675924 | 644.621 | 63.88 | 10.09112398
>> VectorSignum.floatSignum | 2048 | 1299.685 | 253.271 | 5.13159817 | 1279.658 | 118.995 | 10.75388042
>>
>>
>> Kindly review and share feedback.
>>
>> Best Regards,
>> Jatin
>
> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits:
>
> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8282711
> - 8282711: Replacing vector length based predicate.
> - 8282711: Making the changes more generic (removing AVX512DQ restriction), adding new IR level test.
> - 8282711: Review comments resolved.
> - 8282711: Accelerate Math.signum function for AVX and AVX512 target.
src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4382:
> 4380: vblendvps(dst, one, dst, src, vec_enc);
> 4381: vcmpps(xtmp1, src, zero, Assembler::EQ_UQ, vec_enc);
> 4382: vblendvps(dst, dst, src, xtmp1, vec_enc);
Some comments describing what we are trying to do here would be good.
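
As a starting point for such a comment, here is a minimal scalar sketch in Java of what the three quoted instructions appear to compute for a single float lane. It is an illustration, not the patch's code. It assumes that dst already holds -1.0 when line 4380 executes (the instruction that sets it is not part of the quoted snippet), that vblendvps selects its second source where the sign bit of the mask lane is set, and that EQ_UQ is true when the operands are equal or unordered. The class and method names are hypothetical.

```java
public class SignumBlendSketch {
    // Scalar model of one float lane of the quoted AVX sequence.
    static float signumViaBlends(float src) {
        final float one = 1.0f;
        final float zero = 0.0f;

        // ASSUMPTION: dst was set to -1.0 before the quoted lines.
        float dst = -1.0f;

        // vblendvps(dst, one, dst, src): the mask is src itself, so only
        // its sign bit matters. Sign bit set keeps -1.0, otherwise +1.0.
        boolean signBitSet = (Float.floatToRawIntBits(src) & 0x80000000) != 0;
        dst = signBitSet ? dst : one;

        // vcmpps(xtmp1, src, zero, EQ_UQ): true for +0.0, -0.0 and NaN
        // (equal to zero, or unordered).
        boolean equalOrUnordered = (src == zero) || Float.isNaN(src);

        // vblendvps(dst, dst, src, xtmp1): where the compare mask is set,
        // pass the input through unchanged, as Math.signum specifies.
        return equalOrUnordered ? src : dst;
    }

    public static void main(String[] args) {
        float[] inputs = {-3.5f, 2.0f, 0.0f, -0.0f, Float.NaN,
                          Float.POSITIVE_INFINITY, Float.NEGATIVE_INFINITY};
        for (float f : inputs) {
            System.out.println(f + " -> " + signumViaBlends(f)
                    + " (Math.signum: " + Math.signum(f) + ")");
        }
    }
}
```

If that reading is right, the second blend is what preserves the sign of zero and propagates NaN, which is probably the part most worth spelling out in the comment.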
src/hotspot/cpu/x86/x86.ad line 6094:
> 6092: %}
> 6093:
> 6094: instruct signumV_reg_avx(vec dst, vec src, vec zero, vec one, vec xtmp1, vec xtmp2, rFlagsReg cr) %{
xtmp2 is not used and could be removed from here.
Also, which instruction modifies rFlags?
src/hotspot/cpu/x86/x86.ad line 6099:
> 6097: match(Set dst (SignumVD src (Binary zero one)));
> 6098: effect(TEMP dst, TEMP xtmp1, TEMP xtmp2, KILL cr);
> 6099: format %{ "vector_signum_avx $dst, $src\t! using $xtmp1, and $xtmp2 as TEMP" %}
The format string needs to show the zero and one registers as well.
src/hotspot/cpu/x86/x86.ad line 6109:
> 6107: %}
> 6108:
> 6109: instruct signumV_reg_evex(vec dst, vec src, vec zero, vec one, kReg ktmp1, rFlagsReg cr) %{
Which instruction modifies rFlags? If none does, cr could be removed from here.
src/hotspot/cpu/x86/x86.ad line 6114:
> 6112: match(Set dst (SignumVD src (Binary zero one)));
> 6113: effect(TEMP dst, TEMP ktmp1, KILL cr);
> 6114: format %{ "vector_signum_evex $dst, $src\t! using $ktmp1 as TEMP" %}
The format string needs to show the zero and one registers as well.
-------------
PR: https://git.openjdk.java.net/jdk/pull/7717