RFR: 8349138: Optimize Math.copySign API for Intel e-core targets [v4]
Sandhya Viswanathan
sviswanathan at openjdk.org
Tue May 20 23:30:59 UTC 2025
On Tue, 6 May 2025 17:28:48 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:
>> Math.copySign is only intrinsified on x86 targets supporting the AVX512 feature.
>> Intel E-core Xeons support only the AVX2 feature set and still compile the Java implementation, which is composed of logical operations.
>>
>> Since there is a 3-cycle penalty for copying incoming float/double values to GPRs before the logical operations are applied, there is an opportunity to optimize this with a more efficient instruction sequence.
>>
>> The patch uses the ANDPS and ANDPD logical instructions to generate efficient instruction sequences that absorb the domain copy-over penalty. It also performs minor tuning of the existing AVX512 instruction sequence based on the VPTERNLOG instruction.
>>
>> Performance numbers for the following existing microbenchmark are given below:
>> https://github.com/openjdk/jdk/blob/master/test/micro/org/openjdk/bench/vm/compiler/Signum.java
>>
>> The patch passes the following validation test:
>> [test/jdk/java/lang/Math/IeeeRecommendedTests.java](https://github.com/openjdk/jdk/blob/master/test/jdk/java/lang/Math/IeeeRecommendedTests.java)
>>
>>
>> Granite Rapids-AP (P-core Xeon)
>> Baseline AVX512:
>> Benchmark                      Mode  Cnt     Score  Error   Units
>> Signum._5_copySignFloatTest   thrpt    2  1296.141         ops/ns
>> Signum._7_copySignDoubleTest  thrpt    2   838.954         ops/ns
>>
>> Withopt :
>> Benchmark                      Mode  Cnt     Score  Error   Units
>> Signum._5_copySignFloatTest   thrpt    2   940.240         ops/ns
>> Signum._7_copySignDoubleTest  thrpt    2   967.370         ops/ns
>>
>> Baseline AVX2:
>> Benchmark                      Mode  Cnt     Score  Error   Units
>> Signum._5_copySignFloatTest   thrpt    2    63.673         ops/ns
>> Signum._7_copySignDoubleTest  thrpt    2    26.898         ops/ns
>>
>> Withopt :
>> Benchmark                      Mode  Cnt     Score  Error   Units
>> Signum._5_copySignFloatTest   thrpt    2   785.801         ops/ns
>> Signum._7_copySignDoubleTest  thrpt    2   558.710         ops/ns
>>
>> Sierra Forest (E-core Xeon)
>> Baseline:
>> Benchmark                                       (seed)   Mode  Cnt     Score  Error   Units
>> o.o.b.vm.compiler.Signum._5_copySignFloatTest      N/A  thrpt    2    40.528         ops/ns
>> o.o.b.vm.compiler.Signum._7_copySignDoubleTest     N/A  thrpt    2    25.101         ops/ns
>>
>> Withopt:
>> Benchmark (seed) Mode Cnt Score Error Units
>> o.o.b.vm.compiler.Signum._5_copySignFloatTest N/A thrpt 2 676....
>
> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits:
>
> - Review comments resolutions
> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8349138
> - Adding vector support along with some refactoring.
> - Adding IR framework verification test
> - 8349138: Optimize Math.copySign API for Intel e-core and p-core targets
src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 7157:
> 7155: vpsllw(dst, src, shift, vlen_enc);
> 7156: } else if (elem_sz == 4) {
> 7157: vpslld(dst, src, shift, vlen_enc);
AVX 1 supports 256-bit float/double vectors but only 128-bit vpsll, vpsrl, and vpor for integer vectors. So a 256-bit float/double vector copysign implementation using vpsll, vpsrl, and vpor will have issues on AVX 1 platforms.
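For context, here is a minimal scalar sketch in Java of the per-lane bit manipulation that a shift-and-OR copysign sequence would perform. It assumes the sequence clears the sign bit of the magnitude with a left/right shift pair, isolates the sign bit of the sign operand with the opposite pair, and ORs the two; the patch's actual instruction order may differ. The class and method names are hypothetical and do not appear in the PR.

```java
public final class CopySignSketch {
    private CopySignSketch() {}

    // Hypothetical helper: scalar model of one 32-bit lane of a
    // shift-left / shift-right / OR copysign sequence (an assumption,
    // not the code generated by the patch).
    static float copySignViaShifts(float magnitude, float sign) {
        int m = Float.floatToRawIntBits(magnitude);
        int s = Float.floatToRawIntBits(sign);
        int magnitudeBits = (m << 1) >>> 1;  // shift out the sign bit, then restore position
        int signBit = (s >>> 31) << 31;      // keep only the sign bit
        return Float.intBitsToFloat(magnitudeBits | signBit);
    }

    // Equivalent mask-based form, matching the AND/OR style of the scalar path.
    static float copySignViaMasks(float magnitude, float sign) {
        int m = Float.floatToRawIntBits(magnitude);
        int s = Float.floatToRawIntBits(sign);
        return Float.intBitsToFloat((m & 0x7FFFFFFF) | (s & 0x80000000));
    }

    public static void main(String[] args) {
        float[][] samples = { {1.5f, -2.0f}, {-3.0f, 4.0f}, {0.0f, -1.0f} };
        for (float[] p : samples) {
            System.out.println(copySignViaShifts(p[0], p[1])
                    + " vs Math.copySign: " + Math.copySign(p[0], p[1]));
        }
    }
}
```

Both forms touch only integer bit patterns, which is why a vectorized version depends on integer shift and OR instructions being available at the full vector width.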
src/hotspot/cpu/x86/x86.ad line 6525:
> 6523: %}
> 6524:
> 6525: #ifdef _LP64
The _LP64 ifdef is no longer needed in the .ad file (32-bit support has been removed).
src/hotspot/cpu/x86/x86.ad line 6551:
> 6549: #endif // _LP64
> 6550:
> 6551: instruct copySignF_reg_avx(regF dst, regF src, regF xtmp) %{
These should be vlRegF.
src/hotspot/cpu/x86/x86.ad line 6562:
> 6560: %}
> 6561:
> 6562: instruct copySignD_imm_avx(regD dst, regD src, regD xtmp, immD zero) %{
These should be vlRegD.
src/hotspot/cpu/x86/x86.ad line 6577:
> 6575: match(Set dst (CopySignVF dst src));
> 6576: match(Set dst (CopySignVD dst src));
> 6577: effect(TEMP xtmp);
vector_copy_sign_avx needs TEMP dst, so this may need two different instruct rules.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/23386#discussion_r2098998098
PR Review Comment: https://git.openjdk.org/jdk/pull/23386#discussion_r2099009546
PR Review Comment: https://git.openjdk.org/jdk/pull/23386#discussion_r2098976333
PR Review Comment: https://git.openjdk.org/jdk/pull/23386#discussion_r2098977882
PR Review Comment: https://git.openjdk.org/jdk/pull/23386#discussion_r2098980097