RFR: 8349138: Optimize Math.copySign API for Intel e-core targets [v3]

Emanuel Peter epeter at openjdk.org
Wed Apr 2 11:36:59 UTC 2025


On Thu, 6 Feb 2025 17:49:54 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> Math.copySign is only intrinsified on x86 targets supporting the AVX512 feature.
>> Intel E-core Xeons support only the AVX2 feature set and still compile the Java implementation, which is composed of logical operations.
>> 
>> Since there is a 3-cycle penalty for copying incoming float/double values to GPRs before the logical operations are applied, there is an opportunity to optimize this with a more efficient instruction sequence.
>> 
>> The patch uses the ANDPS and ANDPD logical instructions to generate efficient instruction sequences that absorb the domain-crossing copy penalty. It also performs minor tuning of the existing AVX512 instruction sequence based on the VPTERNLOG instruction.
>> 
>> The performance numbers below are for the following existing microbenchmark:
>> https://github.com/openjdk/jdk/blob/master/test/micro/org/openjdk/bench/vm/compiler/Signum.java
>> 
>> The patch passes the following validation test:
>> [test/jdk/java/lang/Math/IeeeRecommendedTests.java](https://github.com/openjdk/jdk/blob/master/test/jdk/java/lang/Math/IeeeRecommendedTests.java)
>> 
>> 
>> Granite Rapids-AP (P-core Xeon)
>> Baseline AVX512:
>> Benchmark                      Mode  Cnt     Score   Error   Units
>> Signum._5_copySignFloatTest   thrpt    2  1296.141          ops/ns
>> Signum._7_copySignDoubleTest  thrpt    2   838.954          ops/ns
>> 
>> With opt:
>> Benchmark                      Mode  Cnt    Score   Error   Units
>> Signum._5_copySignFloatTest   thrpt    2  940.240          ops/ns
>> Signum._7_copySignDoubleTest  thrpt    2  967.370          ops/ns
>> 
>> Baseline AVX2:
>> Benchmark                      Mode  Cnt   Score   Error   Units
>> Signum._5_copySignFloatTest   thrpt    2  63.673          ops/ns
>> Signum._7_copySignDoubleTest  thrpt    2  26.898          ops/ns
>> 
>> With opt:
>> Benchmark                      Mode  Cnt    Score   Error   Units
>> Signum._5_copySignFloatTest   thrpt    2  785.801          ops/ns
>> Signum._7_copySignDoubleTest  thrpt    2  558.710          ops/ns
>> 
>> Sierra Forest (E-core Xeon)
>> Baseline:
>> Benchmark                                       (seed)   Mode  Cnt        Score   Error   Units
>> o.o.b.vm.compiler.Signum._5_copySignFloatTest      N/A  thrpt    2       40.528          ops/ns
>> o.o.b.vm.compiler.Signum._7_copySignDoubleTest     N/A  thrpt    2       25.101          ops/ns
>> 
>> With opt:
>> Benchmark                                       (seed)   Mode  Cnt        Score   Error   Units
>> o.o.b.vm.compiler.Signum._5_copySignFloatTest      N/A  thrpt    2      676....
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Adding vector support along with some refactoring.

The non-x64-specific code looks reasonable, though I have 2 comments ;)

test/hotspot/jtreg/compiler/intrinsics/math/TestCopySignIntrinsic.java line 79:

> 77:         IntStream.range(0, SIZE - 8).forEach(i -> { dmagnitude[i] = rd.nextFloat(-Float.MAX_VALUE, Float.MAX_VALUE); });
> 78:         IntStream.range(0, SIZE).forEach(i -> { fsign[i] = rd.nextFloat(-Float.MAX_VALUE, Float.MAX_VALUE); });
> 79:         IntStream.range(0, SIZE).forEach(i -> { dsign[i] = rd.nextFloat(-Float.MAX_VALUE, Float.MAX_VALUE); });

Why not use Generators.java? That would also give you NaN, infinity, etc. ;)
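
To illustrate why special values matter here, below is a minimal standalone sketch. It does not use the HotSpot test library; the class name `CopySignSpecialValues`, the bitwise reference `referenceCopySign`, and the hand-picked `SPECIALS` list are all hypothetical names introduced for illustration. It compares `Math.copySign` against a plain bitwise reference on inputs that a uniform draw from `[-Float.MAX_VALUE, Float.MAX_VALUE)` essentially never produces.

```java
public class CopySignSpecialValues {
    // Hypothetical bitwise reference: magnitude bits from 'mag', sign bit from 'sign'.
    static float referenceCopySign(float mag, float sign) {
        int magBits = Float.floatToRawIntBits(mag) & 0x7FFFFFFF;
        int signBit = Float.floatToRawIntBits(sign) & 0x80000000;
        return Float.intBitsToFloat(magBits | signBit);
    }

    // Hand-picked special values (an assumption for this sketch, not the
    // distribution that Generators.java would actually produce).
    static final float[] SPECIALS = {
        0.0f, -0.0f,
        Float.MIN_VALUE, -Float.MIN_VALUE,
        Float.MAX_VALUE, -Float.MAX_VALUE,
        Float.POSITIVE_INFINITY, Float.NEGATIVE_INFINITY,
        Float.NaN
    };

    // Counts pairs where Math.copySign disagrees with the bitwise reference.
    static int countMismatches() {
        int mismatches = 0;
        for (float mag : SPECIALS) {
            for (float sign : SPECIALS) {
                // Math.copySign is allowed to treat a NaN sign argument as
                // either positive or negative, so skip that case.
                if (Float.isNaN(sign)) {
                    continue;
                }
                float expected = referenceCopySign(mag, sign);
                float actual = Math.copySign(mag, sign);
                // floatToIntBits collapses all NaNs to one canonical encoding,
                // so a NaN magnitude compares equal regardless of its sign bit.
                if (Float.floatToIntBits(expected) != Float.floatToIntBits(actual)) {
                    mismatches++;
                }
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println("mismatches: " + countMismatches());
    }
}
```

Signed zeros, infinities and NaN are exactly the inputs where a sign-bit manipulation is most likely to diverge between code paths, which is the motivation for a generator that emits them deliberately.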

test/hotspot/jtreg/compiler/intrinsics/math/TestCopySignIntrinsic.java line 122:

> 120:                 }
> 121:             }
> 122:         }

Verify.checkEQ should do this for you, though you may have to wait for https://github.com/openjdk/jdk/pull/24224 to avoid trouble with different NaN encodings.
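
The NaN-encoding concern can be shown with a small standalone sketch. This is not how `Verify.checkEQ` is implemented; the class `NaNEncodings` and its two helper methods are hypothetical names used only to show that two NaNs can differ in their raw bits while being equal after canonicalization.

```java
public class NaNEncodings {
    // Strict comparison: distinguishes NaNs with different sign or payload bits.
    static boolean sameRawBits(float a, float b) {
        return Float.floatToRawIntBits(a) == Float.floatToRawIntBits(b);
    }

    // Lenient comparison: floatToIntBits maps every NaN to 0x7fc00000.
    static boolean sameCanonicalBits(float a, float b) {
        return Float.floatToIntBits(a) == Float.floatToIntBits(b);
    }

    public static void main(String[] args) {
        float canonicalNaN = Float.NaN;                       // 0x7fc00000
        float negativeNaN = Float.intBitsToFloat(0xFFC00000); // quiet NaN with the sign bit set

        System.out.println("raw bits equal:       " + sameRawBits(canonicalNaN, negativeNaN));
        System.out.println("canonical bits equal: " + sameCanonicalBits(canonicalNaN, negativeNaN));
    }
}
```

Since copySign on a NaN magnitude yields a NaN whose sign bit depends on the sign argument, a bit-exact comparison of results can report spurious failures unless NaNs are compared in canonical form.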

-------------

Changes requested by epeter (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/23386#pullrequestreview-2735939246
PR Review Comment: https://git.openjdk.org/jdk/pull/23386#discussion_r2024635995
PR Review Comment: https://git.openjdk.org/jdk/pull/23386#discussion_r2024638031
