RFR: 8308966 Add intrinsic for float/double modulo for x86 AVX2 and AVX512 [v2]

Jatin Bhateja jbhateja at openjdk.org
Thu Jun 1 11:43:12 UTC 2023


On Tue, 30 May 2023 23:31:09 GMT, Scott Gibbons <sgibbons at openjdk.org> wrote:

>> Add an intrinsic for x86 AVX and AVX512 fmod.  This addresses both a performance regression and acceleration of the floating point remainder operation (fmod / frem).  Also addresses dmod / drem.
>> 
>> Performance has increased an average of ~4x as indicated by the benchmark included with [JDK-8302191](https://bugs.openjdk.org/browse/JDK-8302191).
>> 
>> Old:
>> gcc-12.2.1-4.fc36.x86_64
>> 3db352d003c5996a5f86f0f465adf86326f7e1fe openjdk21 + fix
>> JVM version: 21-internal
>> Iteration 0 regression case Took : 89 noMod case took: 39 noPower case took: 68
>> Iteration 1 regression case Took : 86 noMod case took: 39 noPower case took: 67
>> Iteration 2 regression case Took : 41 noMod case took: 39 noPower case took: 70
>> Iteration 3 regression case Took : 41 noMod case took: 39 noPower case took: 69
>> Iteration 4 regression case Took : 40 noMod case took: 39 noPower case took: 44
>> Iteration 5 regression case Took : 47 noMod case took: 39 noPower case took: 40
>> Iteration 6 regression case Took : 41 noMod case took: 39 noPower case took: 40
>> Iteration 7 regression case Took : 40 noMod case took: 39 noPower case took: 40
>> Iteration 8 regression case Took : 41 noMod case took: 38 noPower case took: 41
>> Iteration 9 regression case Took : 40 noMod case took: 39 noPower case took: 40
>> New:
>> JVM version: 21-internal  (float)
>> Iteration 0 regression case Took : 24 noMod case took: 11 noPower case took: 42
>> Iteration 1 regression case Took : 35 noMod case took: 22 noPower case took: 27
>> Iteration 2 regression case Took : 17 noMod case took: 19 noPower case took: 17
>> Iteration 3 regression case Took : 17 noMod case took: 3 noPower case took: 16
>> Iteration 4 regression case Took : 17 noMod case took: 3 noPower case took: 17
>> Iteration 5 regression case Took : 16 noMod case took: 3 noPower case took: 17
>> Iteration 6 regression case Took : 16 noMod case took: 3 noPower case took: 17
>> Iteration 7 regression case Took : 17 noMod case took: 3 noPower case took: 16
>> Iteration 8 regression case Took : 17 noMod case took: 3 noPower case took: 16
>> Iteration 9 regression case Took : 17 noMod case took: 3 noPower case took: 17
>
> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Address review comments

Hi @asgibbons ,
Kindly also include the results for following benchmark
test/micro/org/openjdk/bench/vm/floatingpoint/DremFrem.java

Best Regards,
Jatin

Hi @asgibbons ,
Kindly also include the results for following benchmark
test/micro/org/openjdk/bench/vm/floatingpoint/DremFrem.java

Best Regards,
Jatin

src/hotspot/cpu/x86/stubGenerator_x86_64_fmod.cpp line 234:

> 232: //   {
> 233: //     q = DP_DIV_RZ(a, bs);
> 234:   __ bind(L_1237);

should be ok to do loop alignment padding, though may low trip count loop.

src/hotspot/cpu/x86/stubGenerator_x86_64_fmod.cpp line 306:

> 304: 
> 305:     Label L_104a, L_11bd, L_10c1, L_1090, L_11b9, L_10e7, L_11af, L_111c, L_10f3, L_116e, L_112a;
> 306:     Label L_1173, L_1157, L_117f, L_11a0;

For the sake of clarity, can we segregate AVX2 functionality into a separate routine and indent the block.

src/hotspot/cpu/x86/stubGenerator_x86_64_fmod.cpp line 321:

> 319:   __ movl(rcx, rax);
> 320:   __ orl(rcx, 0x7f80);
> 321:   __ movl(Address(rsp, 0x04), rcx);

It may rarely happen that scope of MXCSR change is beyond couple of instruction, hence we simply load the needed settings and later on re-load std MXCSR settings from default location `ldmxcsr(ExternalAddress(StubRoutines::x86::addr_mxcsr_std()));`

src/hotspot/cpu/x86/stubGenerator_x86_64_fmod.cpp line 396:

> 394:   __ vdivpd(xmm0, xmm4, xmm3, Assembler::AVX_128bit);
> 395: //   q = DP_TRUNC(q);
> 396:   __ vroundsd(xmm0, xmm0, xmm0, 3);

vroundsd can be removed if we defer MXCSR reinitialization beyond it.

-------------

PR Review: https://git.openjdk.org/jdk/pull/14224#pullrequestreview-1455184476
PR Review: https://git.openjdk.org/jdk/pull/14224#pullrequestreview-1455256066
PR Review Comment: https://git.openjdk.org/jdk/pull/14224#discussion_r1212979736
PR Review Comment: https://git.openjdk.org/jdk/pull/14224#discussion_r1212982077
PR Review Comment: https://git.openjdk.org/jdk/pull/14224#discussion_r1212995674
PR Review Comment: https://git.openjdk.org/jdk/pull/14224#discussion_r1213009324


More information about the hotspot-dev mailing list