RFR: 8308966 Add intrinsic for float/double modulo for x86 AVX2 and AVX512 [v6]

Jatin Bhateja jbhateja at openjdk.org
Sun Jun 4 15:36:21 UTC 2023


On Fri, 2 Jun 2023 22:52:48 GMT, Scott Gibbons <sgibbons at openjdk.org> wrote:

>> Add an intrinsic for x86 AVX and AVX512 fmod.  This addresses both a performance regression and acceleration of the floating point remainder operation (fmod / frem).  Also addresses dmod / drem.
>> 
>> Performance has increased an average of ~4x as indicated by the benchmark included with [JDK-8302191](https://bugs.openjdk.org/browse/JDK-8302191).
>> 
>> Old:
>> gcc-12.2.1-4.fc36.x86_64
>> 3db352d003c5996a5f86f0f465adf86326f7e1fe openjdk21 + fix
>> JVM version: 21-internal
>> Iteration 0 regression case Took : 89 noMod case took: 39 noPower case took: 68
>> Iteration 1 regression case Took : 86 noMod case took: 39 noPower case took: 67
>> Iteration 2 regression case Took : 41 noMod case took: 39 noPower case took: 70
>> Iteration 3 regression case Took : 41 noMod case took: 39 noPower case took: 69
>> Iteration 4 regression case Took : 40 noMod case took: 39 noPower case took: 44
>> Iteration 5 regression case Took : 47 noMod case took: 39 noPower case took: 40
>> Iteration 6 regression case Took : 41 noMod case took: 39 noPower case took: 40
>> Iteration 7 regression case Took : 40 noMod case took: 39 noPower case took: 40
>> Iteration 8 regression case Took : 41 noMod case took: 38 noPower case took: 41
>> Iteration 9 regression case Took : 40 noMod case took: 39 noPower case took: 40
>> New:
>> JVM version: 21-internal  (float)
>> Iteration 0 regression case Took : 24 noMod case took: 11 noPower case took: 42
>> Iteration 1 regression case Took : 35 noMod case took: 22 noPower case took: 27
>> Iteration 2 regression case Took : 17 noMod case took: 19 noPower case took: 17
>> Iteration 3 regression case Took : 17 noMod case took: 3 noPower case took: 16
>> Iteration 4 regression case Took : 17 noMod case took: 3 noPower case took: 17
>> Iteration 5 regression case Took : 16 noMod case took: 3 noPower case took: 17
>> Iteration 6 regression case Took : 16 noMod case took: 3 noPower case took: 17
>> Iteration 7 regression case Took : 17 noMod case took: 3 noPower case took: 16
>> Iteration 8 regression case Took : 17 noMod case took: 3 noPower case took: 16
>> Iteration 9 regression case Took : 17 noMod case took: 3 noPower case took: 17
>
> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Indentation; spread source into assembly

src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 3941:

> 3939:   generate_libm_stubs();
> 3940: 
> 3941:   if ((UseAVX >= 1) && (VM_Version::supports_avx512vlbwdq() || VM_Version::supports_fma())) {

We can relax this to supports_evex instead of vlbwdq.

src/hotspot/cpu/x86/stubGenerator_x86_64_fmod.cpp line 79:

> 77:   __ enter(); // required for proper stackwalking of RuntimeStub frame
> 78: 
> 79:   if (VM_Version::supports_avx512vlbwdq()) {     // AVX512 version

We can relax this to supports_evex().

src/hotspot/cpu/x86/stubGenerator_x86_64_fmod.cpp line 121:

> 119:     //     // |x|, |y|
> 120:     //     a = DP_AND(x, DP_CONST(7fffffffffffffff));
> 121:     __ movq(xmm0, xmm0);

Redundatn move.

src/hotspot/cpu/x86/stubGenerator_x86_64_fmod.cpp line 122:

> 120:     //     a = DP_AND(x, DP_CONST(7fffffffffffffff));
> 121:     __ movq(xmm0, xmm0);
> 122:     __ mov64(rax, 0x7FFFFFFFFFFFFFFF);

ULL suffice missing long constant.

src/hotspot/cpu/x86/stubGenerator_x86_64_fmod.cpp line 123:

> 121:     __ movq(xmm0, xmm0);
> 122:     __ mov64(rax, 0x7FFFFFFFFFFFFFFF);
> 123:     __ evpbroadcastq(xmm3, rax, Assembler::AVX_128bit);

Replace broadcast with cheaper PINSRQ.

src/hotspot/cpu/x86/stubGenerator_x86_64_fmod.cpp line 134:

> 132:     __ evdivsd(xmm0, xmm6, xmm5, Assembler::EVEX_RZ);
> 133:     //     q = DP_ROUND_RZ(q);
> 134:     __ movq(xmm0, xmm0);

Redundant movq

src/hotspot/cpu/x86/stubGenerator_x86_64_fmod.cpp line 145:

> 143:     __ jcc(Assembler::equal, L_5280);
> 144:     //     if (eq >= 0x7fefffffu) goto SPECIAL_FMOD;
> 145:     __ cmpl(rax, 0x7feffffe);

Comment mention comparison against 0x7feffff.

src/hotspot/cpu/x86/stubGenerator_x86_64_fmod.cpp line 160:

> 158:     __ jcc(Assembler::below, L_5300);
> 159:     __ movsd(xmm0, ExternalAddress((address)CONST_INF), rax);
> 160:     //         return DP_FNMA(b, q, a);    // NaN

Misplaced comment for NaN already present at 204

src/hotspot/cpu/x86/stubGenerator_x86_64_fmod.cpp line 168:

> 166:     __ jmp(L_exit);
> 167:     //     if (!eq)  return x + sgn_a;
> 168:     __ align32();

Redundant alignment ?

src/hotspot/cpu/x86/stubGenerator_x86_64_fmod.cpp line 192:

> 190:     __ evdivsd(xmm2, xmm6, xmm5, Assembler::EVEX_RZ);
> 191:     //         q = DP_ROUND_RZ(q);
> 192:     __ movq(xmm2, xmm2);

Redundant move.

src/hotspot/cpu/x86/stubGenerator_x86_64_fmod.cpp line 215:

> 213:     __ evdivsd(xmm0, xmm6, xmm2, Assembler::EVEX_RZ);
> 214:     //     q = DP_ROUND_RZ(q);
> 215:     __ movq(xmm0, xmm0);

Redundant move.

src/hotspot/cpu/x86/stubGenerator_x86_64_fmod.cpp line 264:

> 262:     __ evdivsd(xmm0, xmm7, xmm2, Assembler::EVEX_RZ);
> 263:     //         q = DP_ROUND_RZ(q);
> 264:     __ movq(xmm0, xmm0);

Redundant move.

src/hotspot/cpu/x86/stubGenerator_x86_64_fmod.cpp line 305:

> 303:     //   // sign(x)
> 304:     //   sgn_a = DP_XOR(x, a);
> 305:     __ mov64(rcx, 0x8000000000000000);

ULL suffice in long constant.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/14224#discussion_r1216815105
PR Review Comment: https://git.openjdk.org/jdk/pull/14224#discussion_r1216810945
PR Review Comment: https://git.openjdk.org/jdk/pull/14224#discussion_r1216613005
PR Review Comment: https://git.openjdk.org/jdk/pull/14224#discussion_r1216612764
PR Review Comment: https://git.openjdk.org/jdk/pull/14224#discussion_r1216616520
PR Review Comment: https://git.openjdk.org/jdk/pull/14224#discussion_r1216617108
PR Review Comment: https://git.openjdk.org/jdk/pull/14224#discussion_r1216622400
PR Review Comment: https://git.openjdk.org/jdk/pull/14224#discussion_r1216733715
PR Review Comment: https://git.openjdk.org/jdk/pull/14224#discussion_r1216736764
PR Review Comment: https://git.openjdk.org/jdk/pull/14224#discussion_r1216756042
PR Review Comment: https://git.openjdk.org/jdk/pull/14224#discussion_r1216767150
PR Review Comment: https://git.openjdk.org/jdk/pull/14224#discussion_r1216779440
PR Review Comment: https://git.openjdk.org/jdk/pull/14224#discussion_r1216630301


More information about the hotspot-dev mailing list