RFR: 8308966 Add intrinsic for float/double modulo for x86 AVX2 and AVX512 [v2]

Sandhya Viswanathan sviswanathan at openjdk.org
Thu Jun 1 01:10:16 UTC 2023


On Tue, 30 May 2023 23:31:09 GMT, Scott Gibbons <sgibbons at openjdk.org> wrote:

>> Add an intrinsic for x86 AVX and AVX512 fmod.  This addresses both a performance regression and acceleration of the floating point remainder operation (fmod / frem).  Also addresses dmod / drem.
>> 
>> Performance has increased an average of ~4x as indicated by the benchmark included with [JDK-8302191](https://bugs.openjdk.org/browse/JDK-8302191).
>> 
>> Old:
>> gcc-12.2.1-4.fc36.x86_64
>> 3db352d003c5996a5f86f0f465adf86326f7e1fe openjdk21 + fix
>> JVM version: 21-internal
>> Iteration 0 regression case Took : 89 noMod case took: 39 noPower case took: 68
>> Iteration 1 regression case Took : 86 noMod case took: 39 noPower case took: 67
>> Iteration 2 regression case Took : 41 noMod case took: 39 noPower case took: 70
>> Iteration 3 regression case Took : 41 noMod case took: 39 noPower case took: 69
>> Iteration 4 regression case Took : 40 noMod case took: 39 noPower case took: 44
>> Iteration 5 regression case Took : 47 noMod case took: 39 noPower case took: 40
>> Iteration 6 regression case Took : 41 noMod case took: 39 noPower case took: 40
>> Iteration 7 regression case Took : 40 noMod case took: 39 noPower case took: 40
>> Iteration 8 regression case Took : 41 noMod case took: 38 noPower case took: 41
>> Iteration 9 regression case Took : 40 noMod case took: 39 noPower case took: 40
>> New:
>> JVM version: 21-internal  (float)
>> Iteration 0 regression case Took : 24 noMod case took: 11 noPower case took: 42
>> Iteration 1 regression case Took : 35 noMod case took: 22 noPower case took: 27
>> Iteration 2 regression case Took : 17 noMod case took: 19 noPower case took: 17
>> Iteration 3 regression case Took : 17 noMod case took: 3 noPower case took: 16
>> Iteration 4 regression case Took : 17 noMod case took: 3 noPower case took: 17
>> Iteration 5 regression case Took : 16 noMod case took: 3 noPower case took: 17
>> Iteration 6 regression case Took : 16 noMod case took: 3 noPower case took: 17
>> Iteration 7 regression case Took : 17 noMod case took: 3 noPower case took: 16
>> Iteration 8 regression case Took : 17 noMod case took: 3 noPower case took: 16
>> Iteration 9 regression case Took : 17 noMod case took: 3 noPower case took: 17
>
> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Address review comments

src/hotspot/cpu/x86/assembler_x86.cpp line 3559:

> 3557: 
> 3558: void Assembler::vmovsd(XMMRegister dst, XMMRegister src, XMMRegister src2) {
> 3559:   assert(UseAVX > 0, "Requires some form ov AVX");

Typo "Requires some form **of** AVX"

src/hotspot/cpu/x86/stubGenerator_x86_64_fmod.cpp line 125:

> 123:   __ vmovsd(xmm5, xmm18, xmm20);
> 124:   __ movq(xmm17, rax);
> 125:   __ vandpd(xmm0, xmm5, xmm17, Assembler::AVX_512bit);

This and others below all should be Assembler::AVX_128bit. No need for AVX_512bit here.

src/hotspot/cpu/x86/stubGenerator_x86_64_fmod.cpp line 134:

> 132: 
> 133: //   q = DP_DIV_RZ(a, b);
> 134:   __ vmovsd(xmm5, xmm18, xmm1);

This and other usage of vmovsd with blending two registers could be avoided.

src/hotspot/cpu/x86/stubGenerator_x86_64_fmod.cpp line 164:

> 162:   __ mov64(rax, 0x7FEFFFFFFFFFFFFF);
> 163:   __ movq(Address(rsp, 0x20), rax);
> 164:   __ movsd(xmm2, Address(rsp, 0x20));

You could directly do:
__ movsd(xmm2, ExternalAddress((address)CONST_MAX), rax);

src/hotspot/cpu/x86/stubGenerator_x86_64_fmod.cpp line 171:

> 169:   __ mov64(rax, 0x7FF0000000000000);
> 170:   __ movq(Address(rsp, 0x20), rax);
> 171:   __ movsd(xmm2, Address(rsp, 0x20));

You could directly do:
 __ movsd(xmm2, ExternalAddress((address)CONST_INF), rax);

src/hotspot/cpu/x86/stubGenerator_x86_64_fmod.cpp line 179:

> 177:   __ mov64(rax, 0x7FE0000000000000);
> 178:   __ movq(Address(rsp, 0x20), rax);
> 179:   __ movsd(xmm21, Address(rsp, 0x20));

You could directly do:
 __ movsd(xmm2, ExternalAddress((address)CONST_e307), rax);

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/14224#discussion_r1210963286
PR Review Comment: https://git.openjdk.org/jdk/pull/14224#discussion_r1212455393
PR Review Comment: https://git.openjdk.org/jdk/pull/14224#discussion_r1212458387
PR Review Comment: https://git.openjdk.org/jdk/pull/14224#discussion_r1212457127
PR Review Comment: https://git.openjdk.org/jdk/pull/14224#discussion_r1212457314
PR Review Comment: https://git.openjdk.org/jdk/pull/14224#discussion_r1212457651


More information about the hotspot-dev mailing list