RFR: 8282711: Accelerate Math.signum function for AVX and AVX512 target. [v2]

Quan Anh Mai duke at openjdk.java.net
Tue Mar 8 14:09:05 UTC 2022


On Tue, 8 Mar 2022 13:11:26 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4313:
>> 
>>> 4311:     evmovdquq(dst, k0, one, true, vec_enc);
>>> 4312:     evmovdquq(dst, ktmp2, xtmp1, true, vec_enc);
>>> 4313:     evblendmpd(dst, ktmp1, dst, src, true, vec_enc);
>> 
>> I believe you could achieve better register management using the following sequence.
>> 
>>     evfpclasspd(ktmp1, src, 0x7, vec_enc);
>>     evblendmpd(dst, ktmp1, dst, zero, true, vec_enc);
>>     evfpclasspd(ktmp1, src, 0x40, vec_enc);
>>     evsubpd(dst, ktmp1, zero, one, true, vec_enc);
>
>> I believe you could achieve better register management using the following sequence.
>> 
>> ```
>> evfpclasspd(ktmp1, src, 0x7, vec_enc);
>> evblendmpd(dst, ktmp1, dst, zero, true, vec_enc);
> 
> 0x7 encodes QNaN and +/- 0.0 values. Thus blending dst with zero will not work for NaN. I guess you meant to use src, as in the original sequence.
> 
>> evfpclasspd(ktmp1, src, 0x40, vec_enc);
> 
> 0x40 does not check for NEGATIVE_INFINITE, and Math.signum should return -1 for it. 0x40 should be 0x50.
> 
>> evsubpd(dst, ktmp1, zero, one, true, vec_enc);
>> ```
> 
> But I agree we can do away with some temporaries.

Ah, my bad, the second instruction should be `evblendmpd(dst, ktmp1, one, zero, true, vec_enc);`
And the immediate in the third one should be 0x50, as you mentioned.

-------------

PR: https://git.openjdk.java.net/jdk/pull/7717

