RFR: 8282711: Accelerate Math.signum function for AVX and AVX512 target. [v2]
Quan Anh Mai
duke at openjdk.java.net
Tue Mar 8 14:09:05 UTC 2022
On Tue, 8 Mar 2022 13:11:26 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:
>> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4313:
>>
>>> 4311: evmovdquq(dst, k0, one, true, vec_enc);
>>> 4312: evmovdquq(dst, ktmp2, xtmp1, true, vec_enc);
>>> 4313: evblendmpd(dst, ktmp1, dst, src, true, vec_enc);
>>
>> I believe you could achieve better register management using the following sequence.
>>
>> evfpclasspd(ktmp1, src, 0x7, vec_enc);
>> evblendmpd(dst, ktmp1, dst, zero, true, vec_enc);
>> evfpclasspd(ktmp1, src, 0x40, vec_enc);
>> evsubpd(dst, ktmp1, zero, one, true, vec_enc);
>
>> I believe you could achieve better register management using the following sequence.
>>
>> ```
>> evfpclasspd(ktmp1, src, 0x7, vec_enc);
>> evblendmpd(dst, ktmp1, dst, zero, true, vec_enc);
>
> 0x7 selects QNaN and +/-0.0. Blending dst with zero for those lanes will therefore not work for NaN. I guess you meant to use src, as in the original sequence.
>
>> evfpclasspd(ktmp1, src, 0x40, vec_enc);
>
> 0x40 does not check for NEGATIVE_INFINITE, and Math.signum should return -1 for it. 0x40 should be 0x50.
>
>> evsubpd(dst, ktmp1, zero, one, true, vec_enc);
>> ```
>
> But I agree we can do away with some temporaries.
Ah, my bad, the second instruction should be `evblendmpd(dst, ktmp1, one, zero, true, vec_enc);`
And the immediate in the third one should be 0x50, as you mentioned.
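
For reference, here is a rough scalar sketch of how I read the VFPCLASSPD immediate bits discussed above. It is not HotSpot code: the names `classify`, `fpclass_lane`, `reference_signum`, `check_immediates` and the `k...` constants are made up for illustration, and the bit layout is my understanding of the instruction manual (bit 0 QNaN, bits 1-2 +/-0, bits 3-4 +/-Inf, bit 5 denormal, bit 6 negative finite, bit 7 SNaN). It only models the mask computation for one lane, not the blend or subtract.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Category bits of the VFPCLASSPD imm8 (hypothetical constant names).
constexpr uint8_t kQNaN      = 0x01;
constexpr uint8_t kPosZero   = 0x02;
constexpr uint8_t kNegZero   = 0x04;
constexpr uint8_t kPosInf    = 0x08;
constexpr uint8_t kNegInf    = 0x10;
constexpr uint8_t kDenormal  = 0x20;
constexpr uint8_t kNegFinite = 0x40;
constexpr uint8_t kSNaN      = 0x80;

// Returns the set of categories a double belongs to (assumed semantics).
inline uint8_t classify(double x) {
  uint64_t bits;
  std::memcpy(&bits, &x, sizeof bits);
  const bool neg = (bits >> 63) != 0;
  const uint64_t exp = (bits >> 52) & 0x7FF;
  const uint64_t frac = bits & 0xFFFFFFFFFFFFFull;
  const bool zero = (exp == 0 && frac == 0);
  uint8_t c = 0;
  if (exp == 0x7FF && frac != 0) c |= ((frac >> 51) != 0) ? kQNaN : kSNaN;
  if (exp == 0x7FF && frac == 0) c |= neg ? kNegInf : kPosInf;
  if (zero) c |= neg ? kNegZero : kPosZero;
  if (exp == 0 && frac != 0) c |= kDenormal;
  // Negative finite: sign set, not NaN/Inf, not zero.
  if (neg && exp != 0x7FF && !zero) c |= kNegFinite;
  return c;
}

// One lane of the mask: set if the value is in any category selected by imm8.
inline bool fpclass_lane(double x, uint8_t imm8) {
  return (classify(x) & imm8) != 0;
}

// Scalar Math.signum semantics: NaN and +/-0.0 are returned unchanged.
inline double reference_signum(double x) {
  if (std::isnan(x) || x == 0.0) return x;
  return x < 0.0 ? -1.0 : 1.0;
}

// Small usage example for the immediates in this thread.
inline void check_immediates() {
  const double ninf = -std::numeric_limits<double>::infinity();
  assert(!fpclass_lane(ninf, 0x40));  // 0x40 alone misses -Inf
  assert(fpclass_lane(ninf, 0x50));   // 0x50 adds the -Inf bit
  assert(reference_signum(ninf) == -1.0);
}
```

Under these assumptions, 0x7 picks exactly the lanes where signum has to return the input unchanged (QNaN, +0.0, -0.0), and 0x50 picks the lanes that need -1, including negative infinity.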
-------------
PR: https://git.openjdk.java.net/jdk/pull/7717