RFR: 8282711: Accelerate Math.signum function for AVX and AVX512 target. [v2]
Jatin Bhateja
jbhateja at openjdk.java.net
Tue Mar 8 13:15:47 UTC 2022
On Tue, 8 Mar 2022 06:31:01 GMT, Quan Anh Mai <duke at openjdk.java.net> wrote:
> I believe you could achieve better register management using the following sequence.
>
> ```
> evfpclasspd(ktmp1, src, 0x7, vec_enc);
> evblendmpd(dst, ktmp1, dst, zero, true, vec_enc);
0x7 selects the QNaN, +0.0 and -0.0 classes, so blending dst with zero will not work for NaN lanes. I guess you meant to blend with src, as in the original sequence.
> evfpclasspd(ktmp1, src, 0x40, vec_enc);
0x40 does not check for negative infinity, and Math.signum should return -1.0 for it. 0x40 should be 0x50.
> evsubpd(dst, ktmp1, zero, one, true, vec_enc);
> ```
But I agree we can do away with some temporaries.
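To make the imm8 values above concrete, here is a minimal scalar Java sketch of the per-lane class test, following the category bits documented for VFPCLASSPD (bit 0 QNaN, bit 1 +0, bit 2 -0, bit 3 +Inf, bit 4 -Inf, bit 5 denormal, bit 6 finite negative, bit 7 SNaN). The class name, constants and method are hypothetical and only for illustration; MXCSR.DAZ is ignored.

```java
final class FpClassSketch {
    // Category bits of the imm8 operand (names are illustrative only).
    static final int QNAN = 0x01, POS_ZERO = 0x02, NEG_ZERO = 0x04, POS_INF = 0x08;
    static final int NEG_INF = 0x10, DENORMAL = 0x20, NEG_FINITE = 0x40, SNAN = 0x80;

    // Returns true if x falls into any category selected by imm8.
    static boolean fpclass(double x, int imm8) {
        long bits = Double.doubleToRawLongBits(x);
        boolean negative = bits < 0;                 // sign bit set
        long exp = (bits >>> 52) & 0x7FF;            // biased exponent
        long frac = bits & 0x000FFFFFFFFFFFFFL;      // 52-bit fraction
        int cls = 0;
        if (exp == 0x7FF && frac != 0) {
            // NaN: the top fraction bit distinguishes quiet from signaling.
            cls = ((frac >>> 51) & 1) != 0 ? QNAN : SNAN;
        } else if (exp == 0x7FF) {
            cls = negative ? NEG_INF : POS_INF;
        } else if (exp == 0 && frac == 0) {
            cls = negative ? NEG_ZERO : POS_ZERO;
        } else {
            if (exp == 0) cls |= DENORMAL;           // subnormal value
            if (negative) cls |= NEG_FINITE;         // finite, non-zero, negative
        }
        return (cls & imm8) != 0;
    }

    public static void main(String[] args) {
        double negInf = Double.NEGATIVE_INFINITY;
        System.out.println("0x40 matches -Inf: " + fpclass(negInf, 0x40));
        System.out.println("0x50 matches -Inf: " + fpclass(negInf, 0x50));
        System.out.println("0x07 matches NaN:  " + fpclass(Double.NaN, 0x07));
    }
}
```

With 0x40 a negative-infinity lane is not selected, so it would not receive -1.0; 0x50 adds the -Inf bit and covers it.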
> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4334:
>
>> 4332: vsubpd(xtmp3, zero, one, vec_enc);
>> 4333: vblendvpd(dst, xtmp3, one, xtmp2, vec_enc);
>> 4334: vblendvpd(dst, dst, src, xtmp1, vec_enc);
>
> The same applies here, suggestion:
>
> vsubpd(dst, zero, one, vec_enc);
> vblendvpd(dst, src, one, dst, vec_enc);
> vcmppd(xtmp1, src, zero, Assembler::EQ_UQ, vec_enc);
> vblendvpd(dst, xtmp1, dst, zero, vec_enc);
>
> Thanks.
Same responses as above.
> src/hotspot/cpu/x86/x86.ad line 1868:
>
>> 1866: if (UseAVX < 1 ||
>> 1867: (size_in_bits == 512 && !VM_Version::supports_avx512dq()) ||
>> 1868: (size_in_bits == 256 && !VM_Version::supports_avx2())) {
>
> May I ask why do we need avx2 for 256-bit, thanks.
The current instruction sequence uses VPOR, which operates on 256-bit integer vectors only on targets that support AVX2.
VPOR is needed to compute a combined mask for NaN, -0.0 and 0.0. In all these cases the original source lanes should be returned unchanged.
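As a scalar illustration of what the combined mask is for, here is a hedged Java sketch of the per-lane selection. It is not the C2 code: the class and method names are hypothetical, and the boolean OR only stands in for the vector OR of the two compare masks.

```java
final class SignumMaskSketch {
    // One lane: keepSrc models the combined (NaN | +0.0 | -0.0) mask.
    static double signumLane(double src) {
        boolean isNaN = src != src;        // unordered compare with itself
        boolean isZero = src == 0.0;       // true for both 0.0 and -0.0
        boolean keepSrc = isNaN | isZero;  // stand-in for the OR of both masks
        double signed = (src < 0.0) ? -1.0 : 1.0;
        return keepSrc ? src : signed;     // final blend: src lane or +/-1.0
    }

    // Applies the lane model to every element, like a vector operation would.
    static double[] signum(double[] src) {
        double[] dst = new double[src.length];
        for (int i = 0; i < src.length; i++) {
            dst[i] = signumLane(src[i]);
        }
        return dst;
    }

    public static void main(String[] args) {
        double[] in = {
            -3.5, 3.5, 0.0, -0.0, Double.NaN,
            Double.NEGATIVE_INFINITY, Double.POSITIVE_INFINITY
        };
        double[] out = signum(in);
        for (int i = 0; i < in.length; i++) {
            System.out.println(in[i] + " -> " + out[i]
                + " (Math.signum: " + Math.signum(in[i]) + ")");
        }
    }
}
```

Returning src for the masked lanes preserves the sign of zero and passes NaN through, which is what Math.signum specifies.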
-------------
PR: https://git.openjdk.java.net/jdk/pull/7717