RFR: 8282711: Accelerate Math.signum function for AVX and AVX512 target. [v2]

Jatin Bhateja jbhateja at openjdk.java.net
Tue Mar 8 13:15:47 UTC 2022


On Tue, 8 Mar 2022 06:31:01 GMT, Quan Anh Mai <duke at openjdk.java.net> wrote:

> I believe you could achieve better register management using the following sequence.
> 
> ```
> evfpclasspd(ktmp1, src, 0x7, vec_enc);
> evblendmpd(dst, ktmp1, dst, zero, true, vec_enc);

0x7 selects QNaN, +0.0 and -0.0 lanes, so blending zero into dst for those lanes gives the wrong result for NaN (and would also turn -0.0 into +0.0). I guess you meant src, as in the original sequence.

> evfpclasspd(ktmp1, src, 0x40, vec_enc);

0x40 only matches negative finite values and does not match negative infinity, for which Math.signum should return -1.0. 0x40 should be 0x50 (negative finite | negative infinity).

> evsubpd(dst, ktmp1, zero, one, true, vec_enc);
> ```

But I agree we can do away with some temporaries.

> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4334:
> 
>> 4332:     vsubpd(xtmp3, zero, one, vec_enc);
>> 4333:     vblendvpd(dst, xtmp3, one, xtmp2, vec_enc);
>> 4334:     vblendvpd(dst, dst, src, xtmp1, vec_enc);
> 
> The same applies here, suggestion:
> 
>     vsubpd(dst, zero, one, vec_enc);
>     vblendvpd(dst, src, one, dst, vec_enc);
>     vcmppd(xtmp1, src, zero, Assembler::EQ_UQ, vec_enc);
>     vblendvpd(dst, xtmp1, dst, zero, vec_enc);
> 
> Thanks.

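Same responses as above: the NaN/+-0.0 lanes selected by the EQ_UQ compare must be blended from src rather than zero, though I agree some temporaries can be saved here as well.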

> src/hotspot/cpu/x86/x86.ad line 1868:
> 
>> 1866:       if (UseAVX < 1 ||
>> 1867:           (size_in_bits == 512 && !VM_Version::supports_avx512dq()) ||
>> 1868:           (size_in_bits == 256 && !VM_Version::supports_avx2())) {
> 
> May I ask why do we need avx2 for 256-bit, thanks.

The current sequence uses VPOR, which operates on 256-bit integral vectors only on targets that support AVX2.
VPOR is needed to compute a combined mask for NaN, -0.0 and 0.0 lanes; in all of these cases the original source lanes should be returned.

-------------

PR: https://git.openjdk.java.net/jdk/pull/7717


More information about the hotspot-compiler-dev mailing list