RFR: 8277793: Support vector F2I and D2L cast operations for X86 [v2]

Jatin Bhateja jbhateja at openjdk.java.net
Wed Dec 1 11:36:01 UTC 2021


On Tue, 30 Nov 2021 21:22:44 GMT, Sandhya Viswanathan <sviswanathan at openjdk.org> wrote:

>> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   8277793: Further optimizing instruction sequence.
>
> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4077:
> 
>> 4075:   Label done;
>> 4076:   evcvttpd2qq(dst, src, vec_enc);
>> 4077:   evmovdqul(xtmp1, k0, double_sign_flip, true, vec_enc, scratch);
> 
> merge masking should be false here.

The k0 register enables all lanes, so the merge-masking flag (true or false) does not change the semantics here.
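
To illustrate the point, here is a minimal scalar sketch of a masked vector move, assuming 16 lanes of 32 bits (a 512-bit vector). The names masked_move and merge_flag_is_irrelevant are hypothetical and are not HotSpot code; the sketch only models the lane-selection rule.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr int kLanes = 16;  // assumption: 16 x 32-bit lanes (512-bit vector)
using Vec = std::array<uint32_t, kLanes>;

// Scalar model of an EVEX masked move. Each bit of `mask` enables one lane.
// merge == true  : disabled lanes keep the old destination value.
// merge == false : disabled lanes are zeroed.
Vec masked_move(const Vec& dst, const Vec& src, uint16_t mask, bool merge) {
  Vec out{};
  for (int i = 0; i < kLanes; ++i) {
    const bool enabled = ((mask >> i) & 1) != 0;
    out[i] = enabled ? src[i] : (merge ? dst[i] : 0u);
  }
  return out;
}

// With every lane enabled (modelled here as an all-ones mask), the merge
// flag has no disabled lane left to act on, so both settings agree.
bool merge_flag_is_irrelevant(const Vec& dst, const Vec& src) {
  const uint16_t all_lanes = 0xFFFF;
  return masked_move(dst, src, all_lanes, true) ==
         masked_move(dst, src, all_lanes, false);
}
```

The flag only matters once at least one mask bit is clear: with mask 0x0001, lane 1 keeps the old destination value under merging and becomes zero otherwise.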

> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4088:
> 
>> 4086:   kxorwl(ktmp1, ktmp1, ktmp2);
>> 4087:   evcmppd(ktmp1, ktmp1, src, xtmp2, Assembler::NLT_US, vec_enc);
>> 4088:   vpternlogq(xtmp2, 0x11, xtmp1, xtmp1, vec_enc);
> 
> Consider moving the vpternlog instruction earlier after line 4082 using xtmp1 as the destination.
> vpternlogq(xtmp1, 0x01, xtmp2, xtmp2, vec_enc);
> Then xtmp1 can be used in the following evmovdquq.
> 
> This will help to absorb the latency of vpternlogq.

evcmppd and vpternlogq have no dependency on each other, so they should be issued to the execution ports in parallel. The succeeding instruction has a data dependency on both of them, so it can be issued only once both of its operands are ready. Since evcmppd has the higher latency, it will mask the latency of vpternlogq.
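
For reference, a small scalar model of how the vpternlogq immediate selects its result, per 64-bit lane. This is a sketch of the instruction's documented truth-table lookup, not HotSpot code; the function name ternlog is hypothetical, and I am assuming the first register operand of the macro-assembler call is the destination, which also acts as input a.

```cpp
#include <cassert>
#include <cstdint>

// Scalar model of VPTERNLOGQ on one 64-bit lane. For each bit position the
// three input bits form a 3-bit index (a is the most significant bit, c the
// least significant), and that index selects one bit of imm8.
uint64_t ternlog(uint64_t a, uint64_t b, uint64_t c, uint8_t imm8) {
  uint64_t result = 0;
  for (int bit = 0; bit < 64; ++bit) {
    const unsigned idx = static_cast<unsigned>((((a >> bit) & 1) << 2) |
                                               (((b >> bit) & 1) << 1) |
                                               ((c >> bit) & 1));
    result |= static_cast<uint64_t>((imm8 >> idx) & 1) << bit;
  }
  return result;
}
```

Under this model, imm8 = 0x11 with both sources equal yields the bitwise NOT of the source regardless of the old destination value, while imm8 = 0x01 yields the NOR of all three inputs and therefore also depends on the old destination.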

> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4112:
> 
>> 4110: 
>> 4111:   vpcmpeqd(xtmp4, xtmp4, xtmp4, vec_enc);
>> 4112:   vpxor(xtmp1, xtmp1, xtmp4, vec_enc);
> 
> vpcmpeqd is a high latency instruction. This constant (0x7FFF...) can be formed earlier immediately after 4099, when xtmp1 becomes available.

This is on the slow path, which handles special values; moving it prior to 4099 would penalize the fast path.
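
A scalar sketch of the fast-path/slow-path split for the F2I case may help here. It assumes the usual behaviour that a truncating convert returns the 0x80000000 pattern for NaN and out-of-range inputs, and that the result must follow Java cast semantics (NaN gives 0, overflow saturates). All function names are hypothetical; this is not the vector sequence from the patch.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical stand-in for one lane of a truncating float-to-int convert:
// NaN and out-of-range inputs yield the 0x80000000 bit pattern.
int32_t raw_truncating_convert(float f) {
  if (std::isnan(f) || f >= 2147483648.0f || f < -2147483648.0f) {
    return std::numeric_limits<int32_t>::min();
  }
  return static_cast<int32_t>(f);
}

// Forms 0x7FFFFFFF the way the quoted lines do: XOR the sign-flip pattern
// with all-ones (comparing a register with itself for equality gives all-ones).
uint32_t max_int_bits() {
  const uint32_t all_ones = 0xFFFFFFFFu;
  const uint32_t sign_flip = 0x80000000u;
  return sign_flip ^ all_ones;
}

// Java-style (int) cast of a float, split into fast and slow paths.
int32_t java_style_f2i(float f) {
  const int32_t raw = raw_truncating_convert(f);
  if (raw != std::numeric_limits<int32_t>::min()) {
    return raw;  // fast path: ordinary value, no constant needed
  }
  // Slow path: only here is the 0x7FFFFFFF constant required.
  if (std::isnan(f)) return 0;
  return f > 0.0f ? static_cast<int32_t>(max_int_bits()) : raw;
}
```

In this model the constant is computed only after the special-value check has failed, so ordinary inputs never pay for it.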

-------------

PR: https://git.openjdk.java.net/jdk/pull/6544


More information about the hotspot-compiler-dev mailing list