RFR: 8277793: Support vector F2I and D2L cast operations for X86 [v2]
Sandhya Viswanathan
sviswanathan at openjdk.java.net
Wed Dec 1 17:23:24 UTC 2021
On Wed, 1 Dec 2021 11:31:10 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:
>> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4077:
>>
>>> 4075: Label done;
>>> 4076: evcvttpd2qq(dst, src, vec_enc);
>>> 4077: evmovdqul(xtmp1, k0, double_sign_flip, true, vec_enc, scratch);
>>
>> merge masking should be false here.
>
> The k0 register enables all lanes, so the true/false value will not change the semantics.
In vector_castF2I_evex we pass false, while here we pass true for a similar use. Consistency would be good.
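To illustrate why the merge flag has no effect with k0, here is a minimal sketch that models a masked 8-lane move in plain C++. It is not HotSpot code: `Vec8` and `masked_move` are hypothetical names, and k0 is modeled as an all-ones mask (0xFF), which is an assumption about how "no masking" behaves.

```cpp
#include <array>
#include <cstdint>

// Hypothetical model of an 8-lane, 64-bit masked vector move (not HotSpot code).
using Vec8 = std::array<std::uint64_t, 8>;

// merge == true : lanes whose mask bit is 0 keep the old dst value.
// merge == false: lanes whose mask bit is 0 are zeroed.
// Lanes whose mask bit is 1 always take the src value.
Vec8 masked_move(const Vec8& dst, const Vec8& src, std::uint8_t mask, bool merge) {
    Vec8 out{};
    for (int lane = 0; lane < 8; ++lane) {
        const bool selected = ((mask >> lane) & 1u) != 0;
        if (selected) {
            out[lane] = src[lane];
        } else {
            out[lane] = merge ? dst[lane] : 0;
        }
    }
    return out;
}
```

With every mask bit set, no lane takes the unselected path, so merge and zero masking give the same result. The two only differ once some mask bit is clear, which is why passing the same value in both places is a matter of consistency rather than correctness.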
>> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4088:
>>
>>> 4086: kxorwl(ktmp1, ktmp1, ktmp2);
>>> 4087: evcmppd(ktmp1, ktmp1, src, xtmp2, Assembler::NLT_US, vec_enc);
>>> 4088: vpternlogq(xtmp2, 0x11, xtmp1, xtmp1, vec_enc);
>>
>> Consider moving the vpternlog instruction earlier after line 4082 using xtmp1 as the destination.
>> vpternlogq(xtmp1, 0x01, xtmp2, xtmp2, vec_enc);
>> Then xtmp1 can be used in the following evmovdquq.
>>
>> This will help to absorb the latency of vpternlogq.
>
> evcmppd and vpternlog should be issued in parallel to the execution ports, since there is no dependency between them. The succeeding instruction has a data dependency on both of them, so it can be issued only once both of its operands are ready. Since evcmppd has the higher latency, it will mask the latency of vpternlog.
Sounds good.
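For reference, a small sketch of what the vpternlogq at line 4088 computes, modeled bit by bit on a single 64-bit lane. The function name `ternlog` is hypothetical, and the sign-flip constant 0x8000000000000000 for xtmp1 is an assumption based on the name double_sign_flip.

```cpp
#include <cstdint>

// Bitwise model of vpternlog on one 64-bit lane (illustrative, not HotSpot code).
// For each bit position, the three input bits form a 3-bit index into imm8:
//   a = destination (most significant index bit),
//   b = first source,
//   c = second source (least significant index bit).
// The imm8 bit selected by that index becomes the result bit.
std::uint64_t ternlog(std::uint64_t a, std::uint64_t b, std::uint64_t c,
                      std::uint8_t imm8) {
    std::uint64_t result = 0;
    for (int bit = 0; bit < 64; ++bit) {
        const unsigned idx = static_cast<unsigned>(
            (((a >> bit) & 1u) << 2) | (((b >> bit) & 1u) << 1) | ((c >> bit) & 1u));
        result |= static_cast<std::uint64_t>((imm8 >> idx) & 1u) << bit;
    }
    return result;
}
```

With imm8 = 0x11 and both sources equal, the result is the bitwise NOT of the source regardless of the old destination value. Under the assumption above, `ternlog(x, 0x8000000000000000, 0x8000000000000000, 0x11)` gives 0x7FFFFFFFFFFFFFFF for any x, so the instruction needs only xtmp1 as input and does not depend on the result of evcmppd.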
-------------
PR: https://git.openjdk.java.net/jdk/pull/6544