RFR: 8320794: Emulate rest of vblendvp[sd] on ECore [v2]

Volodymyr Paprotski duke at openjdk.org
Tue Apr 2 16:28:02 UTC 2024


On Thu, 28 Mar 2024 00:04:49 GMT, Sandhya Viswanathan <sviswanathan at openjdk.org> wrote:

>> Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Fix double pasted test
>
> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4871:
> 
>> 4869:   vpxor(xtmp3, xtmp2, xtmp4, vec_enc);
>> 4870: 
>> 4871:   vblendvps(dst, dst, xtmp1, xtmp3, vec_enc, true, xtmp4);
> 
> The vblendvps at line 4861 could also be emulated:
> From:
>   vpxor(xtmp4, xtmp4, xtmp4, vec_enc);
>   vcmpps(xtmp3, src, src, Assembler::UNORD_Q, vec_enc);
>   vblendvps(dst, dst, xtmp4, xtmp3, vec_enc);
> 
> To:
>   vpxor(xtmp4, xtmp4, xtmp4, vec_enc);
>   vcmpps(xtmp3, src, src, Assembler::UNORD_Q, vec_enc);
>   vblendvps(dst, dst, xtmp4, xtmp3, vec_enc, false, xtmp4);

Cannot use `xtmp4`, scratch isn't available, (`scratch != src2`):

bool scratch_available = scratch != xnoreg && scratch != src1 && scratch != src2 && scratch != dst;


(Originally that's exactly what I had, but got caught when I intentionally made the fall-back/default case `assert`)

> src/hotspot/cpu/x86/macroAssembler_x86.cpp line 3524:
> 
>> 3522:   bool blend_emulation = EnableX86ECoreOpts && UseAVX > 1;
>> 3523:   bool scratch_available = scratch != xnoreg && scratch != src1 && scratch != src2 && scratch != dst;
>> 3524:   bool dst_available = (dst != mask || compute_mask) && (dst != src1 || dst != src2);
> 
> There are two paths here:
> Path 1: When compute_mask == true
>               scratch_available = (scratch != xnoreg) && (scratch != src1) && (scratch != src2) && (scratch != dst);
>               dst_available = (dst != mask) && (dst != src1 || dst != src2);
> Path 2: When compute_mask == false
>               scratch_available = (scratch != xnoreg) && (scratch != dst);
>               dst_available = (dst != mask) && (dst != src1 || dst != src2);

I had thought of using `scratch_available` instead of `compute_mask` (i.e. `...dst_available = (dst != mask || scratch_available)...` but I thought it would make more sense to use `compute_mask`. i.e. dst is available to modify on 3531 because of the branch on `compute_mask` on 3526 

Also I prefer the shape of condition on 3525; three "simpler" independent boolean conditions instead of two longer ones. (i.e. Making `dst_available` depend on directly on `scratch_available` would make the it two "complex" conditions instead of three "simpler" conditions.) Though simplicity of the condition is debatable, I made it as readable as I could.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18310#discussion_r1548177012
PR Review Comment: https://git.openjdk.org/jdk/pull/18310#discussion_r1548194113


More information about the hotspot-dev mailing list