RFR: 8320794: Emulate rest of vblendvp[sd] on ECore [v2]
Volodymyr Paprotski
duke at openjdk.org
Tue Apr 2 16:28:02 UTC 2024
On Thu, 28 Mar 2024 00:04:49 GMT, Sandhya Viswanathan <sviswanathan at openjdk.org> wrote:
>> Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Fix double pasted test
>
> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4871:
>
>> 4869: vpxor(xtmp3, xtmp2, xtmp4, vec_enc);
>> 4870:
>> 4871: vblendvps(dst, dst, xtmp1, xtmp3, vec_enc, true, xtmp4);
>
> The vblendvps at line 4861 could also be emulated:
> From:
> vpxor(xtmp4, xtmp4, xtmp4, vec_enc);
> vcmpps(xtmp3, src, src, Assembler::UNORD_Q, vec_enc);
> vblendvps(dst, dst, xtmp4, xtmp3, vec_enc);
>
> To:
> vpxor(xtmp4, xtmp4, xtmp4, vec_enc);
> vcmpps(xtmp3, src, src, Assembler::UNORD_Q, vec_enc);
> vblendvps(dst, dst, xtmp4, xtmp3, vec_enc, false, xtmp4);
Cannot use `xtmp4`, scratch isn't available, (`scratch != src2`):
bool scratch_available = scratch != xnoreg && scratch != src1 && scratch != src2 && scratch != dst;
(Originally that's exactly what I had, but got caught when I intentionally made the fall-back/default case `assert`)
> src/hotspot/cpu/x86/macroAssembler_x86.cpp line 3524:
>
>> 3522: bool blend_emulation = EnableX86ECoreOpts && UseAVX > 1;
>> 3523: bool scratch_available = scratch != xnoreg && scratch != src1 && scratch != src2 && scratch != dst;
>> 3524: bool dst_available = (dst != mask || compute_mask) && (dst != src1 || dst != src2);
>
> There are two paths here:
> Path 1: When compute_mask == true
> scratch_available = (scratch != xnoreg) && (scratch != src1) && (scratch != src2) && (scratch != dst);
> dst_available = (dst != mask) && (dst != src1 || dst != src2);
> Path 2: When compute_mask == false
> scratch_available = (scratch != xnoreg) && (scratch != dst);
> dst_available = (dst != mask) && (dst != src1 || dst != src2);
I had thought of using `scratch_available` instead of `compute_mask` (i.e. `...dst_available = (dst != mask || scratch_available)...` but I thought it would make more sense to use `compute_mask`. i.e. dst is available to modify on 3531 because of the branch on `compute_mask` on 3526
Also I prefer the shape of condition on 3525; three "simpler" independent boolean conditions instead of two longer ones. (i.e. Making `dst_available` depend on directly on `scratch_available` would make the it two "complex" conditions instead of three "simpler" conditions.) Though simplicity of the condition is debatable, I made it as readable as I could.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/18310#discussion_r1548177012
PR Review Comment: https://git.openjdk.org/jdk/pull/18310#discussion_r1548194113
More information about the hotspot-dev
mailing list