[vectorIntrinsics] RFR: 8265312: Unsigned comparison operators
Sandhya Viswanathan
sviswanathan at openjdk.java.net
Wed Apr 21 16:30:43 UTC 2021
On Wed, 21 Apr 2021 06:25:42 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:
>> Add unsigned comparison operators to VectorOperators and add intrinsic support on x64.
>
> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 2143:
>
>> 2141: vpmovzxbw(vtmp2, src2, vlen_enc);
>> 2142: vpcmpCCW(dst, vtmp1, vtmp2, comparison, Assembler::W, vlen_enc, scratch);
>> 2143: vpand(dst, dst, ExternalAddress(StubRoutines::x86::vector_short_to_byte_mask()), vlen_enc, scratch);
>
> Hi @sviswa7, zero extension for an unsigned comparison of signed numbers looks fine. After the extension, the comparison result will be all 1s or all 0s in the corresponding lanes, so could we not avoid the extra vector AND by doing a signed saturating pack (vpacksswb)?
Good point, using vpacksswb should work here. I will give it a try and update the patch accordingly.
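
To make the suggestion concrete, here is a small scalar sketch in Java of what happens to a single lane. It is only an illustration, not the HotSpot code: the class and method names are hypothetical, and each helper models one element of the corresponding vector instruction (vpmovzxbw, a signed word compare, vpacksswb, vpackuswb).

```java
public class UnsignedCompareDemo {
    // Zero-extend a byte to a 16-bit word, as vpmovzxbw does per element.
    static short zeroExtend(byte b) {
        return (short) (b & 0xFF);
    }

    // Signed word compare on zero-extended inputs gives the unsigned byte result.
    // The lane is all ones (0xFFFF) when true, all zeros otherwise.
    static short lessThanMask(byte a, byte b) {
        return (short) (zeroExtend(a) < zeroExtend(b) ? 0xFFFF : 0);
    }

    // Signed saturation of a signed 16-bit word to a byte, as in vpacksswb.
    static byte packSigned(short w) {
        return (byte) Math.max(Byte.MIN_VALUE, Math.min(Byte.MAX_VALUE, w));
    }

    // Unsigned saturation of a signed 16-bit word to a byte, as in vpackuswb.
    static byte packUnsigned(short w) {
        return (byte) Math.max(0, Math.min(0xFF, w));
    }

    public static void main(String[] args) {
        byte a = 1;
        byte b = (byte) 200;             // -56 as a signed byte, 200 as unsigned
        short mask = lessThanMask(a, b); // 0xFFFF, since 1 < 200 unsigned

        // 0xFFFF is -1 as a signed word, so signed saturation keeps it as 0xFF.
        System.out.printf("packSigned(mask)          = %02x%n", packSigned(mask) & 0xFF);
        // Unsigned saturation clamps the negative word to 0 and loses the mask.
        System.out.printf("packUnsigned(mask)        = %02x%n", packUnsigned(mask) & 0xFF);
        // Hence the AND with 0x00FF before vpackuswb in the current patch.
        System.out.printf("packUnsigned(mask & 0xFF) = %02x%n",
                packUnsigned((short) (mask & 0x00FF)) & 0xFF);
    }
}
```

The middle case shows why the current patch needs the vpand before vpackuswb, and the first case shows why a signed pack can take the compare result directly.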
> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 2183:
>
>> 2181: vpand(vtmp3, vtmp3, ExternalAddress(StubRoutines::x86::vector_short_to_byte_mask()), vlen_enc, scratch);
>> 2182: vpackuswb(dst, dst, vtmp3, vlen_enc);
>> 2183: vpermpd(dst, dst, 0xd8, vlen_enc);
>
> Since the comparison is performed at lane level (128 bits in the x86 definition) and we later pack the results of two lanes, I am not sure the last permute instruction is needed.
If you look at vpackuswb, the packing is done per 128-bit lane, alternating between src1 and src2, i.e.
0-127 bits from src1 -> 0-63 bits in dst
0-127 bits from src2 -> 64-127 bits in dst
128-255 bits from src1 -> 128-191 bits in dst
128-255 bits from src2 -> 192-255 bits in dst
The src1 and src2 results are therefore interleaved in dst and need to be put back in their proper place by vpermpd.
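
The layout above can be checked with a small Java model. This is a sketch under stated assumptions, not the HotSpot code: the class and method names are hypothetical, a 256-bit vector is modeled as an int array of word or byte values, and the permute treats the vector as four 64-bit chunks selected by two immediate bits each, which is how 0xd8 is used with vpermpd here.

```java
import java.util.Arrays;

public class PackPermuteDemo {
    // Unsigned saturation of one word value to a byte value.
    static int saturate(int w) {
        return Math.max(0, Math.min(255, w));
    }

    // Models 256-bit vpackuswb: each 128-bit lane is packed independently,
    // 8 words from src1 followed by 8 words from src2.
    static int[] vpackuswb256(int[] src1, int[] src2) {
        int[] dst = new int[32];
        for (int lane = 0; lane < 2; lane++) {
            for (int i = 0; i < 8; i++) {
                dst[lane * 16 + i] = saturate(src1[lane * 8 + i]);
                dst[lane * 16 + 8 + i] = saturate(src2[lane * 8 + i]);
            }
        }
        return dst;
    }

    // Models a 64-bit-granular permute with an 8-bit immediate:
    // destination chunk q takes source chunk (imm8 >> 2q) & 3.
    static int[] permuteQwords(int[] src, int imm8) {
        int[] dst = new int[32];
        for (int q = 0; q < 4; q++) {
            int sel = (imm8 >> (2 * q)) & 3;
            System.arraycopy(src, sel * 8, dst, q * 8, 8);
        }
        return dst;
    }

    public static void main(String[] args) {
        int[] src1 = new int[16];
        int[] src2 = new int[16];
        for (int i = 0; i < 16; i++) {
            src1[i] = i;       // elements 0..15
            src2[i] = 16 + i;  // elements 16..31
        }
        int[] packed = vpackuswb256(src1, src2);
        // Order after the pack: 0..7, 16..23, 8..15, 24..31
        System.out.println(Arrays.toString(packed));
        // 0xd8 selects chunks 0, 2, 1, 3, restoring 0..31
        System.out.println(Arrays.toString(permuteQwords(packed, 0xd8)));
    }
}
```

Without the final permute, elements 8..15 from src1 and 16..23 from src2 stay swapped in the result.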
-------------
PR: https://git.openjdk.java.net/panama-vector/pull/68