[vectorIntrinsics] RFR: 8265312: Unsigned comparison operators

Sandhya Viswanathan sviswanathan at openjdk.java.net
Wed Apr 21 16:30:43 UTC 2021


On Wed, 21 Apr 2021 06:25:42 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> Add unsigned comparison operators to VectorOperators and add intrinsic support on x64.
>
> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 2143:
> 
>> 2141:     vpmovzxbw(vtmp2, src2, vlen_enc);
>> 2142:     vpcmpCCW(dst, vtmp1, vtmp2, comparison, Assembler::W, vlen_enc, scratch);
>> 2143:     vpand(dst, dst, ExternalAddress(StubRoutines::x86::vector_short_to_byte_mask()), vlen_enc, scratch);
> 
> Hi @sviswa7, zero extension for the unsigned comparison of signed numbers looks fine. After the extension, the comparison result will be all 1s or all 0s in the corresponding lanes. By doing a signed saturating pack (vpacksswb), can we not avoid the extra vector AND?

Good point, using packsswb should work here. I will give it a try and update the patch accordingly.
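
To make the reasoning concrete, here is a scalar sketch of a single lane. It is not the HotSpot code; the function names are made up for illustration, and the only assumptions are the documented saturation rules of the signed and unsigned word-to-byte packs. A compare result of 0xFFFF is -1 as a signed word, so a signed saturating pack yields 0xFF directly, whereas an unsigned saturating pack clamps it to 0x00 unless the word is first masked to 0x00FF.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar model of one byte lane: zero-extend both bytes to
// 16 bits (as vpmovzxbw does), then do a signed word compare.
// Returns 0xFFFF when a < b as unsigned bytes, 0x0000 otherwise.
inline uint16_t unsigned_lt_mask_word(int8_t a, int8_t b) {
    int16_t wa = static_cast<int16_t>(static_cast<uint8_t>(a));
    int16_t wb = static_cast<int16_t>(static_cast<uint8_t>(b));
    return (wa < wb) ? 0xFFFF : 0x0000;
}

// Signed saturating pack of one word: the word is read as signed and
// clamped to [-128, 127].
inline uint8_t pack_signed_sat(uint16_t w) {
    int16_t s = static_cast<int16_t>(w);
    if (s > 127) s = 127;
    if (s < -128) s = -128;
    return static_cast<uint8_t>(static_cast<int8_t>(s));
}

// Unsigned saturating pack of one word: the word is still read as signed,
// but clamped to [0, 255].
inline uint8_t pack_unsigned_sat(uint16_t w) {
    int16_t s = static_cast<int16_t>(w);
    if (s > 255) s = 255;
    if (s < 0) s = 0;
    return static_cast<uint8_t>(s);
}

// Small self-check of the argument made in the thread.
inline void demo_pack_choice() {
    uint16_t hit = unsigned_lt_mask_word(1, -1);      // 1 < 255 as unsigned bytes
    assert(hit == 0xFFFF);
    assert(pack_signed_sat(hit) == 0xFF);             // no AND needed
    assert(pack_unsigned_sat(hit) == 0x00);           // mask would be lost
    assert(pack_unsigned_sat(hit & 0x00FF) == 0xFF);  // AND first, then pack
}
```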

> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 2183:
> 
>> 2181:     vpand(vtmp3, vtmp3, ExternalAddress(StubRoutines::x86::vector_short_to_byte_mask()), vlen_enc, scratch);
>> 2182:     vpackuswb(dst, dst, vtmp3, vlen_enc);
>> 2183:     vpermpd(dst, dst, 0xd8, vlen_enc);
> 
> Since the comparison is performed at lane level (x86 definition: 128 bits) and we later pack the results of two lanes, I am not sure the last permute instruction is needed.

If you look at vpackuswb, the packing is done per 128-bit lane, alternating between src1 and src2, i.e.:
0-127 bits from src1 -> 0-63 bits in dst
0-127 bits from src2 -> 64-127 bits in dst
128-255 bits from src1 -> 128-191 bits in dst
128-255 bits from src2 -> 192-255 bits in dst
The src1 and src2 results are therefore interleaved in dst and need to be put back in their proper place by the vpermpd (immediate 0xd8).
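
The layout above can be checked with a small scalar model. This is only a sketch, not the patch itself: the type and function names are hypothetical, a 256-bit register is modelled as a plain array, and the permute follows the usual rule that each 2-bit field of the immediate selects the source 64-bit element for one destination element.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Words256 = std::array<int16_t, 16>;  // 256-bit register viewed as 16 words
using Bytes256 = std::array<uint8_t, 32>;  // 256-bit register viewed as 32 bytes

// Clamp a signed word to [0, 255], as an unsigned saturating pack does.
inline uint8_t saturate_u8(int16_t w) {
    if (w < 0) return 0;
    if (w > 255) return 255;
    return static_cast<uint8_t>(w);
}

// Model of a 256-bit unsigned saturating pack: within each 128-bit lane,
// the low 64 bits of dst come from src1 and the high 64 bits from src2.
inline Bytes256 pack_unsigned_256(const Words256& src1, const Words256& src2) {
    Bytes256 dst{};
    for (std::size_t lane = 0; lane < 2; ++lane) {
        for (std::size_t i = 0; i < 8; ++i) {
            dst[lane * 16 + i]     = saturate_u8(src1[lane * 8 + i]);
            dst[lane * 16 + 8 + i] = saturate_u8(src2[lane * 8 + i]);
        }
    }
    return dst;
}

// Model of a 64-bit element permute: bits [2q+1:2q] of imm8 pick the
// source element for destination element q.
inline Bytes256 permute_qwords(const Bytes256& src, unsigned imm8) {
    Bytes256 dst{};
    for (std::size_t q = 0; q < 4; ++q) {
        std::size_t from = (imm8 >> (2 * q)) & 3u;
        for (std::size_t b = 0; b < 8; ++b) {
            dst[q * 8 + b] = src[from * 8 + b];
        }
    }
    return dst;
}

// Pack the words 0..15 (src1) and 16..31 (src2), then apply 0xd8.
inline void demo_pack_and_permute() {
    Words256 a{}, b{};
    for (std::size_t i = 0; i < 16; ++i) {
        a[i] = static_cast<int16_t>(i);
        b[i] = static_cast<int16_t>(16 + i);
    }
    Bytes256 packed = pack_unsigned_256(a, b);
    assert(packed[8] == 16);   // src2 low lane lands in bits 64-127
    assert(packed[16] == 8);   // src1 high lane lands in bits 128-191
    Bytes256 fixed = permute_qwords(packed, 0xd8);
    for (std::size_t i = 0; i < 32; ++i) {
        assert(fixed[i] == i); // element order [0, 2, 1, 3] restores sequence
    }
}
```

With 0xd8 the destination elements are taken from source elements 0, 2, 1, 3, which swaps the two middle 64-bit chunks and leaves all src1 results in the low half and all src2 results in the high half.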

-------------

PR: https://git.openjdk.java.net/panama-vector/pull/68

