RFR: 8284932: [Vector API] Incorrect implementation of LSHR operator for negative byte/short elements
Jie Fu
jiefu at openjdk.java.net
Mon May 16 12:21:57 UTC 2022
On Wed, 27 Apr 2022 09:13:34 GMT, Jie Fu <jiefu at openjdk.org> wrote:
>> Hi all,
>>
>> According to the Vector API doc, the `LSHR` operator computes `a>>>(n&(ESIZE*8-1))`.
>> However, current implementation is incorrect for negative bytes/shorts.
>>
>> The background is that one of our customers try to vectorize `urshift` with `urshiftVector` like the following.
>>
>> 13 public static void urshift(byte[] src, byte[] dst) {
>> 14 for (int i = 0; i < src.length; i++) {
>> 15 dst[i] = (byte)(src[i] >>> 3);
>> 16 }
>> 17 }
>> 18
>> 19 public static void urshiftVector(byte[] src, byte[] dst) {
>> 20 int i = 0;
>> 21 for (; i < spec.loopBound(src.length); i +=spec.length()) {
>> 22 var va = ByteVector.fromArray(spec, src, i);
>> 23 var vb = va.lanewise(VectorOperators.LSHR, 3);
>> 24 vb.intoArray(dst, i);
>> 25 }
>> 26
>> 27 for (; i < src.length; i++) {
>> 28 dst[i] = (byte)(src[i] >>> 3);
>> 29 }
>> 30 }
>>
>>
>> Unfortunately and to our surprise, code at line28 computes different results with code at line23.
>> It took quite a long time to figure out this bug.
>>
>> The root cause is that current implemenation of Vector API can't compute the unsigned right shift results as what is done for scalar `>>>` for negative byte/short elements.
>> Actually, current implementation will do `(a & 0xFF) >>> (n & 7)` [1] for all bytes, which is unable to compute the vectorized `>>>` for negative bytes.
>> So this seems unreasonable and unfriendly to Java developers.
>> It would be better to fix it.
>>
>> The key idea to support unsigned right shift of negative bytes/shorts is just to replace the unsigned right shift operation with the signed right shift operation.
>> This logic is:
>> - For byte elements, unsigned right shift is equal to signed right shift if the shift_cnt <= 24.
>> - For short elements, unsigned right shift is equal to signed right shift if the shift_cnt <= 16.
>> - For Vector API, the shift_cnt will be masked to shift_cnt <= 7 for bytes and shift_cnt <= 15 for shorts.
>>
>> I just learned this idea from https://github.com/openjdk/jdk/pull/7979 .
>> And many thanks to @fg1417 .
>>
>>
>> Thanks.
>> Best regards,
>> Jie
>>
>> [1] https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/ByteVector.java#L935
>
>> > > According to the Vector API doc, the LSHR operator computes a>>>(n&(ESIZE*8-1))
>>
>> Documentation is correct if viewed strictly in context of subword vector lane, JVM internally promotes/sign extends subword type scalar variables into int type, but vectors are loaded from continuous memory holding subwords, it will not be correct for developer to imagine that individual subword type lanes will be upcasted into int lanes before being operated upon.
>>
>> Thus both java implementation and compiler handling looks correct.
>
> Thanks @jatin-bhateja for taking a look at this.
> After the discussion, I think it's fine to keep the current implementation of LSHR.
> So we're now fixing the misleading doc here: https://github.com/openjdk/jdk/pull/8291 .
>
> And I think it would be better to add one more operator for `>>>`.
> Thanks.
> @DamonFool should this PR and the JBS issue be closed?
Okay.
Let's close it.
-------------
PR: https://git.openjdk.java.net/jdk/pull/8276
More information about the hotspot-compiler-dev
mailing list