RFR: 8284932: [Vector API] Incorrect implementation of LSHR operator for negative byte/short elements
Jie Fu
jiefu at openjdk.java.net
Mon Apr 18 03:24:30 UTC 2022
On Sun, 17 Apr 2022 14:35:14 GMT, Jie Fu <jiefu at openjdk.org> wrote:
> Hi all,
>
> According to the Vector API doc, the `LSHR` operator computes `a>>>(n&(ESIZE*8-1))`.
> However, current implementation is incorrect for negative bytes/shorts.
>
> The background is that one of our customers try to vectorize `urshift` with `urshiftVector` like the following.
>
> 13 public static void urshift(byte[] src, byte[] dst) {
> 14 for (int i = 0; i < src.length; i++) {
> 15 dst[i] = (byte)(src[i] >>> 3);
> 16 }
> 17 }
> 18
> 19 public static void urshiftVector(byte[] src, byte[] dst) {
> 20 int i = 0;
> 21 for (; i < spec.loopBound(src.length); i +=spec.length()) {
> 22 var va = ByteVector.fromArray(spec, src, i);
> 23 var vb = va.lanewise(VectorOperators.LSHR, 3);
> 24 vb.intoArray(dst, i);
> 25 }
> 26
> 27 for (; i < src.length; i++) {
> 28 dst[i] = (byte)(src[i] >>> 3);
> 29 }
> 30 }
>
>
> Unfortunately and to our surprise, code at line28 computes different results with code at line23.
> It took quite a long time to figure out this bug.
>
> The root cause is that current implemenation of Vector API can't compute the unsigned right shift results as what is done for scalar `>>>` for negative byte/short elements.
> Actually, current implementation will do `(a & 0xFF) >>> (n & 7)` [1] for all bytes, which is unable to compute the vectorized `>>>` for negative bytes.
> So this seems unreasonable and unfriendly to Java developers.
> It would be better to fix it.
>
> The key idea to support unsigned right shift of negative bytes/shorts is just to replace the unsigned right shift operation with the signed right shift operation.
> This logic is:
> - For byte elements, unsigned right shift is equal to signed right shift if the shift_cnt <= 24.
> - For short elements, unsigned right shift is equal to signed right shift if the shift_cnt <= 16.
> - For Vector API, the shift_cnt will be masked to shift_cnt <= 7 for bytes and shift_cnt <= 15 for shorts.
>
> I just learned this idea from https://github.com/openjdk/jdk/pull/7979 .
> And many thanks to @fg1417 .
>
>
> Thanks.
> Best regards,
> Jie
>
> [1] https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/ByteVector.java#L935
According to the vector api doc, `LSHR` seems to be designed to vectorize the scalar `>>>` with masked `shift_cnt`.
Since for most cases, if we use vector api to rewrite the scalar code, we don't know if all the inputs are positive only.
So it would be better to follow the scalar `>>>` behavior for any supported masked `shift_cnt`.
Thanks.
-------------
PR: https://git.openjdk.java.net/jdk/pull/8276
More information about the core-libs-dev
mailing list