Observations from a simple JMH benchmark

Paul Sandoz paul.sandoz at oracle.com
Fri Feb 16 22:16:34 UTC 2018



> On Feb 16, 2018, at 2:05 PM, Vladimir Ivanov <vladimir.x.ivanov at oracle.com> wrote:
> 
> I looked into that a bit more.
> 

Nice analysis. Hopefully we can bash this into shape at some point, perhaps by making checkFromIndexSize an intrinsic so other code can benefit?


> Here's the updated patch:
>  http://cr.openjdk.java.net/~vlivanov/panama/vector.oob/webrev.01
> 

+1

Paul.

> Range check:
>  Objects.checkIndex(ix, length - (vlen - 1))
> 
> 
> (1) for (int i = 0; i < SIZE; i += species.length()) {
>       ... a3[i:i+vlen] = a1[i:i+vlen] + a2[i:i+vlen] ...
>    }
> 
>    static final int SIZE = ...;
> 
>    Result: No range checks inside the loop.
> 
> 
> (2) for (int i = 0; i < a3.length; i += species.length()) {
>        ... a3[i:i+vlen] = a1[i:i+vlen] + a2[i:i+vlen] ...
>    }
> 
>    Result: as you observed, some checks are left (but 2 instead of 3).
> 
> Let's try something even simpler:
> 
> (3) // Single vector load
>     for (int i = 0; i < a.length; i += species.length()) {
>        IntVector<Shapes.S256Bit> av = species.fromArray(a, i);
>        sum += av.addAll();
>     }
> 
>     Result: no range check in the loop.
> 
> (4) // Load & store are from/into the same array.
>    for (int i = 0; i < a.length; i += species.length()) {
>        IntVector<Shapes.S256Bit> av = species.fromArray(a, i);
>        av.add(av).intoArray(a, i);
>    }
> 
>    Result: 1 range check in the loop (right before the store).
> 
> One thing I noticed with the loop shape is that it is sensitive to array size: if the size is not a multiple of the vector length, then the last access is out of bounds.
> 
> (5) for (int i = 0;
>         i < a3.length - (species.length() - 1);
>         i += species.length()) {
>        ... a3[i:i+vlen] = a1[i:i+vlen] + a2[i:i+vlen] ...
>    }
> 
>    Result: no range checks in the loop.
> 
> 
> So, if there are no OOB accesses possible, C2 successfully hoists all range checks out of the loop.
> 
> What's interesting is that even in the presence of OOB accesses, C2 is able to hoist 1 range check. There is probably some room for improvement (or a bug lurking here ;-)).
> 
> Best regards,
> Vladimir Ivanov
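
As a rough illustration of the array-size sensitivity noted for loop shapes (2) and (5) above, here is a minimal plain-Java sketch. It does not use the Vector API and says nothing about what C2 actually hoists; it only shows which loop shape can reach a vector start index that fails the quoted check `Objects.checkIndex(ix, length - (vlen - 1))`. The class and method names (`LoopShapeBounds`, `checkVectorIndex`, `shapeTwoStaysInBounds`, `shapeFiveStaysInBounds`) are hypothetical and exist only for this example.

```java
import java.util.Objects;

final class LoopShapeBounds {
    // Hypothetical stand-in for the vector access range check quoted in the thread.
    static int checkVectorIndex(int ix, int length, int vlen) {
        return Objects.checkIndex(ix, length - (vlen - 1));
    }

    // Loop shape (2): i < length. Returns false if any iteration fails the check.
    static boolean shapeTwoStaysInBounds(int length, int vlen) {
        try {
            for (int i = 0; i < length; i += vlen) {
                checkVectorIndex(i, length, vlen);
            }
            return true;
        } catch (IndexOutOfBoundsException e) {
            return false;
        }
    }

    // Loop shape (5): i < length - (vlen - 1). The loop bound matches the check bound.
    static boolean shapeFiveStaysInBounds(int length, int vlen) {
        try {
            for (int i = 0; i < length - (vlen - 1); i += vlen) {
                checkVectorIndex(i, length, vlen);
            }
            return true;
        } catch (IndexOutOfBoundsException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        int vlen = 8; // assumed lane count, e.g. 256-bit vectors of int
        for (int length : new int[] {16, 20}) {
            System.out.println("length=" + length
                    + " shape(2) in bounds: " + shapeTwoStaysInBounds(length, vlen)
                    + ", shape(5) in bounds: " + shapeFiveStaysInBounds(length, vlen));
        }
    }
}
```

With shape (5), the loop condition and the check use the same bound, so no iteration can fail; the elements past the last full vector would need a separate scalar tail loop, which is omitted here.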
> 
> On 2/16/18 8:02 PM, Paul Sandoz wrote:
>>> On Feb 16, 2018, at 4:38 AM, Vladimir Ivanov <vladimir.x.ivanov at oracle.com> wrote:
>>> 
>>>> Better! Still some artifacts:
>>> ...
>>>> I think you can drop the first index check for case 2. IIUC you are assuming constraints vlen > 0 and length >= 0, so for:
>>>>   Objects.checkIndex(ix, length - (vlen - 1));
>>>> the check will fail if length - (vlen - 1) < 0.
>>> 
>>> From a correctness perspective, yes. But I was mostly concerned about C2, and having 2 checks that mimic ordinary range checks looked like a safer bet. Preconditions.checkIndex() is intrinsified as CmpI(length,0) + CmpU(index,length), and strength reduction to a single CmpU works only if (length >= 0), which is the case for an array length. So, the difference is CmpU + CmpU vs CmpI + CmpU.
>>> 
>>> I hoped that C2 could statically prove that (length - (vlen - 1)) >= 0 after the first check and optimize it accordingly. But something goes wrong.
>>> 
>>> Briefly looking into the generated IR, it seems the compiler does prove (length - (vlen - 1)) >= 0, but the (length - constant) shape confuses some other transformation, so hoisting isn't performed.
>>> 
>> OK, I don't recall this kind of thing happening with, say, the access of ints from a ByteBuffer, implying this might be something vector-specific (I would need to go back and double-check buffer accesses, but I am sure I would have spotted this when doing the VarHandles work).
>> Paul.
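
The point quoted above about CmpI(length,0) + CmpU(index,length) can be illustrated in plain Java. This is only a sketch of the arithmetic, not of C2's IR: a single unsigned comparison agrees with the usual two signed comparisons when the length is non-negative, and disagrees when the length is negative, as it can be for `length - (vlen - 1)`. The class and method names (`UnsignedRangeCheck`, `inBoundsSigned`, `inBoundsUnsigned`) are hypothetical.

```java
final class UnsignedRangeCheck {
    // Two signed comparisons: the textbook form of a range check.
    static boolean inBoundsSigned(int index, int length) {
        return index >= 0 && index < length;
    }

    // One unsigned comparison. A negative index reinterprets as a huge unsigned
    // value, so it is rejected, but only as long as length itself is non-negative.
    static boolean inBoundsUnsigned(int index, int length) {
        return Integer.compareUnsigned(index, length) < 0;
    }

    public static void main(String[] args) {
        int arrayLength = 10;      // array lengths are never negative
        int shortened = 4 - 7;     // length - (vlen - 1) with length = 4, vlen = 8
        for (int index : new int[] {-1, 0, 9, 10}) {
            System.out.println("index=" + index
                    + " signed=" + inBoundsSigned(index, arrayLength)
                    + " unsigned=" + inBoundsUnsigned(index, arrayLength));
        }
        System.out.println("negative length: signed=" + inBoundsSigned(0, shortened)
                + " unsigned=" + inBoundsUnsigned(0, shortened));
    }
}
```

The last line of `main` shows the case where the two forms diverge, which is why a separate sign check on the length is needed unless the length is known to be non-negative.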


