Observations from a simple JMH benchmark
Paul Sandoz
paul.sandoz at oracle.com
Fri Feb 16 22:16:34 UTC 2018
> On Feb 16, 2018, at 2:05 PM, Vladimir Ivanov <vladimir.x.ivanov at oracle.com> wrote:
>
> I looked into that a bit more.
>
Nice analysis. Hopefully we can bash this into shape at some point, perhaps by making checkFromIndexSize an intrinsic so other code can benefit?
> Here's the updated patch:
> http://cr.openjdk.java.net/~vlivanov/panama/vector.oob/webrev.01
>
+1
Paul.
> Range check:
> Objects.checkIndex(ix, length - (vlen - 1))
>
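For reference, a minimal sketch of what that range check accepts, assuming vlen > 0 and length >= 0. The class and method names are hypothetical and not part of the patch. It also compares the check with Objects.checkFromIndexSize(ix, vlen, length), which should accept the same indexes under those assumptions.

```java
import java.util.Objects;

// Hypothetical sketch, not code from the patch.
final class VectorRangeCheckSketch {

    // True if the check used in the patch accepts a vlen-wide access at ix.
    static boolean fitsViaCheckIndex(int ix, int vlen, int length) {
        try {
            Objects.checkIndex(ix, length - (vlen - 1));
            return true;
        } catch (IndexOutOfBoundsException e) {
            return false;
        }
    }

    // True if checkFromIndexSize accepts the sub-range [ix, ix + vlen).
    static boolean fitsViaCheckFromIndexSize(int ix, int vlen, int length) {
        try {
            Objects.checkFromIndexSize(ix, vlen, length);
            return true;
        } catch (IndexOutOfBoundsException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        int vlen = 8; // assumed vector length, e.g. 8 ints in 256 bits
        for (int length : new int[] {4, 10, 16}) {
            for (int ix = -1; ix <= length; ix++) {
                boolean a = fitsViaCheckIndex(ix, vlen, length);
                boolean b = fitsViaCheckFromIndexSize(ix, vlen, length);
                if (a != b) {
                    throw new AssertionError("mismatch at ix=" + ix + ", length=" + length);
                }
            }
        }
        System.out.println("both checks agree");
    }
}
```

When length < vlen - 1, the second argument to checkIndex is negative and every index is rejected, which matches the observation that the check fails if length - (vlen - 1) < 0.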
>
> (1) for (int i = 0; i < SIZE; i += species.length()) {
> ... a3[i:i+vlen] = a1[i:i+vlen] + a2[i:i+vlen] ...
> }
>
> static final int SIZE = ...;
>
> Result: No range checks inside the loop.
>
>
> (2) for (int i = 0; i < a3.length; i += species.length()) {
> ... a3[i:i+vlen] = a1[i:i+vlen] + a2[i:i+vlen] ...
> }
>
> Result: as you observed, some checks are left (but 2 instead of 3).
>
> Let's try something even simpler:
>
> (3) // Single vector load
> for (int i = 0; i < a.length; i += species.length()) {
> IntVector<Shapes.S256Bit> av = species.fromArray(a, i);
> sum += av.addAll();
> }
>
> Result: no range check in the loop.
>
> (4) // Load & store are from/into the same array.
> for (int i = 0; i < a.length; i += species.length()) {
> IntVector<Shapes.S256Bit> av = species.fromArray(a, i);
> av.add(av).intoArray(a, i);
> }
>
> Result: 1 range check in the loop (right before the store).
>
> One thing I noticed with the loop shape is that it is sensitive to array size: if the size is not a multiple of the vector length, then the last access is out of bounds.
>
> (5) for (int i = 0;
> i < a3.length - (species.length() - 1);
> i += species.length()) {
> ... a3[i:i+vlen] = a1[i:i+vlen] + a2[i:i+vlen] ...
> }
>
> Result: no range checks in the loop.
>
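The difference between loop shapes (2) and (5) can be checked with plain index arithmetic. The sketch below uses no vector API; the class and method names are hypothetical. It counts the iterations whose access [i, i + vlen) would run past the end of the array.

```java
// Hypothetical sketch: index arithmetic only, no vector API involved.
final class LoopShapeSketch {

    // Counts iterations of "for (i = 0; i < bound; i += vlen)" whose
    // vlen-wide access [i, i + vlen) extends beyond an array of the given length.
    static int oobIterations(int length, int vlen, int bound) {
        int oob = 0;
        for (int i = 0; i < bound; i += vlen) {
            if (i + vlen > length) {
                oob++;
            }
        }
        return oob;
    }

    // Loop shape (2): i < length
    static int shape2(int length, int vlen) {
        return oobIterations(length, vlen, length);
    }

    // Loop shape (5): i < length - (vlen - 1)
    static int shape5(int length, int vlen) {
        return oobIterations(length, vlen, length - (vlen - 1));
    }

    public static void main(String[] args) {
        int vlen = 8; // assumed vector length
        for (int length : new int[] {10, 16}) {
            System.out.println("length=" + length
                    + " shape2 OOB iterations=" + shape2(length, vlen)
                    + " shape5 OOB iterations=" + shape5(length, vlen));
        }
    }
}
```

With shape (5), any elements past the last full vector are simply not visited, so a real kernel would presumably need a scalar tail loop for them.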
>
> So, if there are no OOB accesses possible, C2 successfully hoists all range checks out of the loop.
>
> What's interesting is that even in the presence of OOB accesses C2 is able to hoist 1 range check. Probably there's some room for improvement (or a bug lurking here ;-)).
>
> Best regards,
> Vladimir Ivanov
>
> On 2/16/18 8:02 PM, Paul Sandoz wrote:
>>> On Feb 16, 2018, at 4:38 AM, Vladimir Ivanov <vladimir.x.ivanov at oracle.com> wrote:
>>>
>>>> Better! still some artifacts:
>>> ...
>>>> I think you can drop the first index check for case 2. IIUC you are assuming constraints vlen > 0 and length >= 0, so for:
>>>> Objects.checkIndex(ix, length - (vlen - 1));
>>>> the check will fail if length - (vlen - 1) < 0.
>>>
>>> From a correctness perspective, yes. But I was mostly concerned about C2, and having 2 checks that mimic ordinary range checks looked like a safer bet. Preconditions.checkIndex() is intrinsified as CmpI(length,0) + CmpU(index,length), and strength reduction to a single CmpU works only if (length >= 0), which is the case for an array length. So the difference is CmpU + CmpU vs CmpI + CmpU.
>>>
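A source-level analogy of the single unsigned compare, as a rough sketch only. This is not C2's IR, and the class and method names are hypothetical. It shows why collapsing the check into one unsigned comparison is valid only when length >= 0.

```java
// Hypothetical sketch: a Java-level analogy of CmpU(index, length).
final class UnsignedCheckSketch {

    // The ordinary two-comparison bounds check.
    static boolean inBoundsTwoCompares(int index, int length) {
        return index >= 0 && index < length;
    }

    // One unsigned comparison: a negative index becomes a huge unsigned
    // value, so it is rejected along with index >= length.
    static boolean inBoundsUnsigned(int index, int length) {
        return Integer.compareUnsigned(index, length) < 0;
    }

    public static void main(String[] args) {
        // For length >= 0 the two forms agree.
        System.out.println(inBoundsTwoCompares(-1, 10) + " " + inBoundsUnsigned(-1, 10));
        System.out.println(inBoundsTwoCompares(9, 10) + " " + inBoundsUnsigned(9, 10));
        // For a negative length they disagree: the negative length is itself
        // a huge unsigned value, so the unsigned form accepts index 0.
        System.out.println(inBoundsTwoCompares(0, -3) + " " + inBoundsUnsigned(0, -3));
    }
}
```

An array length is never negative, so the single compare suffices there. A computed bound such as length - (vlen - 1) can be negative, which would explain the extra signed compare against zero.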
>>> I hoped that C2 could statically prove that (length - (vlen - 1)) >= 0 after the first check and optimize it accordingly. But something goes wrong.
>>>
>>> Briefly looking into the generated IR, it seems the compiler does prove (length - (vlen - 1)) >= 0, but the (length - constant) shape confuses some other transformation, so hoisting isn't performed.
>>>
>> Ok, I don't recall this kind of thing happening with, say, the access of ints from a ByteBuffer, implying this might be something vector specific (I would need to go back and double check buffer accesses, but I am sure I would have spotted this when doing the VarHandles work).
>> Paul.