Observations from a simple JMH benchmark

Vladimir Ivanov vladimir.x.ivanov at oracle.com
Fri Feb 16 22:05:41 UTC 2018


I looked into that a bit more.

Here's the updated patch:
   http://cr.openjdk.java.net/~vlivanov/panama/vector.oob/webrev.01

Range check:
   Objects.checkIndex(ix, length - (vlen - 1))
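
To make the accepted index range concrete, here is a minimal scalar sketch of that check. It assumes a vector access starting at ix touches elements [ix, ix + vlen). The class and method names (VectorRangeCheck, checkVectorIndex, inBounds) are made up for illustration and are not taken from the patch.

```java
import java.util.Objects;

public class VectorRangeCheck {
    // Hypothetical helper: validates that a vlen-wide access starting at ix
    // stays inside an array of the given length.
    static int checkVectorIndex(int ix, int length, int vlen) {
        // Valid start indexes are 0 .. length - vlen,
        // i.e. 0 <= ix < length - (vlen - 1).
        return Objects.checkIndex(ix, length - (vlen - 1));
    }

    // Convenience wrapper that turns the exception into a boolean.
    static boolean inBounds(int ix, int length, int vlen) {
        try {
            checkVectorIndex(ix, length, vlen);
            return true;
        } catch (IndexOutOfBoundsException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(inBounds(0, 8, 8));  // whole array fits in one vector
        System.out.println(inBounds(1, 8, 8));  // would read one element past the end
        System.out.println(inBounds(0, 4, 8));  // array shorter than one vector
    }
}
```

When length < vlen the second argument is zero or negative, so Objects.checkIndex rejects every index, which is why a single check is enough from the correctness point of view.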


(1) for (int i = 0; i < SIZE; i += species.length()) {
        ... a3[i:i+vlen] = a1[i:i+vlen] + a2[i:i+vlen] ...
     }

     static final int SIZE = ...;

     Result: No range checks inside the loop.


(2) for (int i = 0; i < a3.length; i += species.length()) {
         ... a3[i:i+vlen] = a1[i:i+vlen] + a2[i:i+vlen] ...
     }

     Result: as you observed, some checks are left (but 2 instead of 3).

Let's try something even simpler:

(3) // Single vector load
     for (int i = 0; i < a.length; i += species.length()) {
         IntVector<Shapes.S256Bit> av = species.fromArray(a, i);
         sum += av.addAll();
     }

     Result: no range check in the loop.

(4) // Load & store are from/into the same array.
     for (int i = 0; i < a.length; i += species.length()) {
         IntVector<Shapes.S256Bit> av = species.fromArray(a, i);
         av.add(av).intoArray(a, i);
     }

     Result: 1 range check in the loop (right before the store).

One thing I noticed about this loop shape is that it is sensitive to the
array size: if the size is not a multiple of the vector length, then the
last access is out of bounds.

(5) for (int i = 0;
          i < a3.length - (species.length() - 1);
          i += species.length()) {
         ... a3[i:i+vlen] = a1[i:i+vlen] + a2[i:i+vlen] ...
     }

     Result: no range checks in the loop.
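
The difference between the two loop bounds can be reproduced without any vector code. The following is a minimal sketch in plain Java, assuming a lane count of 8 (a 256-bit vector of ints). The load() method is a hypothetical stand-in that performs only the range check, and all names are made up for illustration.

```java
import java.util.Objects;

public class LoopShapes {
    static final int VLEN = 8; // assumed lane count (256-bit vector of ints)

    // Stand-in for a vector load at index i: performs only the range check.
    static void load(int[] a, int i) {
        Objects.checkIndex(i, a.length - (VLEN - 1));
    }

    // Loop bound is a.length, as in (2) and (4): the last iteration goes
    // out of bounds unless a.length is a multiple of VLEN.
    static boolean fullLengthLoopIsSafe(int[] a) {
        try {
            for (int i = 0; i < a.length; i += VLEN) {
                load(a, i);
            }
            return true;
        } catch (IndexOutOfBoundsException e) {
            return false;
        }
    }

    // Loop bound is a.length - (VLEN - 1), as in (5): only full vectors are
    // visited, so the check can never fail. Returns the iteration count.
    static int fullVectorIterations(int[] a) {
        int n = 0;
        for (int i = 0; i < a.length - (VLEN - 1); i += VLEN) {
            load(a, i);
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(fullLengthLoopIsSafe(new int[16]));
        System.out.println(fullLengthLoopIsSafe(new int[20]));
        System.out.println(fullVectorIterations(new int[20]));
    }
}
```

With the bound from (5), any trailing elements that do not fill a whole vector are simply not visited; handling them (for example with a scalar tail loop) is left out of this sketch.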


So, if no OOB accesses are possible, C2 successfully hoists all
range checks out of the loop.

What's interesting is that even in the presence of OOB accesses C2 is
able to hoist 1 range check. There is probably some room for improvement
(or a bug lurking here ;-)).

Best regards,
Vladimir Ivanov

On 2/16/18 8:02 PM, Paul Sandoz wrote:
> 
> 
>> On Feb 16, 2018, at 4:38 AM, Vladimir Ivanov <vladimir.x.ivanov at oracle.com> wrote:
>>
>>> Better! still some artifacts:
>> ...
>>> I think you can drop the first index check for case 2. IIUC you are assuming constraints vlen > 0 and length >= 0, so for:
>>>    Objects.checkIndex(ix, length - (vlen - 1));
>>> the check will fail if length - (vlen - 1) < 0.
>>
>> From a correctness perspective, yes. But I was mostly concerned about C2, and having 2 checks which mimic ordinary range checks looked like a safer bet. Preconditions.checkIndex() is intrinsified as CmpI(length,0) + CmpU(index,length), and strength reduction to a single CmpU works only if (length >= 0), which is the case for an array length. So, the difference is CmpU + CmpU vs CmpI + CmpU.
>>
>> I hoped that C2 could statically prove that (length - (vlen - 1)) >= 0 after the first check and optimize it accordingly. But something goes wrong.
>>
>> Briefly looking into generated IR, it seems compiler does prove (length - (vlen - 1)) >= 0, but (length - constant) shape confuses some other transformation, so hoisting isn't performed.
>>
> 
> Ok, I don’t recall this kind of thing happening with, say, the access of ints from a ByteBuffer, implying this might be something vector-specific (I would need to go back and double-check buffer accesses, but I am sure I would have spotted this when doing the VarHandles work).
> 
> Paul.
> 
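
The point in the quoted exchange about a single unsigned comparison being valid only for a non-negative length can be checked with plain integer arithmetic. The sketch below illustrates that arithmetic only; it does not model C2's IR, and the class and method names are made up for illustration.

```java
public class UnsignedRangeCheck {
    // Two signed comparisons, as in an ordinary range check.
    static boolean inBoundsSigned(int index, int length) {
        return index >= 0 && index < length;
    }

    // One unsigned comparison. A negative index reinterpreted as unsigned is
    // at least 2^31, so it compares above any non-negative length.
    static boolean inBoundsUnsigned(int index, int length) {
        return Integer.compareUnsigned(index, length) < 0;
    }

    public static void main(String[] args) {
        // The two forms agree when length >= 0 ...
        System.out.println(inBoundsSigned(-1, 10) == inBoundsUnsigned(-1, 10));
        // ... but can disagree when length is negative.
        System.out.println(inBoundsSigned(5, -1) == inBoundsUnsigned(5, -1));
    }
}
```

For a negative length the unsigned form accepts indexes that the signed form rejects, which is why the extra comparison of length against 0 is needed when the length is not known to be non-negative.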

