Update on generalized intrinsics experiment
Vladimir Ivanov
vladimir.x.ivanov at oracle.com
Sat Dec 23 00:30:23 UTC 2017
Thanks a lot for giving it a try, Sandhya!
I fully agree with you that until box elimination support is on par with
what we see right now, generalized intrinsics shouldn't replace the
existing implementation.
My current understanding is that vector rematerialization on safepoints
should get us there and implementing it is high on my TODO list.
One note though: I noticed bounds checks on memory accesses were missing
in the intrinsics and added them as part of the refactoring [1]. From what
I saw in the code, they aren't well optimized and aren't eliminated from
loop bodies (as ordinary array bounds checks are). So, it can manifest as
a small regression, though it isn't really one: those checks are required
for correctness.
What makes them different from ordinary array bounds checks is that
vector accesses touch ranges instead of single elements (0 <= i < length
vs 0 <= [i, i + vlen) <= length). But I believe there's a way forward to
reduce them to an ordinary bounds check [2]. So, once more urgent problems
are fixed, we can look into that one.
Best regards,
Vladimir Ivanov
[1]
http://cr.openjdk.java.net/~vlivanov/panama/vector.generalized_intrinsics.03/memory/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/X-VectorBits.java.template.udiff.html
+    public void intoArray($type$[] a, int ix) {
...
+        Objects.checkFromIndexSize(ix, LENGTH, a.length);
+    public $vectortype$ fromArray($type$[] a, int ix) {
...
+        Objects.checkFromIndexSize(ix, LENGTH, a.length);
[2]   0 <= i
      (i + vlen) <= length
    ==>
      (a) 0 <= i <= (length - vlen)
      (b) (vlen <= length)

    a - upper bound is loop invariant
    b - loop invariant
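
To make [2] concrete, here is a minimal sketch (not code from the patch)
that compares the range check done by Objects.checkFromIndexSize with the
reduced form (a) && (b). The class and method names (RangeCheckSketch,
rangeCheck, reducedCheck, agree) are made up for illustration, and it
assumes vlen >= 0 and length >= 0, which holds for vector and array
lengths.

```java
import java.util.Objects;

public class RangeCheckSketch {
    // Range check as in [1]: passes iff 0 <= i and i + vlen <= length.
    static boolean rangeCheck(int i, int vlen, int length) {
        try {
            Objects.checkFromIndexSize(i, vlen, length);
            return true;
        } catch (IndexOutOfBoundsException e) {
            return false;
        }
    }

    // Reduced form from [2]. Both 'b' and 'limit' depend only on length
    // and vlen, so in a loop over i they could be computed once up front,
    // leaving an ordinary single-index check of i against 'limit'.
    static boolean reducedCheck(int i, int vlen, int length) {
        boolean b = vlen <= length;        // (b): loop invariant
        int limit = length - vlen;         // upper bound in (a): loop invariant
        return b && 0 <= i && i <= limit;  // (a)
    }

    // Brute-force comparison of the two forms over a small range of
    // lengths and indices, including negative and out-of-range indices.
    static boolean agree(int maxLength, int vlen) {
        for (int length = 0; length <= maxLength; length++) {
            for (int i = -2; i <= maxLength + 2; i++) {
                if (rangeCheck(i, vlen, length) != reducedCheck(i, vlen, length)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(agree(64, 8));
    }
}
```

Note that the reduced form never computes i + vlen, so it also sidesteps
overflow for large i.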
On 12/23/17 2:16 AM, Viswanathan, Sandhya wrote:
> Hi Vladimir,
>
> I agree your work looks very clean and promising. I tried a couple of test cases and found that the performance and the generated code quality don’t yet match what is currently checked in. You allude to this in your email below when you mention the need for rematerialization support. I think we should wait until the code quality is up to par before merging (the Vector API is all about performance).
>
> The couple of test cases that I tried are:
> 1)
> BLAS Saxpy kernel:
>     static <S extends Vector.Shape<Vector<?, ?>>> void VecSaxpy(FloatVector.FloatSpecies<S> fspec, float[] a, int a_offset, float[] b, int b_offset, float alpha) {
>         FloatVector<S> alphaVec = fspec.broadcast(alpha);
>         for (int i = 0; (i + a_offset + fspec.length()) < a.length && (i + b_offset + fspec.length()) < b.length; i += fspec.length()) {
>             FloatVector<S> bv = fspec.fromArray(b, i + b_offset);
>             FloatVector<S> av = fspec.fromArray(a, i + a_offset);
>             bv.add(av.mul(alphaVec)).intoArray(b, i + b_offset);
>         }
>     }
> Performance with panama/dev/vectorIntrinsics repo: 14198.011 iter/sec
> Performance with panama/dev/vectorIntrinsics + Patch: 2869.790 iter/sec
> The C2 JITTED code with the repo build: BLAS.txt
> The C2 JITTED code with the patch: BLAS_new.txt
>
> 2)
> test/jdk/jdk/incubator/vector/AddTest.java:
> The C2 JITTED code with the repo build: AddTest.txt
> The C2 JITTED code with the patch: AddTest_new.txt
>
> Best Regards,
> Sandhya
>
>
> -----Original Message-----
> From: panama-dev [mailto:panama-dev-bounces at openjdk.java.net] On Behalf Of Vladimir Ivanov
> Sent: Friday, December 22, 2017 2:01 PM
> To: Paul Sandoz <paul.sandoz at oracle.com>
> Cc: panama-dev at openjdk.java.net
> Subject: Re: Update on generalized intrinsics experiment
>
>> This is looking very promising. It’s pleasing to see the code gen approach hold up without modification.
>>
>> From a code bloat perspective, do you think it is worth pushing op impls, where amenable, up to the abstract *Vector<S> types?
>
> It's doable, but moving them up in the hierarchy works against the JIT
> compiler. When the code resides in leaf classes, the JIT enjoys working
> with constants and exact types. If the code becomes shared, those vector
> shape-specific parameters have to be fetched first, which becomes a
> hurdle for the compiler.
>
> It's not a big deal when everything is inlined, but if it's not, we end
> up with multiple calls instead of 1.
>
> And the same applies to default implementations. It's much harder for
> the compiler to specialize the code for a particular vector shape when
> crucial pieces are abstracted away.
>
> Best regards,
> Vladimir Ivanov
>
>>> I tried to keep intrinsics separate and accompany each with relevant code (as deleted):
>>>
>>> (0) Vector.length()
>>> http://cr.openjdk.java.net/~vlivanov/panama/vector.generalized_intrinsics.03/length/
>>>
>>> Replaced length() with pure Java implementation. Static final fields are treated as constants, so not much benefit in keeping it as an intrinsic.
>>>
>>>
>>> (1) VI.binaryOp()
>>> http://cr.openjdk.java.net/~vlivanov/panama/vector.generalized_intrinsics.03/binary/
>>>
>>> Implemented add(), sub(), mul(), div(), and(), or(), xor().
>>>
>>>
>>> (2) VI.broadcastCoerced()
>>> http://cr.openjdk.java.net/~vlivanov/panama/vector.generalized_intrinsics.03/broadcast
>>>
>>> Chose coerced (to long) version (though boxed version works fine as well [1]). Implemented zero() & broadcast().
>>>
>>>
>>> (3) VI.load()/store()
>>> http://cr.openjdk.java.net/~vlivanov/panama/vector.generalized_intrinsics.03/memory/
>>>
>>> Implemented fromArray()/intoArray().
>>>
>>>
>>> (4) VI.reductionCoerced()
>>> http://cr.openjdk.java.net/~vlivanov/panama/vector.generalized_intrinsics.03/reduction/
>>>
>>> Result value is coerced to long bits. Implemented addAll(), mulAll().
>>>
>>> Test results look fine, but from a code quality perspective, the absence of rematerialization support at safepoints keeps many allocations alive.
>>>
>>>
>>> Also, started experimenting with masks:
>>> http://cr.openjdk.java.net/~vlivanov/panama/vector.generalized_intrinsics.03/mask/
>>>
>>> It's not complete yet, but the nice thing is that existing intrinsics can be extended to masks as well.
>>>
>>> Let me know what you think about that. Thanks!
>>>
>>> Best regards,
>>> Vladimir Ivanov
>>>
>>> [1] http://cr.openjdk.java.net/~vlivanov/panama/vector.generalized_intrinsics.broadcast/
>>