Update on generalized intrinsics experiment
Viswanathan, Sandhya
sandhya.viswanathan at intel.com
Fri Dec 22 23:16:15 UTC 2017
Hi Vladimir,
I agree your work looks very clean and promising. I tried a couple of test cases and found that the performance and generated code quality don’t yet match what is currently checked in. You allude to this in your email below, where you mention the need for rematerialization support. I think we should wait until the code quality is up to par before merging (the Vector API is all about performance).
The couple of test cases that I tried are:
1)
BLAS Saxpy Kernel:
static <S extends Vector.Shape<Vector<?, ?>>> void VecSaxpy(FloatVector.FloatSpecies<S> fspec, float[] a, int a_offset, float[] b, int b_offset, float alpha) {
    FloatVector<S> alphaVec = fspec.broadcast(alpha);
    for (int i = 0; (i + a_offset + fspec.length()) < a.length && (i + b_offset + fspec.length()) < b.length; i += fspec.length()) {
        FloatVector<S> bv = fspec.fromArray(b, i + b_offset);
        FloatVector<S> av = fspec.fromArray(a, i + a_offset);
        bv.add(av.mul(alphaVec)).intoArray(b, i + b_offset);
    }
}
Performance with panama/dev/vectorIntrinsics repo: 14198.011 iter/sec
Performance with panama/dev/vectorIntrinsics + Patch: 2869.790 iter/sec
The C2 JITTED code with the repo build: BLAS.txt
The C2 JITTED code with the patch: BLAS_new.txt
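For checking results from either build, a scalar reference with the same loop bounds may be handy. This is only a sketch: the class name ScalarSaxpy and the parameter lanes (standing in for fspec.length()) are made up for illustration and are not part of the Vector API. Note that the strict < bound, kept here to mirror the kernel above, leaves the last full block untouched when the array length is an exact multiple of the vector length.

```java
public class ScalarSaxpy {
    // Scalar reference: b[i + bOffset] += a[i + aOffset] * alpha.
    // `lanes` is a hypothetical stand-in for fspec.length(); the outer loop
    // bound mirrors the vector kernel, so whatever the vector loop skips
    // at the end of the arrays is skipped here as well.
    static void saxpy(int lanes, float[] a, int aOffset, float[] b, int bOffset, float alpha) {
        for (int i = 0; (i + aOffset + lanes) < a.length && (i + bOffset + lanes) < b.length; i += lanes) {
            for (int j = 0; j < lanes; j++) {
                b[i + bOffset + j] += a[i + aOffset + j] * alpha;
            }
        }
    }
}
```

Running this on copies of the same inputs and comparing against the output of VecSaxpy should show whether a build changes results as well as speed.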
2)
test/jdk/jdk/incubator/vector/AddTest.java:
The C2 JITTED code with the repo build: AddTest.txt
The C2 JITTED code with the patch: AddTest_new.txt
Best Regards,
Sandhya
-----Original Message-----
From: panama-dev [mailto:panama-dev-bounces at openjdk.java.net] On Behalf Of Vladimir Ivanov
Sent: Friday, December 22, 2017 2:01 PM
To: Paul Sandoz <paul.sandoz at oracle.com>
Cc: panama-dev at openjdk.java.net
Subject: Re: Update on generalized intrinsics experiment
> This is looking very promising. It’s pleasing to see the code gen approach hold up without modification.
>
> From a code bloat perspective, do you think it is worth pushing op impls, where amenable, up to the abstract *Vector<S> types?
It's doable, but moving them up in the hierarchy works against the JIT
compiler. When the code resides in leaf classes, the JIT enjoys working
with constants and exact types. If the code becomes shared, those vector
shape-specific parameters have to be fetched first, which becomes a
hurdle for the compiler.
It's not a big deal when everything is inlined, but if it's not, we end
up with multiple calls instead of one.
The same applies to default implementations. It's much harder for the
compiler to specialize the code for a particular vector shape when
crucial pieces are abstracted away.
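
To illustrate the trade-off, here is a minimal sketch. The names ShapeSpecialization, SharedVec and Vec4 are invented for this example and do not correspond to the actual Vector API classes. In the shared version the lane count is only available through a virtual call; in the leaf version it is a static final field with an exact receiver type.

```java
public class ShapeSpecialization {
    // Shared implementation: the lane count comes from a virtual call, so
    // the loop bound is a constant only if the JIT manages to inline length().
    abstract static class SharedVec {
        final float[] lanes;
        SharedVec(float[] lanes) { this.lanes = lanes; }
        abstract int length();
        float sumShared() {
            float s = 0f;
            for (int i = 0; i < length(); i++) s += lanes[i];
            return s;
        }
    }

    // Leaf implementation: the lane count is a static final field, which
    // the JIT can treat as a constant without any extra inlining.
    static final class Vec4 extends SharedVec {
        static final int LENGTH = 4;
        Vec4(float[] lanes) { super(lanes); }
        @Override int length() { return LENGTH; }
        float sumLeaf() {
            float s = 0f;
            for (int i = 0; i < LENGTH; i++) s += lanes[i];
            return s;
        }
    }
}
```

Both methods compute the same value; the difference is only in how much the compiler has to discover before it can specialize the loop for a given shape.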
Best regards,
Vladimir Ivanov
>> I tried to keep intrinsics separate and accompany each with relevant code (as deleted):
>>
>> (0) Vector.length()
>> http://cr.openjdk.java.net/~vlivanov/panama/vector.generalized_intrinsics.03/length/
>>
>> Replaced length() with a pure Java implementation. Static final fields are treated as constants, so there is not much benefit in keeping it as an intrinsic.
>>
>>
>> (1) VI.binaryOp()
>> http://cr.openjdk.java.net/~vlivanov/panama/vector.generalized_intrinsics.03/binary/
>>
>> Implemented add(), sub(), mul(), div(), and(), or(), xor().
>>
>>
>> (2) VI.broadcastCoerced()
>> http://cr.openjdk.java.net/~vlivanov/panama/vector.generalized_intrinsics.03/broadcast
>>
>> Chose the coerced (to long) version (though the boxed version works fine as well [1]). Implemented zero() & broadcast().
>>
>>
>> (3) VI.load()/store()
>> http://cr.openjdk.java.net/~vlivanov/panama/vector.generalized_intrinsics.03/memory/
>>
>> Implemented fromArray()/intoArray().
>>
>>
>> (4) VI.reductionCoerced()
>> http://cr.openjdk.java.net/~vlivanov/panama/vector.generalized_intrinsics.03/reduction/
>>
>> Result value is coerced to long bits. Implemented addAll(), mulAll().
>>
>> Test results look fine, but from a code quality perspective, the absence of rematerialization support at safepoints keeps many allocations alive.
>>
>>
>> Also, started experimenting with masks:
>> http://cr.openjdk.java.net/~vlivanov/panama/vector.generalized_intrinsics.03/mask/
>>
>> It's not complete yet, but the nice thing is that existing intrinsics can be extended to masks as well.
>>
>> Let me know what you think about that. Thanks!
>>
>> Best regards,
>> Vladimir Ivanov
>>
>> [1] http://cr.openjdk.java.net/~vlivanov/panama/vector.generalized_intrinsics.broadcast/
>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: AddTest.txt
URL: <http://mail.openjdk.java.net/pipermail/panama-dev/attachments/20171222/eda24468/AddTest-0001.txt>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: AddTest_new.txt
URL: <http://mail.openjdk.java.net/pipermail/panama-dev/attachments/20171222/eda24468/AddTest_new-0001.txt>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: BLAS.txt
URL: <http://mail.openjdk.java.net/pipermail/panama-dev/attachments/20171222/eda24468/BLAS-0001.txt>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: BLAS_new.txt
URL: <http://mail.openjdk.java.net/pipermail/panama-dev/attachments/20171222/eda24468/BLAS_new-0001.txt>