[vector] RFR (L): Generalized intrinsics for vector operations (first batch)
Paul Sandoz
paul.sandoz at oracle.com
Fri Feb 9 22:42:07 UTC 2018
> On Feb 9, 2018, at 2:24 PM, Vladimir Ivanov <vladimir.x.ivanov at oracle.com> wrote:
>
> Thanks, Paul.
>
> On 2/10/18 1:14 AM, Paul Sandoz wrote:
>> Looks good.
>> 139 @Override
>> 140 @ForceInline
>> 141 public $vectortype$ add(Vector<$Boxtype$,Shapes.$shape$> v) {
>> 142 Objects.requireNonNull(v);
>> 143 return ($vectortype$) VectorIntrinsics.binaryOp(
>> 144 VECTOR_OP_ADD, $vectortype$.class, $type$.class, LENGTH,
>> 145 this, ($vectortype$)v,
>> 146 (v1, v2) -> (($vectortype$)v1).bOp(v2, (i, a, b) -> ($type$)(a + b)));
>> 147 }
>> Given that you perform the cast at line 145, so both input vectors are of the same concrete type, do you still need the cast on v1 in the lambda?
>
> Good point. It can be even further simplified to:
>
> public Int256Vector add(Vector<Integer,Shapes.S256Bit> v) {
> Objects.requireNonNull(v);
> return VectorIntrinsics.binaryOp(
> VECTOR_OP_ADD, Int256Vector.class, int.class, LENGTH,
> this, v,
> (v1, v2) -> v1.bOp(v2, (i, a, b) -> (int)(a + b)));
> }
>
Right, I thought you might want the explicit cast for the reasons you give, so I am OK with it.
Paul.
> But I'd prefer to keep the cast at line 145 explicit to stress there's a type check in place on the argument.
>
> I'll use the following shape:
>
> 139 @Override
> 140 @ForceInline
> 141 public $vectortype$ add(Vector<$Boxtype$,Shapes.$shape$> v) {
> 142 Objects.requireNonNull(v);
> 143 return VectorIntrinsics.binaryOp(
> 144 VECTOR_OP_ADD, $vectortype$.class, $type$.class, LENGTH,
> 145 this, ($vectortype$)v,
> 146 (v1, v2) -> v1.bOp(v2, (i, a, b) -> ($type$)(a + b)));
> 147 }
>
> Are you ok with it?
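
The typing argument behind the shape above can be checked with a minimal standalone sketch. It assumes a simplified binaryOp that keeps only the Class<V> and V parameters; Vec, ConcreteVec, and the two-argument bOp are hypothetical stand-ins, not the real jdk.incubator.vector types. Once the class literal fixes V to the concrete type, the lambda parameters and the result need no casts, and the one explicit cast on the argument is where the type check happens.

```java
import java.util.Objects;
import java.util.function.BiFunction;
import java.util.function.IntBinaryOperator;

class CastInferenceSketch {
    // Hypothetical stand-in for the abstract Vector<Integer, Shapes.S256Bit> type.
    interface Vec {
        int lane(int i);
    }

    // Simplified stand-in for VectorIntrinsics.binaryOp (assumption: only the
    // parameters relevant to type inference are kept).
    static <V> V binaryOp(Class<V> vectorClass, V v1, V v2,
                          BiFunction<V, V, V> defaultImpl) {
        return defaultImpl.apply(v1, v2);
    }

    // Hypothetical concrete vector class, playing the role of Int256Vector.
    static final class ConcreteVec implements Vec {
        private final int[] lanes;

        ConcreteVec(int[] lanes) {
            this.lanes = lanes.clone();
        }

        @Override
        public int lane(int i) {
            return lanes[i];
        }

        // Lane-wise helper that exists only on the concrete class.
        ConcreteVec bOp(ConcreteVec o, IntBinaryOperator f) {
            int[] r = new int[lanes.length];
            for (int i = 0; i < r.length; i++) {
                r[i] = f.applyAsInt(lanes[i], o.lanes[i]);
            }
            return new ConcreteVec(r);
        }

        ConcreteVec add(Vec v) {
            Objects.requireNonNull(v);
            // V is inferred as ConcreteVec from the class literal, so v1 and v2
            // are already ConcreteVec and the result needs no cast. The cast on
            // v throws ClassCastException for any other Vec implementation.
            return binaryOp(ConcreteVec.class, this, (ConcreteVec) v,
                    (v1, v2) -> v1.bOp(v2, (a, b) -> a + b));
        }
    }

    public static void main(String[] args) {
        ConcreteVec sum = new ConcreteVec(new int[] {1, 2})
                .add(new ConcreteVec(new int[] {3, 4}));
        System.out.println(sum.lane(0) + ", " + sum.lane(1));
    }
}
```

Passing any other Vec implementation to add fails at the explicit cast, before the fallback lambda runs.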
>
> Best regards,
> Vladimir Ivanov
>
>> Paul.
>>> On Feb 9, 2018, at 1:38 PM, Vladimir Ivanov <vladimir.x.ivanov at oracle.com> wrote:
>>>
>>> http://cr.openjdk.java.net/~vlivanov/panama/vector.generalized_intrinsics/webrev.06/
>>>
>>> Here's the first batch of rewritten intrinsics for vector operations.
>>>
>>> The main motivations for the new implementation are:
>>> * reduce the number of intrinsics needed;
>>> * improve intrinsification robustness;
>>> * minimize the changes needed in shared C2 code.
>>>
>>> The idea of generalized intrinsics is to parameterize them with additional information (passed as constant arguments), so that the JIT compiler has enough information to dispatch to the proper implementation during intrinsification.
>>>
>>> For example, a binary vector operation can be represented as:
>>>
>>> // (V,V) -> V
>>> @HotSpotIntrinsicCandidate
>>> static <V> V binaryOp(int oprId, Class<V> vectorClass,
>>> Class<?> elementType, int vlen,
>>> V v1, V v2,
>>> BiFunction<V,V,V> defaultImpl) {
>>> return defaultImpl.apply(v1, v2);
>>> }
>>>
>>> and used as:
>>>
>>> // (Int256Vector,Int256Vector) -> Int256Vector
>>>
>>> public Int256Vector add(Vector<Integer,Shapes.S256Bit> v) {
>>> return (Int256Vector) VectorIntrinsics.binaryOp(
>>> VECTOR_OP_ADD,
>>> Int256Vector.class, int.class, 8,
>>> this, (Int256Vector)v,
>>> (v1, v2) -> ((Int256Vector)v1).bOp(v2,
>>> (i, a, b) -> (int)(a+b)));
>>> }
>>>
>>> where:
>>> oprId encodes the actual operation (VECTOR_OP_ADD);
>>>
>>> vectorClass, elementType, vlen describe the concrete vector class (Int256Vector);
>>>
>>> v1, v2 are the actual arguments of the binary vector operation;
>>>
>>> defaultImpl is the scalar implementation, which is used when intrinsification fails.
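
The pattern described above can be tried out in plain Java with a minimal sketch. It makes several assumptions: the intrinsic annotation is dropped, so only the defaultImpl path ever runs; IntVec8, its simplified bOp, and the numeric values of the operation ids are hypothetical and not taken from the patch. It shows a single binaryOp entry point serving both add and mul, with the operation and vector shape passed as constant arguments.

```java
import java.util.Arrays;
import java.util.function.BiFunction;
import java.util.function.IntBinaryOperator;

class GeneralizedIntrinsicSketch {
    // Hypothetical operation ids; the real constant values are not given here.
    static final int VECTOR_OP_ADD = 0;
    static final int VECTOR_OP_MUL = 1;

    // Same signature as the binaryOp quoted above, minus the annotation.
    // A JIT compiler could use oprId, vectorClass, elementType and vlen to pick
    // a vector instruction; in plain Java only the fallback executes.
    static <V> V binaryOp(int oprId, Class<V> vectorClass,
                          Class<?> elementType, int vlen,
                          V v1, V v2,
                          BiFunction<V, V, V> defaultImpl) {
        return defaultImpl.apply(v1, v2);
    }

    // Hypothetical 8-lane int vector, standing in for Int256Vector.
    static final class IntVec8 {
        static final int LENGTH = 8;
        final int[] lanes;

        IntVec8(int[] lanes) {
            if (lanes.length != LENGTH) {
                throw new IllegalArgumentException("expected " + LENGTH + " lanes");
            }
            this.lanes = lanes.clone();
        }

        // Scalar lane-wise loop (simplified: no lane index is passed to f).
        IntVec8 bOp(IntVec8 o, IntBinaryOperator f) {
            int[] r = new int[LENGTH];
            for (int i = 0; i < LENGTH; i++) {
                r[i] = f.applyAsInt(lanes[i], o.lanes[i]);
            }
            return new IntVec8(r);
        }

        IntVec8 add(IntVec8 v) {
            return binaryOp(VECTOR_OP_ADD, IntVec8.class, int.class, LENGTH,
                    this, v, (v1, v2) -> v1.bOp(v2, (a, b) -> a + b));
        }

        IntVec8 mul(IntVec8 v) {
            return binaryOp(VECTOR_OP_MUL, IntVec8.class, int.class, LENGTH,
                    this, v, (v1, v2) -> v1.bOp(v2, (a, b) -> a * b));
        }
    }

    // Adds two sample vectors through the generalized entry point.
    static int[] demo() {
        IntVec8 a = new IntVec8(new int[] {1, 2, 3, 4, 5, 6, 7, 8});
        IntVec8 b = new IntVec8(new int[] {10, 20, 30, 40, 50, 60, 70, 80});
        return a.add(b).lanes;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(demo()));
    }
}
```

Both add and mul funnel through the same method, so a compiler would only need to recognize one call site shape and switch on the constant oprId.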
>>>
>>>
>>> Generalized intrinsics are declared on jdk.incubator.vector.VectorIntrinsics and the patch contains 6 of them:
>>> * broadcastCoerced
>>> * reductionCoerced
>>> * binaryOp
>>> * load/store
>>> * test
>>>
>>> The following vector operations were ported to the new mechanism:
>>> * broadcast: zero, broadcast, trueMask, falseMask
>>> * reduction: addAll, mulAll
>>> * binaryOp: add, sub, mul, div, and, or, xor
>>> * load/store: intoArray, fromArray
>>> * test: anyTrue, allTrue
>>>
>>> There's a new flag added to turn the new intrinsics on/off:
>>>
>>> + product(bool, UseVectorApiGeneralizedIntrinsics, true,
>>>
>>> Previous discussions [1] [2].
>>>
>>> The patch adds alternative implementations, but doesn't remove existing ones, since there are some code dependencies on them. Once the dependencies are broken, the code will go away.
>>>
>>> Thanks!
>>>
>>> Best regards,
>>> Vladimir Ivanov
>>>
>>> [1] http://mail.openjdk.java.net/pipermail/panama-dev/2017-November/000748.html
>>>
>>> [2] http://mail.openjdk.java.net/pipermail/panama-dev/2017-December/000884.html