On generalizing vector intrinsics

Wed Nov 15 19:06:43 UTC 2017

Thanks for the feedback, Paul!

> That is pleasantly compact on the VM side.
> 
> On the Java side, to avoid centralising the implementation in a nest of massive switch statements, we may need another parameter referencing the method that is the pure Java implementation, to be called if the intrinsic is not supported or has yet to kick in. The Java implementation of VectorIntrinsics.binaryOp would then be a simple re-invoker, and perhaps C2 could even optimize that to elide the redirection?

Nice idea! I've implemented it in the updated version, and the result looks clean:

http://cr.openjdk.java.net/~vlivanov/panama/vector.generalized_intrinsics.01/

   @ForceInline
   public Double128Vector add(Vector<Double,Shapes.S128Bit> o) {
       return (Double128Vector) VectorIntrinsics.binary(
               VECTOR_OP_ADD, Double128Vector.class, VECTOR_ELEM_DOUBLE, 128,
               this, (Double128Vector)o,
               (v1, v2) -> ((Double128Vector)v1).bOp(v2, (i, a, b) -> (a + b)));
   }
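A minimal, self-contained sketch of the re-invoker idea, under simplifying assumptions: Double128Vec and IntrinsicsSketch.binary are hypothetical stand-ins, not the prototype's classes. The Java body of the intrinsic ignores the shape descriptors and just calls the passed-in fallback; replacing the call with a vector instruction would be the JIT's job and is not modeled here.

```java
import java.util.function.BinaryOperator;

// Hypothetical intrinsic entry point (not the prototype's VectorIntrinsics).
final class IntrinsicsSketch {
    static final int VECTOR_OP_ADD = 0;
    static final int VECTOR_ELEM_DOUBLE = 0;

    // A JIT could intrinsify this when it supports (opr, elem, size).
    // The pure Java body is only the fallback: it re-invokes defaultImpl.
    static <V> V binary(int opr, Class<V> vectorClass, int elem, int size,
                        V v1, V v2, BinaryOperator<V> defaultImpl) {
        return defaultImpl.apply(v1, v2);
    }
}

// Hypothetical 128-bit double vector with 2 lanes (stand-in only).
final class Double128Vec {
    final double[] lanes;

    Double128Vec(double... lanes) { this.lanes = lanes.clone(); }

    Double128Vec add(Double128Vec o) {
        return IntrinsicsSketch.binary(
                IntrinsicsSketch.VECTOR_OP_ADD, Double128Vec.class,
                IntrinsicsSketch.VECTOR_ELEM_DOUBLE, 128,
                this, o,
                // Fallback: plain lane-wise addition in Java.
                (a, b) -> {
                    double[] r = new double[a.lanes.length];
                    for (int i = 0; i < r.length; i++) r[i] = a.lanes[i] + b.lanes[i];
                    return new Double128Vec(r);
                });
    }
}
```

Because the fallback travels with the call, each concrete vector class keeps its own Java implementation instead of funnelling every case through one central switch.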

Indeed, C2 can eliminate the indirection, but the main benefit is that the fallback implementation can be optimized for the particular vector shape and used directly (e.g., Int256Vector.add vs. the generic IntVector.bOp()).
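To illustrate the difference, here is a hedged sketch contrasting a generic per-lane fallback (in the spirit of bOp with an (i, a, b) lambda) with a fallback specialized for one shape. LaneOp, Lanes.bOp and Lanes.add128 are hypothetical names over plain double arrays; the claim that the specialized form is easier for a JIT to optimize is a general expectation, not a measured result.

```java
// Per-lane operation, modeled on the (i, a, b) -> a + b lambdas in the thread.
@FunctionalInterface
interface LaneOp {
    double apply(int i, double a, double b);
}

final class Lanes {
    // Generic fallback: one loop for every shape, calling the lane op
    // through an interface for each lane.
    static double[] bOp(double[] a, double[] b, LaneOp op) {
        double[] r = new double[a.length];
        for (int i = 0; i < a.length; i++) r[i] = op.apply(i, a[i], b[i]);
        return r;
    }

    // Shape-specialized fallback for a 2-lane (128-bit) double vector:
    // fixed lane count and no lambda call per lane.
    static double[] add128(double[] a, double[] b) {
        return new double[] { a[0] + b[0], a[1] + b[1] };
    }
}
```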

Best regards,
Vladimir Ivanov

> 
>     Vector binaryOp(int opr, int elem, int size, Vector v1, Vector v2, VectorBinaryOp javaOp)
> 
>    (Int256Vector) VectorIntrinsics.binaryOp(VECTOR_OP_ADD,
>                                              VECTOR_ELEM_INT,
>                                              256,
>                                              this,
>                                              (Int256Vector)o,
>                                              this::_add);
> 

>> On 14 Nov 2017, at 14:57, Vladimir Ivanov <vladimir.x.ivanov at oracle.com> wrote:
>>
>> Hi,
>>
>> FYI I did a quick experiment with more generic vector intrinsics and wanted to share the first results. The motivation was to explore possible reduction in the number of intrinsics needed to support Vector API in the JVM.
>>
>> The first candidates were binary arithmetic operations on vectors:
>>
>> http://cr.openjdk.java.net/~vlivanov/panama/vector.generalized_intrinsics/
>>
>> Overall, it looks promising.
>>
>> It can easily be extended to masked variants (e.g., by adding an argument for the mask and passing null for an all-ones mask) and to operation variants (e.g., saturating add).
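A hedged sketch of the masked extension described above, using plain int arrays instead of vector classes. MaskedSketch, binaryMasked and saturatingAdd are hypothetical names; the convention that inactive lanes keep the value from v1 is an assumption made for illustration, not something the prototype specifies.

```java
import java.util.function.IntBinaryOperator;

final class MaskedSketch {
    // Hypothetical masked binary op. mask == null stands for the
    // all-ones mask, so unmasked callers need no mask object.
    static int[] binaryMasked(int[] v1, int[] v2, boolean[] mask, IntBinaryOperator op) {
        int[] r = new int[v1.length];
        for (int i = 0; i < v1.length; i++) {
            boolean active = (mask == null) || mask[i];
            // Assumed convention: inactive lanes keep the value from v1.
            r[i] = active ? op.applyAsInt(v1[i], v2[i]) : v1[i];
        }
        return r;
    }

    // Example operation variant: saturating add clamps instead of wrapping.
    static int saturatingAdd(int a, int b) {
        long s = (long) a + b;
        return (int) Math.max(Integer.MIN_VALUE, Math.min(Integer.MAX_VALUE, s));
    }
}
```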
>>
>> Dispatching is simple on the JVM side. One open question is how to represent the vector shape (element type + vector size). There are different options:
>>
>>   (1) pass parameters explicitly (prototyped):
>>
>>     Vector binaryOp(int opr, int elem, int size, Vector v1, Vector v2)
>>
>>     (Int256Vector) VectorIntrinsics.binaryOp(VECTOR_OP_ADD,
>>                                              VECTOR_ELEM_INT,
>>                                              256,
>>                                              this,
>>                                              (Int256Vector)o);
>>
>>
>>   (2) pass the concrete vector class and extract the vector shape info from it:
>>
>>     Vector binaryOp(int opr, Class vector_box, Vector v1, Vector v2)
>>
>>
>>   (3) pass the shape implicitly: extract it from the class of the first vector (which is always an exact class):
>>
>> final class Int256Vector extends IntVector<Shapes.S256Bit> {
>> ...
>>     @Override
>>     @ForceInline
>>     public Int256Vector add(Vector<Integer,Shapes.S256Bit> v) {
>>         return (Int256Vector) VectorIntrinsics.binaryOp(VECTOR_OP_ADD,
>>                                                         VECTOR_ELEM_INT,
>>                                                         256,
>>                                                         this,
>>                                                         (Int256Vector)v);
>>     }
>>
>>
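A hedged Java model of options (2) and (3), showing where the shape information comes from in each case. Shape, Int256Vec, Float128Vec and ShapeLookup are hypothetical stand-ins; the static map plays the role of the "hard-coded info in the JVM" that option (2) would need, and the actual lookup would happen inside C2 rather than in Java code.

```java
import java.util.Map;

// Hypothetical shape descriptor: element type plus vector size in bits.
record Shape(Class<?> elem, int bits) {}

// Hypothetical concrete vector classes (not jdk.incubator.vector types).
final class Int256Vec {}
final class Float128Vec {}

final class ShapeLookup {
    // Option (2): map a concrete vector box class to its shape.
    static final Map<Class<?>, Shape> SHAPES = Map.of(
            Int256Vec.class, new Shape(int.class, 256),
            Float128Vec.class, new Shape(float.class, 128));

    static Shape fromClass(Class<?> vectorBox) {
        Shape s = SHAPES.get(vectorBox);
        if (s == null) throw new IllegalArgumentException("unknown vector class: " + vectorBox);
        return s;
    }

    // Option (3): use the runtime class of the first vector argument.
    // This is only reliable when that class is exact, e.g. when v1 is "this"
    // inside a concrete vector class.
    static Shape fromFirstArg(Object v1) {
        return fromClass(v1.getClass());
    }
}
```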
>> Some considerations:
>>
>> #1: explicit, and the info is trivial to extract on the C2 side. The downside is that it requires additional non-trivial steps to find the exact vector box class (see get_exact_klass_for_vector_box() and ctx->find_klass(vector_klass_name) in the prototype): the fact that vector classes live in jdk.incubator.vector rather than java.base complicates class lookup a bit.
>>
>> #2: the vector shape is clearly documented in the code, but extracting the individual pieces requires some additional steps (or hard-coded info in the JVM).
>>
>> #3: relies on the implicit convention that the JIT knows the exact type of "this": that's the case if the call site is in a concrete vector class and the first vector argument is "this".
>>
>> Personally, I'm in favor of #2, but the other options look attractive as well.
>>
>> Best regards,
>> Vladimir Ivanov
>