On generalizing vector intrinsics

Wed Nov 15 18:47:17 UTC 2017

Thanks for the feedback, Razvan!

> Thanks so much for experimenting with reducing number of intrinsics needed. I am glad you are looking into this. I have a couple of questions:
> - Is there a way to force methods like "binaryOp" to be inlined? In general, we will want all intrinsics to end up in same method in order to reduce chances of boxing.

What do you mean by "inlined" here?

Wrappers in concrete vector classes are marked with @ForceInline, so 
whenever the compiler can inline the wrapper, it does so and sees the intrinsic:
   @ 22   Int128Vector::add (6 bytes)   force inline by annotation
     @ 2    Int128Vector::add (24 bytes)   force inline by annotation
       @ 17   VectorIntrinsics::binary (15 bytes)   (intrinsic)

Or do you care about the case when intrinsification fails? In that case, 
what Paul proposes looks promising.

> - For approach 2, is it possible to recover the concrete instance of object (which represents Vector class) from the VM side? Will it appear as a "ConP" from PoV of compiler?

Yes, the whole idea relies on the compiler's ability to see the values as 
constants. The same applies to class parameters: C2 is able to extract a 
ciKlass* from a class constant passed as an argument:

     const TypeInstPtr* vector_klass = gvn().type(argument(1))->is_instptr();
     ciKlass* vbox_klass = vector_klass->const_oop()->as_instance()->java_lang_Class_klass();
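
To illustrate the Java-visible side of that idea, here is a rough sketch 
(plain Java, not the actual patch). It assumes the shape is recovered from 
a class constant through a hard-coded table; Int128Box, Int256Box, 
ShapeFromClass and Shape are hypothetical names made up for this example, 
not classes from jdk.incubator.vector:

```java
import java.util.Map;

// Hypothetical stand-ins for concrete vector box classes.
final class Int128Box { final int[] lanes = new int[4]; }
final class Int256Box { final int[] lanes = new int[8]; }

final class ShapeFromClass {
    // Vector shape = element type + vector size in bits.
    static final class Shape {
        final Class<?> elemType;
        final int bitSize;

        Shape(Class<?> elemType, int bitSize) {
            this.elemType = elemType;
            this.bitSize = bitSize;
        }
    }

    // Hard-coded mapping from box class to shape (an assumption of this sketch).
    private static final Map<Class<?>, Shape> SHAPES = Map.of(
            Int128Box.class, new Shape(int.class, 128),
            Int256Box.class, new Shape(int.class, 256));

    // When the argument is a class literal at the call site, everything
    // derived from it here is, in principle, known at compile time as well.
    static Shape shapeOf(Class<?> vectorBox) {
        Shape s = SHAPES.get(vectorBox);
        if (s == null) {
            throw new IllegalArgumentException("not a known vector box: " + vectorBox);
        }
        return s;
    }
}
```

A call like ShapeFromClass.shapeOf(Int256Box.class) yields element type 
int and size 256; the C2 snippet above is the JIT-side counterpart, which 
reads the same class constant directly from the argument's type.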

> - How about instead of looking at it from Java side, looking at it from JVM side? I can think of two possibilities right now:

Well, I'd say the experiment is focused on JVM :-) Mostly about 
simplifying JVM part and moving as much code out of the JVM into Java.

>      1) Create some new macros so that common methods like add can receive entries in intrinsics table for each type without additional verbosity required for each.

That's another option. But it leads to less code in Java & more code in C2.

>      2) Improve the intrinsics dispatcher so that once it sees a method like add used in VectorAPI, it disregards actual class used and tries to dispatch to intrinsic. In implementation, we recover shape and type from profile (which we do anyway).

Profile is tricky: compiler is free to use it, but can't trust it :-)

The generalized intrinsics I'm playing with don't rely on the profile 
directly: all relevant info is passed as constants (so the compiler can 
"see" & use it during compilation).
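
As a rough illustration of what "passed as constants" means at a call 
site, here is a minimal pure-Java sketch. It is not the prototype: the 
opcode values are made up, int[] stands in for the vector box, and the 
GenericBinaryOp class and add256 wrapper are hypothetical names. The 
scalar loop only shows the kind of Java fallback such a generic entry 
point could have:

```java
final class GenericBinaryOp {
    // Opcode names follow the thread; the numeric values are invented here.
    static final int VECTOR_OP_ADD = 0;
    static final int VECTOR_OP_SUB = 1;
    static final int VECTOR_OP_MUL = 2;

    // Generic entry point: operation and vector size arrive as plain ints.
    // Scalar fallback over int lanes; size is the vector width in bits.
    static int[] binaryOp(int opr, int size, int[] v1, int[] v2) {
        int lanes = size / Integer.SIZE;
        if (v1.length != lanes || v2.length != lanes) {
            throw new IllegalArgumentException("lane count does not match size " + size);
        }
        int[] r = new int[lanes];
        for (int i = 0; i < lanes; i++) {
            switch (opr) {
                case VECTOR_OP_ADD: r[i] = v1[i] + v2[i]; break;
                case VECTOR_OP_SUB: r[i] = v1[i] - v2[i]; break;
                case VECTOR_OP_MUL: r[i] = v1[i] * v2[i]; break;
                default: throw new IllegalArgumentException("unknown op: " + opr);
            }
        }
        return r;
    }

    // Wrapper as a concrete 256-bit class might have it: every shape-related
    // argument is a literal, so nothing has to be guessed from the profile.
    static int[] add256(int[] a, int[] b) {
        return binaryOp(VECTOR_OP_ADD, 256, a, b);
    }
}
```

Once a wrapper like add256 is inlined, opr and size are compile-time 
constants at the binaryOp call.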

But since intrinsic usages are put into concrete vector classes, the JIT 
sees those usages only when inlining succeeds, and inlining relies heavily 
on profile data.

Best regards,
Vladimir Ivanov
> 
> Thanks,
> --Razvan
> 
> -----Original Message-----
> From: panama-dev [mailto:panama-dev-bounces at openjdk.java.net] On Behalf Of Vladimir Ivanov
> Sent: Tuesday, November 14, 2017 2:57 PM
> To: 'panama-dev at openjdk.java.net' <panama-dev at openjdk.java.net>
> Subject: On generalizing vector intrinsics
> 
> Hi,
> 
> FYI I did a quick experiment with more generic vector intrinsics and wanted to share the first results. The motivation was to explore possible reduction in the number of intrinsics needed to support Vector API in the JVM.
> 
> The first candidates were binary arithmetic operations on vectors:
> 
>   
> http://cr.openjdk.java.net/~vlivanov/panama/vector.generalized_intrinsics/
> 
> Overall, it looks promising.
> 
> It can be easily extended to masked variants (e.g., by adding an additional argument for the mask and passing null for an all-ones mask) and operation variants (e.g., saturated add).
> 
> Dispatching is simple on JVM side. One question is how to represent vector shape (element type + vector size). There are different options:
> 
>     (1) pass parameters explicitly (prototyped):
> 
>       Vector binaryOp(int opr, int elem, int size, Vector v1, Vector v2)
> 
>       (Int256Vector) VectorIntrinsics.binaryOp(VECTOR_OP_ADD,
>                                                VECTOR_ELEM_INT,
>                                                256,
>                                                this,
>                                                (Int256Vector)o);
> 
> 
>     (2) pass concrete vector class and extract vector shape info from it
> 
>       Vector binaryOp(int opr, Class vector_box, Vector v1, Vector v2)
> 
> 
>     (3) pass shape implicitly: extract the shape from the class of the first vector (always exact class):
> 
>       final class Int256Vector extends IntVector<Shapes.S256Bit> { ...
>           @Override
>           @ForceInline
>           public Int256Vector add(Vector<Integer,Shapes.S256Bit> v) {
>               return (Int256Vector) VectorIntrinsics.binaryOp(VECTOR_OP_ADD,
>                                                               VECTOR_ELEM_INT,
>                                                               256,
>                                                               this,
>                                                               (Int256Vector)v);
>           }
> 
> 
> Some considerations:
> 
> #1: explicit and trivial to extract info on C2 side. The downside is that it requires additional non-trivial steps to find the exact vector box class (see get_exact_klass_for_vector_box() and ctx->find_klass(vector_klass_name) there): the fact that vector classes aren't part of java.base, but of jdk.incubator.vector, complicates class lookup a bit.
> 
> #2: vector shape is clearly documented in the code, but requires some additional steps (or hard-coded info in the JVM) to extract different pieces.
> 
> #3: relies on the implicit convention that the JIT knows the exact type of "this": that's the case if the usage is in a concrete vector class and the first vector argument is "this".
> 
> Personally, I'm in favor of #2, but other options look attractive as well.
> 
> Best regards,
> Vladimir Ivanov
>