Vector API: How to write code template specialization?

Mon Apr 13 17:26:15 UTC 2020

Hi all,
as a kind of study, I'm trying to see how to use the Vector API to implement a simple runtime [1] for J (that weird descendant of APL :)

It works quite well until you try to share code. Let's say I have some code that does a reduction on an array.
I can write one version for +, one version for *, etc., or I can write a single method that takes a VectorOperators constant as a parameter and rely on the JIT being smart enough to specialize that method for ADD when I call it with the constant VectorOperators.ADD.

So in my runtime, I have a method foldValueADD that calls foldValueTemplate(ADD).
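To make the pattern concrete, here is a minimal scalar sketch of the "template plus thin wrappers" idea. It is not the code from [1]: the Op enum is a hypothetical stand-in for the VectorOperators constants, and the loop is scalar so that it runs without the incubator module. Only the names foldValueADD and foldValueTemplate come from my runtime; everything else is an assumption made for illustration.

```java
public class FoldTemplateSketch {
  /** Hypothetical stand-in for the operator constants (VectorOperators.ADD, ...). */
  public enum Op { ADD, MUL, MAX }

  /** Identity element used to seed the reduction for a given operator. */
  public static int identity(Op op) {
    switch (op) {
      case ADD: return 0;
      case MUL: return 1;
      case MAX: return Integer.MIN_VALUE;
      default: throw new AssertionError(op);
    }
  }

  /** Applies the operator to two values. */
  public static int apply(Op op, int a, int b) {
    switch (op) {
      case ADD: return a + b;
      case MUL: return a * b;
      case MAX: return Math.max(a, b);
      default: throw new AssertionError(op);
    }
  }

  /**
   * Shared "template": the operator is an ordinary parameter. The switches
   * above can only be folded away if this method is inlined into a caller
   * that passes a constant.
   */
  public static int foldValueTemplate(int[] array, Op op) {
    int acc = identity(op);
    for (int value : array) {
      acc = apply(op, acc, value);
    }
    return acc;
  }

  /** Thin wrappers: each one passes a constant operator to the template. */
  public static int foldValueADD(int[] array) {
    return foldValueTemplate(array, Op.ADD);
  }

  public static int foldValueMUL(int[] array) {
    return foldValueTemplate(array, Op.MUL);
  }

  public static void main(String[] args) {
    int[] data = {1, 2, 3, 4};
    System.out.println(foldValueADD(data)); // prints 10
    System.out.println(foldValueMUL(data)); // prints 24
  }
}
```

The whole scheme depends on foldValueTemplate being inlined into each wrapper; if it is compiled as a standalone method, the operator stays a runtime value and nothing gets specialized.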

And it fails spectacularly because the JIT thinks that the template method foldValueTemplate is too big to be inlined (see the last line under foldValueADD in the inlining log below).

fr.umlv.vector.CellBenchMark::add_cell (14 bytes)
   @ 7   fr.umlv.jruntime.Cell$Dyad::fold (12 bytes)   inline (hot)
     @ 8   fr.umlv.jruntime.Cell$Fold::<init> (56 bytes)   inline (hot)
       @ 1   java.lang.Object::<init> (1 bytes)   inline (hot)
       @ 40   java.util.Objects::requireNonNull (14 bytes)   inline (hot)
   @ 10   fr.umlv.jruntime.Cell::apply (132 bytes)   inline (hot)
     @ 1   fr.umlv.jruntime.Cell$Fold::foldVerbs (20 bytes)   inline (hot)
     @ 58   fr.umlv.jruntime.Cell$Rank$Vector::fold (6 bytes)   inline (hot)
       @ 2   fr.umlv.jruntime.Cell$Rank$Vector::foldValue (31 bytes)   inline (hot)
         @ 8   fr.umlv.jruntime.Cell$Backend::foldValue (193 bytes)   inline (hot)
           @ 21   java.lang.Enum::ordinal (5 bytes)   accessor
           @ 86   fr.umlv.jruntime.Cell$VectorizedBackend::foldValueADD (14 bytes)   inline (hot)
             @ 2   java.lang.invoke.Invokers$Holder::linkToTargetMethod (8 bytes)   force inline by annotation
               @ 4   java.lang.invoke.LambdaForm$MH/0x0000000800067840::invoke (8 bytes)   force inline by annotation
             @ 10   fr.umlv.jruntime.Cell$VectorizedBackend::foldValueTemplate (110 bytes)   already compiled into a big method
         @ 17   fr.umlv.jruntime.Cell$Rank::vector (9 bytes)   inline (hot)
           @ 5   fr.umlv.jruntime.Cell$Rank$Vector::<init> (10 bytes)   inline (hot)
             @ 1   java.lang.Record::<init> (5 bytes)   inline (hot)
               @ 1   java.lang.Object::<init> (1 bytes)   inline (hot)
         @ 27   fr.umlv.jruntime.Cell::<init> (15 bytes)   inline (hot)
           @ 1   java.lang.Object::<init> (1 bytes)   inline (hot)

Given that I'm not developing my code inside the JDK, I cannot use @ForceInline.

I think the JIT heuristics need to be tweaked so that a method that takes a VectorOperators constant as a parameter is always inlined.
Otherwise, there is no point in exposing all the constants in VectorOperators, given that even a simple reduction takes enough bytecodes to be considered a big method by the JIT.

Or maybe there is another solution?

regards,
Rémi

[1] https://github.com/forax/panama-vector/blob/master/fr.umlv.jruntime/src/main/java/fr/umlv/jruntime/Cell.java#L787