Vector API: How to write code template specialization ?
    Viswanathan, Sandhya 
    sandhya.viswanathan at intel.com
       
    Wed Apr 15 16:48:56 UTC 2020
    
    
  
Hi Remi,
You might already know the following:
There is a compile command to force inlining; I wonder if you could use it for your experiments in the meantime:
-XX:CompileCommand=inline,class_path.method
You could also put these commands in a file and pass that file on the command line using:
-XX:CompileCommandFile=<file>
Where the <file> contains lines like:
inline class1_path.method
inline class2_path.*
More info in src/hotspot/share/compiler/compilerOracle.cpp.
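For the method reported in the inlining log quoted below, such a file might look like the following. This is only a sketch: the class and method names are copied from that log, the file name inline.txt is made up for illustration, and I have not run it against the project.

```
inline fr/umlv/jruntime/Cell$VectorizedBackend.foldValueTemplate
```

The file would then be passed as -XX:CompileCommandFile=inline.txt. If you use -XX:CompileCommand directly on the command line instead, the $ in the nested class name most likely needs to be quoted for the shell. If the command is picked up, the inlining output for that call site should no longer report "already compiled into a big method".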
Hope this helps.
Thanks a lot for your feedback.
Best Regards,
Sandhya
-----Original Message-----
From: panama-dev <panama-dev-bounces at openjdk.java.net> On Behalf Of Remi Forax
Sent: Monday, April 13, 2020 10:26 AM
To: panama-dev at openjdk.java.net <panama-dev at openjdk.java.net>
Subject: Vector API: How to write code template specialization ?
Hi all,
I am doing a kind of study to see how to use the Vector API to implement a simple runtime [1] for J (that weird descendant of APL :)
It works quite well until you try to share code. Let's say I have code that does a reduction on an array: I can write one version for +, one version for *, etc., or I can write a method that takes a VectorOperators constant as a parameter and expect that, if I call the method with the constant VectorOperators.ADD, the JIT will be smart enough to specialize the method for ADD.
So in my runtime, I have a method foldValueADD that calls foldValueTemplate(ADD).
And it fails spectacularly, because the JIT thinks that the template method foldValueTemplate is too big to be inlined.
fr.umlv.vector.CellBenchMark::add_cell (14 bytes)
   @ 7   fr.umlv.jruntime.Cell$Dyad::fold (12 bytes)   inline (hot)
     @ 8   fr.umlv.jruntime.Cell$Fold::<init> (56 bytes)   inline (hot)
       @ 1   java.lang.Object::<init> (1 bytes)   inline (hot)
       @ 40   java.util.Objects::requireNonNull (14 bytes)   inline (hot)
   @ 10   fr.umlv.jruntime.Cell::apply (132 bytes)   inline (hot)
     @ 1   fr.umlv.jruntime.Cell$Fold::foldVerbs (20 bytes)   inline (hot)
     @ 58   fr.umlv.jruntime.Cell$Rank$Vector::fold (6 bytes)   inline (hot)
       @ 2   fr.umlv.jruntime.Cell$Rank$Vector::foldValue (31 bytes)   inline (hot)
         @ 8   fr.umlv.jruntime.Cell$Backend::foldValue (193 bytes)   inline (hot)
           @ 21   java.lang.Enum::ordinal (5 bytes)   accessor
           @ 86   fr.umlv.jruntime.Cell$VectorizedBackend::foldValueADD (14 bytes)   inline (hot)
             @ 2   java.lang.invoke.Invokers$Holder::linkToTargetMethod (8 bytes)   force inline by annotation
               @ 4   java.lang.invoke.LambdaForm$MH/0x0000000800067840::invoke (8 bytes)   force inline by annotation
             @ 10   fr.umlv.jruntime.Cell$VectorizedBackend::foldValueTemplate (110 bytes)   already compiled into a big method
         @ 17   fr.umlv.jruntime.Cell$Rank::vector (9 bytes)   inline (hot)
           @ 5   fr.umlv.jruntime.Cell$Rank$Vector::<init> (10 bytes)   inline (hot)
             @ 1   java.lang.Record::<init> (5 bytes)   inline (hot)
               @ 1   java.lang.Object::<init> (1 bytes)   inline (hot)
         @ 27   fr.umlv.jruntime.Cell::<init> (15 bytes)   inline (hot)
           @ 1   java.lang.Object::<init> (1 bytes)   inline (hot)
Given that I'm not developing my code inside the JDK, I cannot use @ForceInline.
I think the JIT heuristics need to be tweaked so that a method that takes a VectorOperators constant as a parameter is always inlined.
Otherwise, there is no point in exposing all the constants in VectorOperators, given that even a simple reduction takes enough bytecode to be considered a big method by the JIT.
Or maybe there is another solution?
regards,
Rémi
[1] https://github.com/forax/panama-vector/blob/master/fr.umlv.jruntime/src/main/java/fr/umlv/jruntime/Cell.java#L787
    
    