[foreign] some JMH benchmarks

Fri Sep 14 17:39:02 UTC 2018

> So, in principle, we could define a bunch of native entry points in
> the VM, one per shape, which take a bunch of long and doubles and call
> an underlying function with those arguments. For instance, let's
> consider the case of a native function which is modelled in Java as:
> 
> int m(Pointer<Foo>, double)
> 
> To call this native function we have to first turn the Java arguments
> into a (long, double) pair. Then we need to call a native adapter that
> looks like the following:
> 
> jlong NI_invokeNative_J_JD(JNIEnv *env, jobject _unused, jlong addr,
> jlong arg0, jdouble arg1) {
>     return ((jlong (*)(jlong, jdouble))addr)(arg0, arg1);
> }
> 
> And this will take care of calling the native function and returning
> the value back. This is, admittedly, a very simple solution; of course
> there are limitations: we have to define a bunch of specialized native
> entry points (and Java entry points, for callbacks). But here we can
> play a trick: most modern ABIs pass arguments in registers; for
> instance the System V ABI [5] uses up to 6 (!!) integer registers and 8
> (!!) XMM registers for FP values - this gives us a total of 14
> registers available for argument passing, which covers quite a lot of
> cases. Now, if we have a call where _all_ arguments are passed in
> registers, then the order in which these arguments are declared in the
> adapter doesn't matter! That is, since FP-values will always be passed
> in different registers from integral values, we can just define entry
> points which look like these:
> 
> invokeNative_V_DDDDD
> invokeNative_V_JDDDD
> invokeNative_V_JJDDD
> invokeNative_V_JJJDD
> invokeNative_V_JJJJD
> invokeNative_V_JJJJJ
> 
> That is, for a given arity (5 in this case), we can just put all long
> arguments in front, and the double arguments after that. That is, we
> don't need to generate all possible permutations of J/D in all
> positions - as the adapter will always do the same thing (read: load
> from same registers) for all equivalent combinations. This keeps the
> number of entry points in check - and it also poses some challenges
> to the Java logic in charge of marshalling/unmarshalling, as there's
> an extra permutation step involved (although that is not something
> super-hard to address).

I'm wondering if the 6 native entry points for an arity of 5 are enough.
Don't you also need 6 for when the function returns a long and 6 more
for when the function returns a double?
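
To make the count concrete, here is a minimal sketch that enumerates the
canonical shapes for a given arity and return kind, and maps an arbitrary
J/D argument order onto its canonical entry point. It only assumes the
invokeNative_<ret>_<args> naming used in the quoted list; the class and
method names (EntryPointShapes, shapesFor, canonicalShape) are made up for
illustration and are not part of any existing code.

```java
import java.util.ArrayList;
import java.util.List;

public class EntryPointShapes {
    // Return kinds considered here: V = void, J = long, D = double.
    static final char[] RETURN_TAGS = {'V', 'J', 'D'};

    // Builds a string consisting of n copies of c.
    static String repeat(char c, int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(c);
        }
        return sb.toString();
    }

    // Lists the canonical shapes for one return kind and arity:
    // all long (J) arguments first, then all double (D) arguments.
    static List<String> shapesFor(char ret, int arity) {
        List<String> names = new ArrayList<>();
        for (int longs = 0; longs <= arity; longs++) {
            names.add("invokeNative_" + ret + "_"
                    + repeat('J', longs) + repeat('D', arity - longs));
        }
        return names;
    }

    // Maps any J/D argument order to the canonical entry point name.
    // Only valid under the assumption that all arguments fit in registers.
    static String canonicalShape(char ret, String argTags) {
        int longs = 0;
        for (char c : argTags.toCharArray()) {
            if (c == 'J') {
                longs++;
            }
        }
        return "invokeNative_" + ret + "_"
                + repeat('J', longs) + repeat('D', argTags.length() - longs);
    }

    public static void main(String[] args) {
        int total = 0;
        for (char ret : RETURN_TAGS) {
            List<String> names = shapesFor(ret, 5);
            total += names.size();
            System.out.println(ret + ": " + names);
        }
        System.out.println("total = " + total); // 3 return kinds * 6 shapes = 18
    }
}
```

So with void, long and double returns, arity 5 alone would need 18 entry
points under this scheme, and canonicalShape('V', "DJDJD") lands on
invokeNative_V_JJDDD.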

I have a suggestion to bypass having to write out all the permutations
though. What if, on the Java side, whenever there is a method whose
shape can be optimized in this way (which is ABI dependent), we spin
and load a class which defines a single static native method with the
needed signature, and annotate it? Then, in NativeLookup::lookup, we
could detect this annotation and, instead of trying to look up the
symbol in a loaded library, generate a forwarding stub and link the
native method to that. Then you can take a MethodHandle to the native
method in the anonymous class and use that in the backing
implementation.
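
Roughly, the Java side of this could look like the sketch below. It only
covers the parts that work on a stock JDK today: a holder class with one
annotated static native method, and taking a MethodHandle to it. The
annotation name (ForwardingStub), the holder class and the (addr, arg0,
arg1) signature are all hypothetical, and the holder is declared statically
here rather than spun at runtime. The VM-side change in NativeLookup::lookup
does not exist, so actually invoking the handle would fail with an
UnsatisfiedLinkError.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class StubHolderSketch {
    // Hypothetical marker the VM would look for when linking the native method.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface ForwardingStub {}

    // Stand-in for the class that would be spun per shape.
    // Shape: (function address, long, double) -> long.
    static final class Holder_J_JD {
        @ForwardingStub
        static native long invoke(long addr, long arg0, double arg1);
    }

    // Takes a MethodHandle to the native method. The lookup itself does not
    // link the native method; linking would only be attempted on invocation.
    static MethodHandle handleFor() {
        try {
            return MethodHandles.lookup().findStatic(
                    Holder_J_JD.class, "invoke",
                    MethodType.methodType(long.class, long.class, long.class, double.class));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Checks that the marker annotation is visible reflectively on the method.
    static boolean isAnnotated() {
        try {
            return Holder_J_JD.class
                    .getDeclaredMethod("invoke", long.class, long.class, double.class)
                    .isAnnotationPresent(ForwardingStub.class);
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(handleFor().type());
        System.out.println("annotated: " + isAnnotated());
    }
}
```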

I'm not sure if it's all that easy though. What do you think?

Jorn