[foreign] RFR 8210757: Add binder support for direct native invocation strategy
Maurizio Cimadamore
maurizio.cimadamore at oracle.com
Sat Sep 15 20:31:31 UTC 2018
On 15/09/18 01:04, Jorn Vernee wrote:
> Maurizio Cimadamore schreef op 2018-09-14 20:04:
>> Hi,
>> as mentioned in [1], this patch adds binder support for the so-called
>> 'direct' invocation scheme, which allows for greater native invocation
>> downcall/upcall performance by means of specialized adapters. The
>> core idea, also described in [1], is to define adapters of the kind:
>>
>> invokeNative_V_DDDDD
>> invokeNative_V_JDDDD
>> invokeNative_V_JJDDD
>> invokeNative_V_JJJDD
>> invokeNative_V_JJJJD
>> invokeNative_V_JJJJJ
>>
>> where long arguments come before double arguments (and likewise for
>> each arity, e.g. <= 5).
>>
>> If all arguments are passed in registers, then this reordering doesn't
>> affect behavior, and greatly limits the number of permutations to be
>> supported/generated.
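
To spell out the canonicalization, here is a minimal sketch in plain C. It is
not code from the patch: the function name canonical_adapter_name and the
signature-string encoding (one 'J' or 'D' character per argument) are
assumptions made purely for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: maps an argument signature such as "DJDJD" to the
 * name of the canonical adapter, where all longs ('J') come first and all
 * doubles ('D') follow. 'ret' is the return type character, e.g. 'V'.
 * Returns 0 on success, -1 if 'out' is too small. */
int canonical_adapter_name(char ret, const char *sig, char *out, size_t out_len) {
    assert(sig != NULL && out != NULL);
    size_t n = strlen(sig);
    size_t longs = 0;
    for (size_t i = 0; i < n; i++) {
        if (sig[i] == 'J') longs++;
    }
    /* snprintf returns the length it wanted to write, even when truncated. */
    int prefix = snprintf(out, out_len, "invokeNative_%c_", ret);
    if (prefix < 0 || (size_t)prefix + n + 1 > out_len) return -1;
    memset(out + prefix, 'J', longs);             /* longs first */
    memset(out + prefix + longs, 'D', n - longs); /* then doubles */
    out[prefix + n] = '\0';
    return 0;
}
```

With this encoding, "DJDJD" and "JDDJD" both map to invokeNative_V_JJDDD, so
for arity n there are n + 1 shapes per return type rather than 2^n orderings.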
>
> On windows the story seems to be more difficult than I initially
> thought. On SysV, if you have a C function like this:
>
> void f(long l, double d);
>
> `l` will be passed in the first integer register, and `d` will be
> passed in the first float/vector register. But on windows, `d` will be
> passed in the **second** float/vector register, and if there were
> another integer argument it would be passed in the third integer
> register [1]. This becomes worse with varargs, which requires floats
> to be passed in both the integer and float/vector registers.
>
> So I don't think reordering parameters will work on windows, since the
> parameter index dictates which register it uses. However, you should
> be able to still use the downcall-with-shuffling strategy (though I
> don't have that working yet for mixed argument classes).
Ouch - yes, it does seem that Windows allocates registers in a
positional way; that said, it doesn't change the picture much - in a
way I was fully aware (and probably should have made more explicit) that
this fast path is an opportunistic optimization that we can take
depending on the ABI.
The real solution, as mentioned, going forward, is to generate
specialized JNI stubs on the fly - so that's gonna be the next stop (and
the one after that is to teach C2/Graal about such stubs so that they
could be optimized). But the main point here is that, without some
kind of specialization in the entry points generated by the VM, it will
be impossible to achieve sensible performance.
Back to Windows, it seems like its fastcall strategy allows it to use up
to 4 registers, period. So, that's not too terrible in terms of number
of manually generated entry points, if we wanted to go there (not for
today :-)).
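
To make the SysV/Windows difference concrete, here is a rough sketch in plain
C (again, not part of the patch). The helper names sysv_reg and win64_reg and
the 'J'/'D' signature strings are made up for illustration, and the model
only covers register-passed arguments of non-varargs calls: 6 integer and 8
vector registers on SysV, 4 positional slots on Windows x64.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helpers, for illustration only. A signature is a string of
 * 'J' (integer class) and 'D' (double) characters, one per argument. */

static const char *SYSV_INT[] = { "rdi", "rsi", "rdx", "rcx", "r8", "r9" };
static const char *SYSV_VEC[] = { "xmm0", "xmm1", "xmm2", "xmm3",
                                  "xmm4", "xmm5", "xmm6", "xmm7" };
static const char *WIN_INT[]  = { "rcx", "rdx", "r8", "r9" };
static const char *WIN_VEC[]  = { "xmm0", "xmm1", "xmm2", "xmm3" };

/* SysV: each argument class has its own counter, so the register depends
 * only on how many arguments of the same class precede this one.
 * Returns NULL when the argument would go on the stack. */
const char *sysv_reg(const char *sig, size_t idx) {
    assert(sig != NULL && idx < strlen(sig));
    int is_vec = (sig[idx] == 'D');
    size_t same_class_before = 0;
    for (size_t i = 0; i < idx; i++) {
        if ((sig[i] == 'D') == is_vec) same_class_before++;
    }
    if (is_vec) {
        return same_class_before < 8 ? SYSV_VEC[same_class_before] : NULL;
    }
    return same_class_before < 6 ? SYSV_INT[same_class_before] : NULL;
}

/* Windows x64: the argument position alone selects the slot; the class
 * only decides between the integer and the vector register of that slot. */
const char *win64_reg(const char *sig, size_t idx) {
    assert(sig != NULL && idx < strlen(sig));
    if (idx >= 4) return NULL;
    return sig[idx] == 'D' ? WIN_VEC[idx] : WIN_INT[idx];
}
```

For f(long, double), i.e. "JD", sysv_reg yields rdi and xmm0, and swapping
the arguments to "DJ" still uses the same two registers. win64_reg yields
rcx and xmm1 for "JD" but xmm0 and rdx for "DJ", which is why the
long-before-double reordering is only safe on the former.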
>
>> The downcall part (Java to native) is relatively straightforward: the
>> directNativeInvoker.cpp file defines a bunch of native entry points,
>> one per shape, which cast the input address to a function pointer of
>> the desired shape, and then call it:
>>
>> jlong NI_invokeNative_J_JD(JNIEnv *env, jobject _unused, jlong addr,
>>                            jlong arg0, jdouble arg1) {
>>     return ((jlong (*)(jlong, jdouble))addr)(arg0, arg1);
>> }
>
> As an optimization here, I think you should make the function address
> the last argument, since that prevents having to shuffle the other
> arguments between registers before calling the function [2]
I'm not sure I 100% get this (and I can't seem to open your link).
Once you are inside this adapter, the first two integer registers are
already taken by the JNI arguments (*env and _unused, in this
example). Which seems to suggest that, regardless of where we put
'addr', the compiled code will still need to move { arg0, arg1 } into the
first two integer registers?
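
To put numbers on that, here is a small sketch in plain C that counts how
many integer-class arguments sit in a different register on entry to the
adapter than the one the target function expects, under SysV. The function
sysv_int_moves, its 'leading' parameter (number of integer arguments ahead of
the payload, 2 for env and _unused) and the 'J'/'D' signature strings are all
made up for illustration; it assumes everything still fits in registers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model, SysV only, small arities only (no stack arguments).
 * sig:       target signature, 'J' = integer class, 'D' = double
 * leading:   integer arguments the adapter receives before the payload
 * addr_last: non-zero if the function address is the last adapter argument
 * Returns the number of integer arguments that must change register. */
size_t sysv_int_moves(const char *sig, size_t leading, int addr_last) {
    assert(sig != NULL);
    /* Index of the integer register holding the first 'J' on adapter entry. */
    size_t incoming = leading + (addr_last ? 0 : 1);
    size_t outgoing = 0; /* index the target expects for that same argument */
    size_t moves = 0;
    for (const char *p = sig; *p != '\0'; p++) {
        if (*p != 'J') continue; /* doubles keep the same xmm index */
        if (incoming != outgoing) moves++;
        incoming++;
        outgoing++;
    }
    return moves;
}
```

For the J_JD shape, sysv_int_moves("JD", 2, 0) and sysv_int_moves("JD", 2, 1)
both give 1: arg0 has to move either way (arg1 is a double and stays in
xmm0). Only with no leading arguments, sysv_int_moves("JD", 0, 1), does
putting 'addr' last bring the count down to 0.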
Maurizio
>
>> * we need to set up a framework in which new invocation strategies can
>> be plugged in - note that we now have essentially 4 cases:
>>
>> { NativeInvoker, UpcallHandler } x { Universal, Direct }
>>
>> When the code wants e.g. a NativeInvoker, it asks the
>> NativeInvoker::of factory for one (UpcallHandler works in a similar
>> way); this factory will attempt to go down the fast path - if an error
>> occurs when computing the fast path, the call will fall back to the
>> universal (slow) path.
>
> This sounds like a great idea!
>
> Jorn
>
> [1] :
> https://docs.microsoft.com/en-us/cpp/build/parameter-passing?view=vs-2017#example-of-argument-passing-3---mixed-ints-and-floats
> [2] : https://godbolt.org/z/JjPJca