[foreign] RFR 8210757: Add binder support for direct native invocation strategy
Jorn Vernee
jbvernee at xs4all.nl
Fri Sep 28 12:18:28 UTC 2018
Oh sorry, I just remembered/noticed that you put that in the original
email as well. Just got the idea by looking at the code.
Jorn
Maurizio Cimadamore wrote on 2018-09-28 14:12:
> And no, we can't do what you suggest (at least not in a
> straightforward fashion) because @Stable is in a non-exported package
> of java.base and the callback is spun in user-land. But it
> should be no issue.
>
> Maurizio
>
>
> On 28/09/18 13:11, Maurizio Cimadamore wrote:
>> No need for that, the VM treats all final fields of VM anonymous
>> classes as @Stable (or so I've been told) :-)
>>
>> Maurizio
>>
>>
>> On 28/09/18 13:06, Jorn Vernee wrote:
>>> Mostly out of curiosity; can you make the generated MethodHandle
>>> field in CallbackImplGenerator @Stable as well?
>>>
>>> private void generateMethodHandleField(BinderClassWriter cw) {
>>>     cw.visitField(ACC_PRIVATE | ACC_FINAL, MH_FIELD_NAME,
>>>             Type.getDescriptor(MethodHandle.class), null, null)
>>>       .visitAnnotation(Type.getDescriptor(Stable.class), true);
>>> }
>>>
>>> Jorn
>>>
>>> Maurizio Cimadamore wrote on 2018-09-28 13:19:
>>>> Webrev:
>>>>
>>>> http://cr.openjdk.java.net/~mcimadamore/panama/8210757_v2/
>>>>
>>>> Maurizio
>>>>
>>>>
>>>> On 28/09/18 12:19, Maurizio Cimadamore wrote:
>>>>> This is an updated version of the direct invocation scheme support.
>>>>> Very close to the last one, but there are some minor
>>>>> refactorings/improvements:
>>>>>
>>>>> 1) Added a @Stable annotation in DirectNativeInvoker's MH field
>>>>> 2) The box/unbox routines used by the UniversalXYZ strategies have
>>>>> been moved from NativeInvoker to UniversalNativeInvoker
>>>>> 3) I revamped the logic which detects whether the fastpath is
>>>>> applicable - now we create the calling sequence first, and we use
>>>>> that to check whether we can fast path it. Some internal benchmarks
>>>>> have shown that with a large number of symbols, we were doing a lot
>>>>> of work because we were always trying the fastpath and then, in
>>>>> case of an exception, falling back to the slow path; in such cases
>>>>> we would create the calling sequence twice. This new technique
>>>>> might also be more friendly w.r.t. Windows and other ABIs.
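
The "calling sequence first" check in item 3 could look roughly like the sketch below. It is only an illustration under stated assumptions: CallingSequence, compute, directApplicable and the two register limits are hypothetical names and values, not the classes in the webrev, and the model treats anything that is not a long or a double as stack-passed.

```java
class FastPathCheck {
    // Assumed limits for this sketch only (modelled on the 4 long + 4 double helper shape).
    static final int MAX_INT_REGS = 4;
    static final int MAX_VEC_REGS = 4;

    // Counts how many calling sequences were built, to show it happens once per symbol.
    static int sequencesBuilt = 0;

    // Hypothetical, heavily simplified calling sequence: where each argument ends up.
    static final class CallingSequence {
        final int intRegs, vecRegs, stackSlots;
        CallingSequence(int intRegs, int vecRegs, int stackSlots) {
            this.intRegs = intRegs;
            this.vecRegs = vecRegs;
            this.stackSlots = stackSlots;
        }
    }

    static CallingSequence compute(Class<?>... params) {
        sequencesBuilt++;
        int ints = 0, vecs = 0, stack = 0;
        for (Class<?> p : params) {
            if (p == long.class && ints < MAX_INT_REGS) ints++;
            else if (p == double.class && vecs < MAX_VEC_REGS) vecs++;
            else stack++; // everything else is treated as stack-passed in this model
        }
        return new CallingSequence(ints, vecs, stack);
    }

    // The direct scheme applies only if every argument is passed in a register.
    static boolean directApplicable(CallingSequence cs) {
        return cs.stackSlots == 0;
    }

    // Build the calling sequence once, then pick the strategy by inspecting it.
    static String choose(Class<?>... params) {
        CallingSequence cs = compute(params);
        return directApplicable(cs) ? "direct" : "universal";
    }

    public static void main(String[] args) {
        System.out.println(choose(long.class, double.class));
        System.out.println(choose(long.class, long.class, long.class, long.class, long.class));
    }
}
```

The point of the design is that the decision is a plain predicate over data that has to be computed anyway, so no exception is thrown and nothing is computed twice.
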
>>>>>
>>>>> I'd really like to move ahead with this (as this RFR has been out
>>>>> for quite a while now) - if there are no other comments I'll go
>>>>> ahead.
>>>>>
>>>>> Maurizio
>>>>>
>>>>>
>>>>> On 14/09/18 19:04, Maurizio Cimadamore wrote:
>>>>>> Hi,
>>>>>> as mentioned in [1], this patch adds binder support for the
>>>>>> so-called 'direct' invocation scheme, which allows for greater
>>>>>> native downcall/upcall performance by means of specialized
>>>>>> adapters. The core idea, also described in [1], is to define
>>>>>> adapters of the kind:
>>>>>>
>>>>>> invokeNative_V_DDDDD
>>>>>> invokeNative_V_JDDDD
>>>>>> invokeNative_V_JJDDD
>>>>>> invokeNative_V_JJJDD
>>>>>> invokeNative_V_JJJJD
>>>>>> invokeNative_V_JJJJJ
>>>>>>
>>>>>> where long arguments come before double arguments (and this is
>>>>>> done for each arity, e.g. <=5).
>>>>>>
>>>>>> If all arguments are passed in registers, then this reordering
>>>>>> doesn't affect behavior, and greatly limits the number of
>>>>>> permutations to be supported/generated.
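
As an illustration of the naming scheme above, here is a minimal sketch that derives the adapter name for a given signature by emitting all longs first and then all doubles. AdapterNames, code and adapterName are hypothetical names made up for this example; only the invokeNative_<ret>_<args> pattern comes from the list above.

```java
import java.lang.invoke.MethodType;

class AdapterNames {
    // Hypothetical helper: one-letter code for each carrier type used in adapter names.
    static char code(Class<?> c) {
        if (c == long.class) return 'J';
        if (c == double.class) return 'D';
        if (c == void.class) return 'V';
        throw new IllegalArgumentException("no direct adapter for " + c);
    }

    // Builds e.g. invokeNative_V_JJDDD: return code, then every long, then every double.
    static String adapterName(MethodType type) {
        StringBuilder longs = new StringBuilder();
        StringBuilder doubles = new StringBuilder();
        for (Class<?> p : type.parameterList()) {
            char c = code(p);
            (c == 'J' ? longs : doubles).append(c);
        }
        return "invokeNative_" + code(type.returnType()) + "_" + longs + doubles;
    }

    public static void main(String[] args) {
        MethodType mt = MethodType.methodType(void.class,
                double.class, long.class, double.class, long.class, double.class);
        System.out.println(adapterName(mt));
    }
}
```

Because only the number of longs and doubles matters, an arity of n needs n + 1 adapters per return type instead of 2^n.
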
>>>>>>
>>>>>> The downcall part (Java to native) is relatively straightforward:
>>>>>> the directNativeInvoker.cpp file defines a bunch of native entry
>>>>>> points, one per shape, which cast the input address to a function
>>>>>> pointer of the desired shape, and then call it:
>>>>>>
>>>>>> jlong NI_invokeNative_J_JD(JNIEnv *env, jobject _unused,
>>>>>>                            jlong addr, jlong arg0, jdouble arg1) {
>>>>>>     return ((jlong (*)(jlong, jdouble))addr)(arg0, arg1);
>>>>>> }
>>>>>>
>>>>>> The upcall business is a little trickier: first, if we are only to
>>>>>> optimize upcalls where argument passing happens in registers, then
>>>>>> it's crucial to note that by the time we get into the assembly
>>>>>> stub, all the registers will have been populated by the native
>>>>>> code to contain the right arguments in the right places. So we can
>>>>>> avoid all the shuffling in the assembly adapter and simply jump
>>>>>> onto a C function that looks like this:
>>>>>>
>>>>>> long specialized_upcall_helper_J(long l0, long l1, long l2, long l3,
>>>>>>                                  double d0, double d1, double d2, double d3,
>>>>>>                                  unsigned int mask, jobject rec) { ... }
>>>>>>
>>>>>> Note here that the first 8 arguments are just longs and doubles,
>>>>>> and those will be expected to be in registers, according to the
>>>>>> System V ABI. (On Windows, the situation will be a bit different
>>>>>> as fewer integer registers are available, so this will need some
>>>>>> work there).
>>>>>>
>>>>>> So, to recap, the assembly upcall stub simply 'appends' the
>>>>>> receiver object and a 'signature mask' in the last two available C
>>>>>> registers and then jumps onto the helper function. The helper
>>>>>> function will find all the desired arguments in the right places -
>>>>>> there will be, in the general case, some unused arguments, but
>>>>>> that's fine, after all it didn't cost us anything to load them
>>>>>> in the first place!
>>>>>>
>>>>>> Note that we have three helper variants, one for each return type
>>>>>> { long, double, void }. This is required as we need the C helper
>>>>>> to return a value of the right type, which will generate the right
>>>>>> assembly sequence to store the result in the right register
>>>>>> (either integer or XMM).
>>>>>>
>>>>>> So, with three helpers we can support all the shapes with up to 8
>>>>>> arguments. On the Java side we have, of course, to define a
>>>>>> specialized entry point for each shape.
>>>>>>
>>>>>> All the magic for adapting method handles to and from the
>>>>>> specialized adapters happens in the DirectSignatureShuffler class;
>>>>>> this class is responsible for adapting each argument e.g. from
>>>>>> Java to native value, and then reordering the adapted method
>>>>>> handle to match the order in which arguments are expected by the
>>>>>> adapter (e.g. move all longs in front). The challenge was in
>>>>>> making DirectSignatureShuffler fully symmetric - e.g. I did
>>>>>> not want to have different code paths for upcalls and downcalls,
>>>>>> so the code tries quite hard to be parametric in the shuffling
>>>>>> direction (java->native or native->java) - which means that
>>>>>> adapters will be applied in one way or in the inverse way
>>>>>> depending on the shuffling direction (and on whether we are
>>>>>> adapting an argument or a return). Since method handle filters are
>>>>>> composable, it all works out quite beautifully.
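
The "move all longs in front" step can be expressed with the standard MethodHandles.permuteArguments combinator. The sketch below is not DirectSignatureShuffler itself: LongsFirst, adapter, longsFirstOrder and adapt are hypothetical names, the adapter is an ordinary Java method standing in for a specialized native entry point, and the per-argument Java/native value conversion is left out.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.ArrayList;
import java.util.List;

class LongsFirst {
    // Stand-in for a specialized adapter: it expects longs first, then doubles.
    static String adapter(long a, long b, double x, double y) {
        return a + "," + b + "," + x + "," + y;
    }

    // For each adapter parameter (longs-first order), the index of the caller argument feeding it.
    static int[] longsFirstOrder(MethodType callerType) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < callerType.parameterCount(); i++)
            if (callerType.parameterType(i) == long.class) order.add(i);
        for (int i = 0; i < callerType.parameterCount(); i++)
            if (callerType.parameterType(i) == double.class) order.add(i);
        return order.stream().mapToInt(Integer::intValue).toArray();
    }

    // Returns a handle with the caller's argument order that forwards to the longs-first adapter.
    static MethodHandle adapt(MethodHandle adapter, MethodType callerType) {
        return MethodHandles.permuteArguments(adapter, callerType, longsFirstOrder(callerType));
    }

    static String demo() {
        try {
            MethodHandle target = MethodHandles.lookup().findStatic(LongsFirst.class, "adapter",
                    MethodType.methodType(String.class, long.class, long.class, double.class, double.class));
            MethodType callerType = MethodType.methodType(String.class,
                    double.class, long.class, double.class, long.class);
            MethodHandle mh = adapt(target, callerType);
            // The caller passes (double, long, double, long); the adapter receives (long, long, double, double).
            return (String) mh.invokeExact(1.5, 2L, 3.5, 4L);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

For the opposite direction the same index array describes the inverse permutation, which is one way a single code path could serve both upcalls and downcalls.
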
>>>>>>
>>>>>> Note that the resulting, adapted MH is stored in a @Stable field
>>>>>> to tell the JIT to optimize the heck out of it (as if it were a
>>>>>> static constant).
>>>>>>
>>>>>> This patch contains several other changes - which I discuss
>>>>>> briefly below:
>>>>>>
>>>>>> * we need to set up a framework in which new invocation strategies
>>>>>> can be plugged in - note that we now have essentially 4 cases:
>>>>>>
>>>>>> { NativeInvoker, UpcallHandler } x { Universal, Direct }
>>>>>>
>>>>>> When the code wants e.g. a NativeInvoker, it asks the
>>>>>> NativeInvoker::of factory for one (UpcallHandler works in a similar
>>>>>> way); this factory will attempt to go down the fast path - if an
>>>>>> error occurs when computing the fast path, the call will fall back
>>>>>> to the universal (slow) path.
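
The factory behaviour described above (try the fast path, fall back to the universal path on error) might be sketched as follows. This is a toy model, not the NativeInvoker code in the webrev: InvokerFactory, Invoker, direct and universal are hypothetical names, and the "at most 5 long/double arguments" rule is an assumption made for the example.

```java
class InvokerFactory {
    // Hypothetical minimal invoker abstraction; it only reports which strategy was picked.
    interface Invoker {
        String strategy();
    }

    // Assumed fast path: accepts at most 5 arguments, all long or double.
    static Invoker direct(Class<?>... params) {
        if (params.length > 5)
            throw new UnsupportedOperationException("too many arguments: " + params.length);
        for (Class<?> p : params)
            if (p != long.class && p != double.class)
                throw new UnsupportedOperationException("unsupported carrier: " + p);
        return () -> "direct";
    }

    // The universal (slow) path accepts any signature.
    static Invoker universal(Class<?>... params) {
        return () -> "universal";
    }

    // Factory: attempt the fast path, fall back to the universal path if it cannot be built.
    static Invoker of(Class<?>... params) {
        try {
            return direct(params);
        } catch (UnsupportedOperationException e) {
            return universal(params);
        }
    }

    public static void main(String[] args) {
        System.out.println(of(long.class, double.class).strategy());
        System.out.println(of(Object.class).strategy());
    }
}
```

Clients only ever call the factory, so a new strategy can be plugged in without touching them.
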
>>>>>>
>>>>>> Most of the changes you see in the Java code are associated with
>>>>>> this refactoring - e.g. all clients of NativeInvoker/UpcallHandler
>>>>>> should now go through the factory.
>>>>>>
>>>>>> * CallbackImplGenerator had a major issue, since the new factory
>>>>>> for NativeInvoker wants to bind an address eagerly (this is
>>>>>> required e.g. to be forward compatible with the linkToNative
>>>>>> backend); this means that at construction time we have to get the
>>>>>> address of the callback, call the NativeInvoker factory and then
>>>>>> stash the target method handle into a field of the anon callback
>>>>>> class. Vlad tells me that fields of anon classes are always
>>>>>> 'trusted' by the JIT, which means they should be treated as
>>>>>> '@Stable' (note that I can't put a @Stable annotation there, since
>>>>>> this code will be spun in user-land).
>>>>>>
>>>>>> * There are a bunch of properties that can be set to either force
>>>>>> the slow path or force the 'direct' path; in the latter case, if an
>>>>>> error occurs when instantiating the direct wrapper, an exception is
>>>>>> thrown. This mode is very useful for testing, and I have indeed
>>>>>> tried to run all our tests with this flag enabled, to see how many
>>>>>> places could not be optimized.
>>>>>>
>>>>>> * I've also reorganized all the native code in hotspot/prims so
>>>>>> that we have a separate file for each scheme (and so that native
>>>>>> Java methods could be added where they really belong). This should
>>>>>> also help in the long run as it should make adding/removing a
>>>>>> given scheme easier.
>>>>>>
>>>>>> * I've also added a small test which tries to pass structs of
>>>>>> different sizes, but I will also work on a more complex test which
>>>>>> will stress-test all invocation modes in a more complete fashion.
>>>>>> With respect to testing, I've also done a fastdebug build and run
>>>>>> all tests with that (as fastdebug catches many more HotSpot
>>>>>> assertions than the product version); everything passed.
>>>>>>
>>>>>> Webrev:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~mcimadamore/panama/8210757/
>>>>>>
>>>>>> I'd like to thank Vladimir Ivanov for the prompt support whenever
>>>>>> I got stuck down the macro assembler rabbit hole :-)
>>>>>>
>>>>>> Cheers
>>>>>> Maurizio
>>>>>>
>>>>>> [1] -
>>>>>> http://mail.openjdk.java.net/pipermail/panama-dev/2018-September/002652.html
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>