[foreign] RFR 8210757: Add binder support for direct native invocation strategy

Fri Sep 28 12:12:57 UTC 2018

And no, we can't do what you suggest (at least not in a straightforward 
fashion) because @Stable is in a non-exported package of java.base and 
the callback is spun in user-land. But it should be no issue.

Maurizio

On 28/09/18 13:11, Maurizio Cimadamore wrote:
> No need for that, the VM treats all final fields of VM anon classes as
> @Stable (or so I've been told) :-)
>
> Maurizio
>
>
> On 28/09/18 13:06, Jorn Vernee wrote:
>> Mostly out of curiosity; can you make the generated MethodHandle 
>> field in CallbackImplGenerator @Stable as well?
>>
>> private void generateMethodHandleField(BinderClassWriter cw) {
>>     cw.visitField(ACC_PRIVATE | ACC_FINAL, MH_FIELD_NAME,
>>             Type.getDescriptor(MethodHandle.class), null, null)
>>         .visitAnnotation(Type.getDescriptor(Stable.class), true);
>> }
>>
>> Jorn
>>
>> Maurizio Cimadamore wrote on 2018-09-28 13:19:
>>> Webrev:
>>>
>>> http://cr.openjdk.java.net/~mcimadamore/panama/8210757_v2/
>>>
>>> Maurizio
>>>
>>>
>>> On 28/09/18 12:19, Maurizio Cimadamore wrote:
>>>> This is an updated version of the direct invocation scheme support. 
>>>> Very close to the last one, but there are some minor 
>>>> refactorings/improvements:
>>>>
>>>> 1) Added a @Stable annotation in DirectNativeInvoker's MH field
>>>> 2) the box/unbox routines used by the UniversalXYZ strategies have
>>>> been moved from NativeInvoker to UniversalNativeInvoker
>>>> 3) I revamped the logic which detects whether the fast path is
>>>> applicable - now we create the calling sequence first, and we use
>>>> that to check whether we can fast-path it. Some internal benchmarks
>>>> have shown that, with a large number of symbols, we were doing a lot
>>>> of work because we always tried the fast path and then, in case of
>>>> an exception, fell back to the slow path; in such cases we would
>>>> create the calling sequence twice. This new technique might also be
>>>> more friendly w.r.t. Windows and other ABIs.
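
For illustration only (this is not code from the webrev): a minimal Java sketch of the "build the calling sequence once, then inspect it" idea from point 3. The class InvokerChoice, the Storage enum, the 'J'/'D' shape strings and the 4-integer/4-floating-point register limits are all hypothetical stand-ins; the real binder types and limits may well differ.

```java
import java.util.ArrayList;
import java.util.List;

class InvokerChoice {
    // Hypothetical model of where an argument ends up.
    enum Storage { INT_REG, FP_REG, STACK }

    // Counts how often the (potentially expensive) sequence is computed.
    static int sequencesBuilt = 0;

    // Toy calling-sequence builder: 'J' = long, 'D' = double.
    // Anything that does not fit in a register is assigned to the stack.
    static List<Storage> callingSequence(String shape, int maxInt, int maxFp) {
        sequencesBuilt++;
        List<Storage> seq = new ArrayList<>();
        int ints = 0, fps = 0;
        for (char c : shape.toCharArray()) {
            if (c == 'J' && ints < maxInt) {
                ints++;
                seq.add(Storage.INT_REG);
            } else if (c == 'D' && fps < maxFp) {
                fps++;
                seq.add(Storage.FP_REG);
            } else {
                seq.add(Storage.STACK);
            }
        }
        return seq;
    }

    // Picks a strategy by inspecting the sequence, instead of trying the
    // direct path first and catching an exception on failure.
    static String choose(String shape) {
        List<Storage> seq = callingSequence(shape, 4, 4); // assumed limits
        boolean allInRegisters = !seq.contains(Storage.STACK);
        return allInRegisters ? "direct" : "universal";
    }
}
```

With this arrangement each call to choose builds the sequence exactly once, whichever strategy ends up being selected.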
>>>>
>>>> I'd really like to move ahead with this (as this RFR has been out
>>>> for quite a while now) - if there are no other comments I'll go ahead.
>>>>
>>>> Maurizio
>>>>
>>>>
>>>> On 14/09/18 19:04, Maurizio Cimadamore wrote:
>>>>> Hi,
>>>>> as mentioned in [1], this patch adds binder support for the
>>>>> so-called 'direct' invocation scheme, which allows for greater
>>>>> native downcall/upcall performance by means of specialized
>>>>> adapters. The core idea, also described in [1], is to define 
>>>>> adapters of the kind:
>>>>>
>>>>> invokeNative_V_DDDDD
>>>>> invokeNative_V_JDDDD
>>>>> invokeNative_V_JJDDD
>>>>> invokeNative_V_JJJDD
>>>>> invokeNative_V_JJJJD
>>>>> invokeNative_V_JJJJJ
>>>>>
>>>>> where long arguments come before double arguments (and to do this
>>>>> for each arity, e.g. <= 5).
>>>>>
>>>>> If all arguments are passed in registers, then this reordering
>>>>> doesn't affect behavior, and greatly limits the number of
>>>>> permutations to be supported/generated.
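
To make the counting concrete, here is a small Java sketch (illustrative only, not taken from the patch) that enumerates the "longs first, doubles last" shape names for one return type and arity. The class ShapeNames and its method are hypothetical; the naming pattern is simply inferred from the list above.

```java
import java.util.ArrayList;
import java.util.List;

class ShapeNames {
    // Returns every "longs before doubles" shape name for the given
    // return-type code (e.g. 'V', 'J' or 'D') and arity.
    static List<String> shapesFor(char ret, int arity) {
        List<String> names = new ArrayList<>();
        for (int longs = 0; longs <= arity; longs++) {
            String args = "J".repeat(longs) + "D".repeat(arity - longs);
            names.add("invokeNative_" + ret + "_" + args);
        }
        return names;
    }

    public static void main(String[] args) {
        // Prints the six arity-5 void shapes, from all-doubles to all-longs.
        shapesFor('V', 5).forEach(System.out::println);
    }
}
```

For a given arity n this yields n + 1 shapes per return type, rather than the 2^n orderings that would be needed if the relative order of longs and doubles were preserved.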
>>>>>
>>>>> The downcall part (java to native) is relatively straightforward:
>>>>> the directNativeInvoker.cpp file defines a bunch of native entry 
>>>>> points, one per shape, which cast the input address to a function 
>>>>> pointer of the desired shape, and then call it:
>>>>>
>>>>> jlong NI_invokeNative_J_JD(JNIEnv *env, jobject _unused,
>>>>>                            jlong addr, jlong arg0, jdouble arg1) {
>>>>>     return ((jlong (*)(jlong, jdouble))addr)(arg0, arg1);
>>>>> }
>>>>>
>>>>> The upcall business is a little trickier: first, if we are only to 
>>>>> optimize upcalls where argument passing happens in registers, then 
>>>>> it's crucial to note that by the time we get into the assembly 
>>>>> stub, all the registers will have been populated by the native 
>>>>> code to contain the right arguments in the right places. So we can 
>>>>> avoid all the shuffling in the assembly adapter and simply jump 
>>>>> onto a C function that looks like this:
>>>>>
>>>>> long specialized_upcall_helper_J(long l0, long l1, long l2, long l3,
>>>>>                                  double d0, double d1, double d2, double d3,
>>>>>                                  unsigned int mask, jobject rec) { ... }
>>>>>
>>>>> Note here that the first 8 arguments are just longs and doubles, 
>>>>> and those will be expected to be in registers, according to the 
>>>>> System V ABI. (On Windows, the situation will be a bit different
>>>>> as fewer integer registers are available, so this will need some
>>>>> work there).
>>>>>
>>>>> So, to recap, the assembly upcall stub simply appends the
>>>>> receiver object and a 'signature mask' in the last two available C
>>>>> registers and then jumps onto the helper function. The helper
>>>>> function will find all the desired arguments in the right places -
>>>>> there will be, in the general case, some unused arguments, but
>>>>> that's fine, after all it didn't cost us anything to load them
>>>>> in the first place!
>>>>>
>>>>> Note that we have three helper variants, one for each return type 
>>>>> { long, double, void }. This is required as we need the C helper 
>>>>> to return a value of the right type which will generate the right 
>>>>> assembly sequence to store the result in the right register 
>>>>> (either integer or XMM).
>>>>>
>>>>> So, with three helpers we can support all the shapes with up to 8 
>>>>> arguments. On the Java side we have, of course, to define a 
>>>>> specialized entry point for each shape.
>>>>>
>>>>> All the magic for adapting method handles to and from the
>>>>> specialized adapters happens in the DirectSignatureShuffler class;
>>>>> this class is responsible for adapting each argument e.g. from
>>>>> Java to native value, and then reordering the adapted method
>>>>> handle to match the order in which arguments are expected by the
>>>>> adapter (e.g. move all longs in front). The challenge was in
>>>>> making DirectSignatureShuffler fully symmetric - e.g. I did
>>>>> not want to have different code paths for upcalls and downcalls, 
>>>>> so the code tries quite hard to be parametric in the shuffling 
>>>>> direction (java->native or native->java) - which means that 
>>>>> adapters will be applied in one way or in the inverse way 
>>>>> depending on the shuffling direction (and as to whether we are 
>>>>> adapting an argument or a return). Since method handle filters are 
>>>>> composable, it all works out quite beautifully.
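
As a rough illustration of the reordering half of this (not the actual DirectSignatureShuffler code), the sketch below uses the standard MethodHandles.permuteArguments combinator to expose a target with long/double parameters in "longs first" order. The class LongsFirst, the show method and the demo helper are hypothetical, and value adaptation (Java to native and back) is deliberately left out.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class LongsFirst {
    // Hypothetical target whose parameters are in "natural" order.
    static String show(double a, long b, double c) {
        return a + "," + b + "," + c;
    }

    // Returns a handle taking all longs first, then all doubles, which
    // forwards each argument to its original position in the target.
    static MethodHandle longsFirst(MethodHandle target) {
        MethodType natural = target.type();
        int n = natural.parameterCount();
        int[] reorder = new int[n];          // target index -> new index
        Class<?>[] sorted = new Class<?>[n]; // parameter types, longs first
        int next = 0;
        for (Class<?> kind : new Class<?>[] { long.class, double.class }) {
            for (int i = 0; i < n; i++) {
                if (natural.parameterType(i) == kind) {
                    sorted[next] = kind;
                    reorder[i] = next++;
                }
            }
        }
        if (next != n) {
            throw new IllegalArgumentException("only long/double parameters");
        }
        MethodType sortedType = MethodType.methodType(natural.returnType(), sorted);
        return MethodHandles.permuteArguments(target, sortedType, reorder);
    }

    // Looks up show(double, long, double) and calls it as (long, double, double).
    static String demo() {
        try {
            MethodHandle mh = MethodHandles.lookup().findStatic(LongsFirst.class, "show",
                    MethodType.methodType(String.class, double.class, long.class, double.class));
            return (String) longsFirst(mh).invokeExact(7L, 1.5, 2.5);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }
}
```

The opposite direction would apply the inverse permutation to a longs-first adapter, which is presumably where being parametric in the shuffling direction pays off.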
>>>>>
>>>>> Note that the resulting, adapted MH is stored in a @Stable field 
>>>>> to tell the JIT to optimize the heck out of it (as if it were a 
>>>>> static constant).
>>>>>
>>>>> This patch contains several other changes - which I discuss 
>>>>> briefly below:
>>>>>
>>>>> * we need to set up a framework in which new invocation strategies
>>>>> can be plugged in - note that we now have essentially 4 cases:
>>>>>
>>>>> { NativeInvoker, UpcallHandler } x { Universal, Direct }
>>>>>
>>>>> When the code wants e.g. a NativeInvoker, it asks the
>>>>> NativeInvoker::of factory for one (UpcallHandler works in a similar
>>>>> way); this factory will attempt to go down the fast path - if an
>>>>> error occurs when computing the fast path, the call will fall back
>>>>> to the universal (slow) path.
>>>>>
>>>>> Most of the changes you see in the Java code are associated with
>>>>> this refactoring - e.g. all clients of NativeInvoker/UpcallHandler
>>>>> should now go through the factory.
>>>>>
>>>>> * CallbackImplGenerator had a major issue since the new factory 
>>>>> for NativeInvoker wants to bind an address eagerly (this is 
>>>>> required e.g. to be forward compatible with linkToNative backend); 
>>>>> which means that at construction time we have to get the address 
>>>>> of the callback, call the NativeInvoker factory and then stash the 
>>>>> target method handle into a field of the anon callback class. Vlad 
>>>>> tells me that fields of anon classes are always 'trusted' by the 
>>>>> JIT, which means they should be treated as '@Stable' (note that I
>>>>> can't put a @Stable annotation there, since this code will be
>>>>> spun in user-land).
>>>>>
>>>>> * There are a bunch of properties that can be set to either force 
>>>>> slow path or force 'direct' path; in the latter case, if an error 
>>>>> occurs when instantiating the direct wrapper, an exception is 
>>>>> thrown. This mode is very useful for testing, and I indeed have 
>>>>> tried to run all our tests with this flag enabled, to see how many 
>>>>> places could not be optimized.
>>>>>
>>>>> * I've also reorganized all the native code in hotspot/prims so 
>>>>> that we have a separate file for each scheme (and so that native 
>>>>> Java methods could be added where they really belong). This should 
>>>>> also help in the long run as it should make adding/removing a 
>>>>> given scheme easier.
>>>>>
>>>>> * I've also added a small test which tries to pass structs of 
>>>>> different sizes, but I will also work on a more complex test which 
>>>>> will stress-test all invocation modes in a more complete fashion. 
>>>>> With respect to testing, I've also done a fastdebug build and ran 
>>>>> all tests with that (as fastdebug catches many more HotSpot
>>>>> assertions than the product version); everything passed.
>>>>>
>>>>> Webrev:
>>>>>
>>>>> http://cr.openjdk.java.net/~mcimadamore/panama/8210757/
>>>>>
>>>>> I'd like to thank Vladimir Ivanov for the prompt support whenever 
>>>>> I got stuck down the macro assembler rabbit hole :-)
>>>>>
>>>>> Cheers
>>>>> Maurizio
>>>>>
>>>>> [1] - 
>>>>> http://mail.openjdk.java.net/pipermail/panama-dev/2018-September/002652.html
>>>>>
>>>>>
>>>>>
>>>>
>