[foreign] RFR 8210757: Add binder support for direct native invocation strategy

Fri Sep 28 12:06:17 UTC 2018

Mostly out of curiosity; can you make the generated MethodHandle field 
in CallbackImplGenerator @Stable as well?

private void generateMethodHandleField(BinderClassWriter cw) {
     cw.visitField(ACC_PRIVATE | ACC_FINAL, MH_FIELD_NAME,
             Type.getDescriptor(MethodHandle.class), null, null)
         .visitAnnotation(Type.getDescriptor(Stable.class), true);
}

Jorn

Maurizio Cimadamore wrote on 2018-09-28 13:19:
> Webrev:
> 
> http://cr.openjdk.java.net/~mcimadamore/panama/8210757_v2/
> 
> Maurizio
> 
> 
> On 28/09/18 12:19, Maurizio Cimadamore wrote:
>> This is an updated version of the direct invocation scheme support. 
>> Very close to the last one, but there are some minor 
>> refactorings/improvements:
>> 
>> 1) Added a @Stable annotation in DirectNativeInvoker's MH field
>> 2) box/unbox routines used by the UniversalXYZ strategies have been 
>> moved from NativeInvoker to UniversalNativeInvoker
>> 3) I revamped the logic which detects whether the fast path is 
>> applicable - now we create the calling sequence first, and we use that 
>> to check whether we can fast-path it. Some internal benchmarks have 
>> shown that, with a large number of symbols, we were doing a lot of work 
>> because we were always trying the fast path and then, in case of an 
>> exception, falling back to the slow path; in such cases we would create 
>> the calling sequence twice. This new technique might also be more 
>> friendly w.r.t. Windows and other ABIs.
>> 
>> I'd really like to move ahead with this (as this RFR has been out for 
>> quite a while now) - if there are no other comments I'll go ahead.
>> 
>> Maurizio
>> 
>> 
>> On 14/09/18 19:04, Maurizio Cimadamore wrote:
>>> Hi,
>>> as mentioned in [1], this patch adds binder support for the so-called 
>>> 'direct' invocation scheme, which allows for greater native 
>>> downcall/upcall invocation performance by means of specialized 
>>> adapters. The core idea, also described in [1], is to define adapters 
>>> of the kind:
>>> 
>>> invokeNative_V_DDDDD
>>> invokeNative_V_JDDDD
>>> invokeNative_V_JJDDD
>>> invokeNative_V_JJJDD
>>> invokeNative_V_JJJJD
>>> invokeNative_V_JJJJJ
>>> 
>>> where long arguments come before double arguments (and to do this 
>>> for each arity, e.g. <= 5).
>>> 
>>> If all arguments are passed in registers, then this reordering 
>>> doesn't affect behavior, and it greatly limits the number of 
>>> permutations to be supported/generated.
>>> 
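A minimal sketch of how such a shape name could be derived from a signature, assuming that only long and double carriers are involved. The class and method names here (ShapeNameSketch, adapterName) are hypothetical and are not taken from the patch:

```java
class ShapeNameSketch {
    // Single-letter code for a return carrier, matching the adapter names above.
    static char returnCode(Class<?> c) {
        if (c == void.class) return 'V';
        return paramCode(c);
    }

    // Single-letter code for an argument carrier; only long and double are handled.
    static char paramCode(Class<?> c) {
        if (c == long.class) return 'J';
        if (c == double.class) return 'D';
        throw new IllegalArgumentException("unsupported carrier: " + c);
    }

    // Builds the adapter name with all longs placed before all doubles,
    // regardless of the order in which they appear in the signature.
    static String adapterName(Class<?> ret, Class<?>... params) {
        StringBuilder longs = new StringBuilder();
        StringBuilder doubles = new StringBuilder();
        for (Class<?> p : params) {
            char code = paramCode(p);
            (code == 'J' ? longs : doubles).append(code);
        }
        return "invokeNative_" + returnCode(ret) + "_" + longs + doubles;
    }

    public static void main(String[] args) {
        // (double, long, double, long, double) -> void maps to the JJDDD shape
        System.out.println(adapterName(void.class,
                double.class, long.class, double.class, long.class, double.class));
    }
}
```

With this normalization, every signature of a given arity with the same number of longs and doubles collapses onto a single adapter, which is why only arity + 1 shapes per return type are needed.
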
>>> The downcall part (java to native) is relatively straightforward: the 
>>> directNativeInvoker.cpp file defines a bunch of native entry points, 
>>> one per shape, which cast the input address to a function pointer of 
>>> the desired shape, and then call it:
>>> 
>>> jlong NI_invokeNative_J_JD(JNIEnv *env, jobject _unused, jlong addr,
>>>                            jlong arg0, jdouble arg1) {
>>>     return ((jlong (*)(jlong, jdouble))addr)(arg0, arg1);
>>> }
>>> 
>>> The upcall business is a little trickier: first, if we are only to 
>>> optimize upcalls where argument passing happens in registers, then 
>>> it's crucial to note that by the time we get into the assembly stub, 
>>> all the registers will have been populated by the native code to 
>>> contain the right arguments in the right places. So we can avoid all 
>>> the shuffling in the assembly adapter and simply jump onto a C 
>>> function that looks like this:
>>> 
>>> long specialized_upcall_helper_J(long l0, long l1, long l2, long l3,
>>>                                  double d0, double d1, double d2, double d3,
>>>                                  unsigned int mask, jobject rec) { ... }
>>> 
>>> Note here that the first 8 arguments are just longs and doubles, and 
>>> those will be expected to be in registers, according to the System V 
>>> ABI. (On Windows, the situation will be a bit different as fewer 
>>> integer registers are available, so this will need some work there).
>>> 
>>> So, to recap, the assembly upcall stub simply 'appends' the receiver 
>>> object and a 'signature mask' in the last two available C registers 
>>> and then jumps onto the helper function. The helper function will find 
>>> all the desired arguments in the right places - there will be, in the 
>>> general case, some unused arguments, but that's fine, after all it 
>>> didn't cost us anything to load them in the first place!
>>> 
>>> Note that we have three helper variants, one for each return type { 
>>> long, double, void }. This is required as we need the C helper to 
>>> return a value of the right type, which will generate the right 
>>> assembly sequence to store the result in the right register (either 
>>> integer or XMM).
>>> 
>>> So, with three helpers we can support all the shapes with up to 8 
>>> arguments. On the Java side we have, of course, to define a 
>>> specialized entry point for each shape.
>>> 
>>> All the magic for adapting method handles to and from the specialized 
>>> adapters happens in the DirectSignatureShuffler class; this class is 
>>> responsible for adapting each argument, e.g. from Java to native 
>>> value, and then reordering the adapted method handle to match the 
>>> order in which arguments are expected by the adapter (e.g. move all 
>>> longs in front). The challenge was in making DirectSignatureShuffler 
>>> fully symmetric - e.g. I did not want to have different code 
>>> paths for upcalls and downcalls, so the code tries quite hard to be 
>>> parametric in the shuffling direction (java->native or native->java) 
>>> - which means that adapters will be applied in one way or in the 
>>> inverse way depending on the shuffling direction (and on whether 
>>> we are adapting an argument or a return). Since method handle filters 
>>> are composable, it all works out quite beautifully.
>>> 
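The adapt-then-reorder idea can be illustrated with the standard java.lang.invoke combinators. This is only a sketch under assumptions, not the DirectSignatureShuffler code: the class ShuffleSketch, the target method scale, and the choice of Math.toIntExact as the long-to-int argument filter are all made up for illustration, and only one direction (native-shaped arguments into a Java target) is shown:

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class ShuffleSketch {
    // Hypothetical Java-side target with an arbitrary argument order.
    static double scale(double value, int factor, double offset) {
        return value * factor + offset;
    }

    // Returns a handle whose parameters are the target's, with all longs
    // moved in front of all doubles (relative order otherwise preserved).
    static MethodHandle longsFirst(MethodHandle target) {
        MethodType type = target.type();
        int n = type.parameterCount();
        int nLongs = 0;
        for (Class<?> p : type.parameterList()) {
            if (p == long.class) nLongs++;
        }
        int[] reorder = new int[n];          // reorder[i]: new position feeding target parameter i
        Class<?>[] newParams = new Class<?>[n];
        int nextLong = 0, nextDouble = nLongs;
        for (int i = 0; i < n; i++) {
            Class<?> p = type.parameterType(i);
            int pos;
            if (p == long.class) pos = nextLong++;
            else if (p == double.class) pos = nextDouble++;
            else throw new IllegalArgumentException("unsupported carrier: " + p);
            reorder[i] = pos;
            newParams[pos] = p;
        }
        MethodType newType = MethodType.methodType(type.returnType(), newParams);
        return MethodHandles.permuteArguments(target, newType, reorder);
    }

    static MethodHandle adapt() throws ReflectiveOperationException {
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        MethodHandle target = lookup.findStatic(ShuffleSketch.class, "scale",
                MethodType.methodType(double.class, double.class, int.class, double.class));
        // Step 1: adapt each argument to its carrier (here: long -> int for 'factor').
        MethodHandle narrow = lookup.findStatic(Math.class, "toIntExact",
                MethodType.methodType(int.class, long.class));
        MethodHandle carriers = MethodHandles.filterArguments(target, 1, narrow);
        // Step 2: reorder (double, long, double) into (long, double, double).
        return longsFirst(carriers);
    }

    static double demo() {
        try {
            return (double) adapt().invokeExact(3L, 2.0, 1.0);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // scale(2.0, 3, 1.0) = 7.0
    }
}
```

Because filterArguments and permuteArguments both return ordinary method handles, the two steps compose freely, which is the property the paragraph above relies on.
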
>>> Note that the resulting, adapted MH is stored in a @Stable field to 
>>> tell the JIT to optimize the heck out of it (as if it were a static 
>>> constant).
>>> 
>>> This patch contains several other changes - which I discuss briefly 
>>> below:
>>> 
>>> * we need to set up a framework in which new invocation strategies 
>>> can be plugged in - note that we now have essentially 4 cases:
>>> 
>>> { NativeInvoker, UpcallHandler } x { Universal, Direct }
>>> 
>>> When the code wants e.g. a NativeInvoker, it asks the 
>>> NativeInvoker::of factory for one (UpcallHandler works in a similar 
>>> way); this factory will attempt to go down the fast path - if an error 
>>> occurs when computing the fast path, the call will fall back to the 
>>> universal (slow) path.
>>> 
>>> Most of the changes you see in the Java code are associated with this 
>>> refactoring - e.g. all clients of NativeInvoker/UpcallHandler should 
>>> now go through the factory.
>>> 
>>> * CallbackImplGenerator had a major issue since the new factory for 
>>> NativeInvoker wants to bind an address eagerly (this is required e.g. 
>>> to be forward compatible with linkToNative backend); which means that 
>>> at construction time we have to get the address of the callback, call 
>>> the NativeInvoker factory and then stash the target method handle 
>>> into a field of the anon callback class. Vlad tells me that fields of 
>>> anon classes are always 'trusted' by the JIT, which means they should 
>>> be treated as '@Stable' (note that I can't put a @Stable annotation 
>>> there, since this code will be spun in user-land).
>>> 
>>> * There are a bunch of properties that can be set to either force 
>>> slow path or force 'direct' path; in the latter case, if an error 
>>> occurs when instantiating the direct wrapper, an exception is thrown. 
>>> This mode is very useful for testing, and I indeed have tried to run 
>>> all our tests with this flag enabled, to see how many places could 
>>> not be optimized.
>>> 
>>> * I've also reorganized all the native code in hotspot/prims so that 
>>> we have a separate file for each scheme (and so that native Java 
>>> methods could be added where they really belong). This should also 
>>> help in the long run as it should make adding/removing a given scheme 
>>> easier.
>>> 
>>> * I've also added a small test which tries to pass structs of 
>>> different sizes, but I will also work on a more complex test which 
>>> will stress-test all invocation modes in a more complete fashion. 
>>> With respect to testing, I've also done a fastdebug build and ran all 
>>> tests with that (as fastdebug catches many more HotSpot assertions 
>>> than the product version); everything passed.
>>> 
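The factory arrangement from the first bullet, together with the force-slow/force-direct properties, might be sketched as follows. Everything here is hypothetical (the InvokerFactorySketch class, the Mode enum, the arity limit of 5 borrowed from the "e.g. <=5" remark); it checks applicability up front rather than catching an exception, in the spirit of the revised logic:

```java
class InvokerFactorySketch {
    // Hypothetical stand-in for the properties that force one path or the other.
    enum Mode { AUTO, FORCE_UNIVERSAL, FORCE_DIRECT }

    // Hypothetical common interface for both strategies.
    interface Invoker {
        String kind();
    }

    // Assumed limit on the arity supported by the specialized adapters.
    static final int MAX_DIRECT_ARITY = 5;

    // Decides from the signature alone whether the direct scheme applies:
    // few enough arguments, all of them long or double carriers.
    static boolean directApplicable(Class<?>... params) {
        if (params.length > MAX_DIRECT_ARITY) return false;
        for (Class<?> p : params) {
            if (p != long.class && p != double.class) return false;
        }
        return true;
    }

    // Factory: prefers the direct path, falls back to the universal one,
    // and fails loudly when the direct path is forced but not applicable.
    static Invoker of(Mode mode, Class<?>... params) {
        boolean direct = directApplicable(params);
        if (mode == Mode.FORCE_DIRECT && !direct) {
            throw new IllegalStateException("direct invocation not applicable");
        }
        if (direct && mode != Mode.FORCE_UNIVERSAL) {
            return () -> "direct";
        }
        return () -> "universal";
    }

    public static void main(String[] args) {
        System.out.println(of(Mode.AUTO, long.class, double.class).kind());
        System.out.println(of(Mode.AUTO, int.class).kind());
    }
}
```

The forced-direct mode throwing instead of silently falling back is what makes it usable for finding call sites that cannot be optimized.
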
>>> Webrev:
>>> 
>>> http://cr.openjdk.java.net/~mcimadamore/panama/8210757/
>>> 
>>> I'd like to thank Vladimir Ivanov for the prompt support whenever I 
>>> got stuck down the macro assembler rabbit hole :-)
>>> 
>>> Cheers
>>> Maurizio
>>> 
>>> [1] - 
>>> http://mail.openjdk.java.net/pipermail/panama-dev/2018-September/002652.html
>>> 
>>> 
>>> 
>>