[foreign] RFR 8210757: Add binder support for direct native invocation strategy

Fri Sep 28 19:39:01 UTC 2018

Looks good to me overall, although I only focused on the Java side and did not dig into the assembly.

A really minor thing in the new RegisterStructTest.java, line 68 seems to be redundant?

 68             checkEquals(rs.l$get(), rs.l$get());

The test passes on Mac.

Cheers,
Henry

> On Sep 28, 2018, at 4:19 AM, Maurizio Cimadamore <maurizio.cimadamore at oracle.com> wrote:
> 
> Webrev:
> 
> http://cr.openjdk.java.net/~mcimadamore/panama/8210757_v2/
> 
> Maurizio
> 
> 
> On 28/09/18 12:19, Maurizio Cimadamore wrote:
>> This is an updated version of the direct invocation scheme support. Very close to the last one, but there are some minor refactorings/improvements:
>> 
>> 1) Added a @Stable annotation in DirectNativeInvoker's MH field
>> 2) box/unbox routines used by the UniversalXYZ strategies have been moved from NativeInvoker to UniversalNativeInvoker
>> 3) I revamped the logic which detects whether the fast path is applicable - now we create the calling sequence first, and we use that to check whether we can fast-path it. Some internal benchmarks have shown that, with a large number of symbols, we were doing a lot of work because we always tried the fast path first and then, in case of an exception, fell back to the slow path; in such cases we would create the calling sequence twice. This new technique might also be more friendly w.r.t. Windows and other ABIs.
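
A minimal Java sketch of the "build the calling sequence once, then test it" idea in point 3. Everything here is an assumption made for illustration: the Storage enum, the method names and the much simplified classification (only long and double carriers, with the System V limits of 6 integer and 8 vector argument registers) are hypothetical and are not taken from the patch.

```java
import java.util.ArrayList;
import java.util.List;

class FastPathCheckSketch {
    // Hypothetical storage classes for an argument.
    enum Storage { INTEGER_REG, VECTOR_REG, STACK }

    // Hypothetical, simplified calling sequence: one storage class per argument.
    static List<Storage> callingSequence(Class<?>... params) {
        List<Storage> seq = new ArrayList<>();
        int intRegs = 0, vecRegs = 0;
        for (Class<?> p : params) {
            if (p == long.class && intRegs < 6) {
                intRegs++;
                seq.add(Storage.INTEGER_REG);
            } else if (p == double.class && vecRegs < 8) {
                vecRegs++;
                seq.add(Storage.VECTOR_REG);
            } else {
                seq.add(Storage.STACK);
            }
        }
        return seq;
    }

    // The fast path only applies when nothing spills to the stack.
    static boolean canFastPath(List<Storage> seq) {
        return !seq.contains(Storage.STACK);
    }

    // The sequence is computed exactly once and then inspected.
    static String choose(Class<?>... params) {
        List<Storage> seq = callingSequence(params);
        return canFastPath(seq) ? "direct" : "universal";
    }

    public static void main(String[] args) {
        System.out.println(choose(long.class, double.class)); // prints "direct"
    }
}
```

The point of the shape is that no exception is needed to discover that the fast path does not apply, so the sequence is never rebuilt.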
>> 
>> I'd really like to move ahead with this (as this RFR has been out for quite a while now) - if there are no other comments I'll go ahead.
>> 
>> Maurizio
>> 
>> 
>> On 14/09/18 19:04, Maurizio Cimadamore wrote:
>>> Hi,
>>> as mentioned in [1], this patch adds binder support for the so-called 'direct' invocation scheme, which allows for greater native downcall/upcall performance by means of specialized adapters. The core idea, also described in [1], is to define adapters of the kind:
>>> 
>>> invokeNative_V_DDDDD
>>> invokeNative_V_JDDDD
>>> invokeNative_V_JJDDD
>>> invokeNative_V_JJJDD
>>> invokeNative_V_JJJJD
>>> invokeNative_V_JJJJJ
>>> 
>>> where long arguments come before double arguments (and likewise for every supported arity, e.g. <= 5).
>>> 
>>> If all arguments are passed in registers, then this reordering doesn't affect behavior, and it greatly limits the number of permutations to be supported/generated.
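
To make the saving concrete: with longs listed before doubles, arity n needs n+1 shapes per return type instead of 2^n (6 rather than 32 for arity 5). Below is a minimal Java sketch that derives such a shape name from a MethodType. The class and method names are hypothetical; only the invokeNative_<ret>_<args> naming pattern comes from the message above.

```java
import java.lang.invoke.MethodType;

class ShapeNames {
    // Hypothetical helper: maps a carrier type to the letter used in adapter names.
    static char letter(Class<?> c) {
        if (c == long.class) return 'J';
        if (c == double.class) return 'D';
        if (c == void.class) return 'V';
        throw new IllegalArgumentException("unsupported carrier: " + c);
    }

    // Builds a name such as invokeNative_V_JJDDD, listing all longs before all doubles.
    static String adapterName(MethodType type) {
        StringBuilder longs = new StringBuilder();
        StringBuilder doubles = new StringBuilder();
        for (Class<?> p : type.parameterList()) {
            char l = letter(p);
            (l == 'J' ? longs : doubles).append(l);
        }
        return "invokeNative_" + letter(type.returnType()) + "_" + longs + doubles;
    }

    public static void main(String[] args) {
        MethodType mt = MethodType.methodType(void.class,
                double.class, long.class, double.class, long.class, double.class);
        System.out.println(adapterName(mt)); // prints "invokeNative_V_JJDDD"
    }
}
```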
>>> 
>>> The downcall part (java to native) is relatively straightforward: the directNativeInvoker.cpp file defines a bunch of native entry points, one per shape, which cast the input address to a function pointer of the desired shape, and then call it:
>>> 
>>> jlong NI_invokeNative_J_JD(JNIEnv *env, jobject _unused, jlong addr, jlong arg0, jdouble arg1) {
>>>     return ((jlong (*)(jlong, jdouble))addr)(arg0, arg1);
>>> }
>>> 
>>> The upcall business is a little trickier: first, if we are only to optimize upcalls where argument passing happens in registers, then it's crucial to note that by the time we get into the assembly stub, all the registers will have been populated by the native code to contain the right arguments in the right places. So we can avoid all the shuffling in the assembly adapter and simply jump onto a C function that looks like this:
>>> 
>>> long specialized_upcall_helper_J(long l0, long l1, long l2, long l3,
>>>                                       double d0, double d1, double d2, double d3,
>>>                                        unsigned int mask, jobject rec) { ... }
>>> 
>>> Note here that the first 8 arguments are just longs and doubles, and those will be expected to be in registers, according to the System V ABI. (On Windows, the situation will be a bit different as fewer integer registers are available, so this will need some work there.)
>>> 
>>> So, to recap, the assembly upcall stub simply 'appends' the receiver object and a 'signature mask' in the last two available C registers and then jumps onto the helper function. The helper function will find all the desired arguments in the right places - there will be, in the general case, some unused arguments, but that's fine; after all, it didn't cost us anything to load them in the first place!
>>> 
>>> Note that we have three helper variants, one for each return type { long, double, void }. This is required as we need the C helper to return a value of the right type which will generate the right assembly sequence to store the result in the right register (either integer or XMM).
>>> 
>>> So, with three helpers we can support all the shapes with up to 8 arguments. On the Java side we have, of course, to define a specialized entry point for each shape.
>>> 
>>> All the magic for adapting method handles to and from the specialized adapters happens in the DirectSignatureShuffler class; this class is responsible for adapting each argument, e.g. from Java to native value, and then reordering the adapted method handle to match the order in which arguments are expected by the adapter (e.g. move all longs in front). The challenge was in having DirectSignatureShuffler be fully symmetric - e.g. I did not want to have different code paths for upcalls and downcalls, so the code tries quite hard to be parametric in the shuffling direction (java->native or native->java) - which means that adapters will be applied in one way or in the inverse way depending on the shuffling direction (and as to whether we are adapting an argument or a return). Since method handle filters are composable, it all works out quite beautifully.
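
A minimal Java sketch of the reordering step only, in one direction, using the standard MethodHandles.permuteArguments combinator. It is not the DirectSignatureShuffler code: the class, the stand-in adapter method and toCallerOrder are hypothetical names, and per-argument adaptation and the inverse direction are left out.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.ArrayList;
import java.util.List;

class ShuffleSketch {
    // Stand-in for a specialized adapter of shape J_JDD (long first, then doubles).
    static long adapterJDD(long l0, double d0, double d1) {
        return l0 + (long) (d0 * 10) + (long) (d1 * 100);
    }

    // Wraps a longs-first handle so that callers can keep their own argument order.
    static MethodHandle toCallerOrder(MethodHandle adapter, MethodType callerType) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < callerType.parameterCount(); i++)
            if (callerType.parameterType(i) == long.class) order.add(i);
        for (int i = 0; i < callerType.parameterCount(); i++)
            if (callerType.parameterType(i) == double.class) order.add(i);
        // reorder[i] is the caller-side index that feeds the adapter's i-th argument.
        int[] reorder = order.stream().mapToInt(Integer::intValue).toArray();
        return MethodHandles.permuteArguments(adapter, callerType, reorder);
    }

    static long demo() {
        try {
            MethodHandle adapter = MethodHandles.lookup().findStatic(
                    ShuffleSketch.class, "adapterJDD",
                    MethodType.methodType(long.class, long.class, double.class, double.class));
            MethodType callerType = MethodType.methodType(
                    long.class, double.class, long.class, double.class);
            MethodHandle mh = toCallerOrder(adapter, callerType);
            // The caller passes (double, long, double); the adapter receives (7, 1.0, 2.0).
            return (long) mh.invokeExact(1.0, 7L, 2.0);
        } catch (Throwable t) {
            throw new AssertionError(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 217
    }
}
```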
>>> 
>>> Note that the resulting, adapted MH is stored in a @Stable field to tell the JIT to optimize the heck out of it (as if it were a static constant).
>>> 
>>> This patch contains several other changes - which I discuss briefly below:
>>> 
>>> * we need to set up a framework in which new invocation strategies can be plugged in - note that we now have essentially 4 cases:
>>> 
>>> { NativeInvoker, UpcallHandler } x { Universal, Direct }
>>> 
>>> When the code wants e.g. a NativeInvoker, it asks the NativeInvoker::of factory for one (UpcallHandler works in a similar way); this factory will attempt to go down the fast path - if an error occurs when computing the fast path, the call will fall back to the universal (slow) path.
>>> 
>>> Most of the changes you see in the Java code are associated with this refactoring - e.g. all clients of NativeInvoker/UpcallHandler should now go through the factory.
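
A minimal Java sketch of the factory shape described here: try the direct strategy, and fall back to the universal one if it cannot be built. The Invoker interface, the arity-based failure condition and all method names are hypothetical stand-ins, not the actual NativeInvoker API.

```java
class InvokerFactorySketch {
    // Hypothetical stand-in for the invoker abstraction.
    interface Invoker {
        String kind();
    }

    // Hypothetical direct strategy: fails when no specialized adapter exists.
    static Invoker direct(int arity) {
        if (arity > 5) {
            throw new IllegalArgumentException("no specialized adapter for arity " + arity);
        }
        return () -> "direct";
    }

    // Hypothetical universal strategy: always available.
    static Invoker universal(int arity) {
        return () -> "universal";
    }

    // Factory: attempt the fast path first, fall back to the slow path on error.
    static Invoker of(int arity) {
        try {
            return direct(arity);
        } catch (IllegalArgumentException ex) {
            return universal(arity);
        }
    }

    public static void main(String[] args) {
        System.out.println(of(3).kind()); // prints "direct"
        System.out.println(of(9).kind()); // prints "universal"
    }
}
```

Keeping the choice inside one factory means clients never name a strategy themselves, which is what lets a new strategy be plugged in later.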
>>> 
>>> * CallbackImplGenerator had a major issue, since the new factory for NativeInvoker wants to bind an address eagerly (this is required e.g. to be forward compatible with a linkToNative backend); this means that at construction time we have to get the address of the callback, call the NativeInvoker factory and then stash the target method handle into a field of the anon callback class. Vlad tells me that fields of anon classes are always 'trusted' by the JIT, which means they should be treated as '@Stable' (note that I can't put a @Stable annotation there, since this code will be spun in user-land).
>>> 
>>> * There are a bunch of properties that can be set to either force slow path or force 'direct' path; in the latter case, if an error occurs when instantiating the direct wrapper, an exception is thrown. This mode is very useful for testing, and I indeed have tried to run all our tests with this flag enabled, to see how many places could not be optimized.
>>> 
>>> * I've also reorganized all the native code in hotspot/prims so that we have a separate file for each scheme (and so that native Java methods could be added where they really belong). This should also help in the long run as it should make adding/removing a given scheme easier.
>>> 
>>> * I've also added a small test which tries to pass structs of different sizes, but I will also work on a more complex test which will stress-test all invocation modes in a more complete fashion. With respect to testing, I've also done a fastdebug build and ran all tests with that (as fastdebug catches many more HotSpot assertions than the product version); everything passed.
>>> 
>>> Webrev:
>>> 
>>> http://cr.openjdk.java.net/~mcimadamore/panama/8210757/
>>> 
>>> I'd like to thank Vladimir Ivanov for the prompt support whenever I got stuck down the macro assembler rabbit hole :-)
>>> 
>>> Cheers
>>> Maurizio
>>> 
>>> [1] - http://mail.openjdk.java.net/pipermail/panama-dev/2018-September/002652.html
>>> 
>>> 
>>> 
>> 
>