[foreign] RFR 8210757: Add binder support for direct native invocation strategy

Fri Sep 28 12:11:50 UTC 2018

No need for that: the VM treats all final fields of VM anonymous classes
as @Stable (or so I've been told) :-)

Maurizio


On 28/09/18 13:06, Jorn Vernee wrote:
> Mostly out of curiosity; can you make the generated MethodHandle field 
> in CallbackImplGenerator @Stable as well?
>
> private void generateMethodHandleField(BinderClassWriter cw) {
>     cw.visitField(ACC_PRIVATE | ACC_FINAL, MH_FIELD_NAME,
>                   Type.getDescriptor(MethodHandle.class), null, null)
>       .visitAnnotation(Type.getDescriptor(Stable.class), true);
> }
>
> Jorn
>
> Maurizio Cimadamore schreef op 2018-09-28 13:19:
>> Webrev:
>>
>> http://cr.openjdk.java.net/~mcimadamore/panama/8210757_v2/
>>
>> Maurizio
>>
>>
>> On 28/09/18 12:19, Maurizio Cimadamore wrote:
>>> This is an updated version of the direct invocation scheme support. 
>>> Very close to the last one, but there are some minor 
>>> refactorings/improvements:
>>>
>>> 1) Added a @Stable annotation in DirectNativeInvoker's MH field
>>> 2) box/unbox routine used by the UniversalXYZ strategies have been 
>>> moved from NativeInvoker to UniversalNativeInvoker
>>> 3) I revamped the logic which detects whether the fast path is
>>> applicable: we now create the calling sequence first, and use it to
>>> check whether we can fast-path the call. Some internal benchmarks
>>> showed that, with a large number of symbols, we were doing a lot of
>>> extra work, because we always tried the fast path first and then, on
>>> exception, fell back to the slow path; in such cases we created the
>>> calling sequence twice. The new technique might also be friendlier
>>> to Windows and other ABIs.
>>>
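A minimal Java sketch of the "calling sequence first" check described in
item 3. Every name here (FastPathCheckSketch, Storage, callingSequence,
chooseStrategy) is hypothetical and not taken from the patch; the register
limits are the System V x86-64 argument-register counts, and only long and
double carriers are modelled.

```java
import java.util.ArrayList;
import java.util.List;

final class FastPathCheckSketch {
    /** Where a single argument ends up; a simplified stand-in for real ABI bindings. */
    enum Storage { INTEGER_REGISTER, VECTOR_REGISTER, STACK }

    // System V x86-64 passes up to 6 integer and 8 floating-point arguments in registers.
    static final int MAX_INTEGER_REGS = 6;
    static final int MAX_VECTOR_REGS = 8;

    /** Hypothetical calling-sequence builder: classifies long/double carriers only. */
    static List<Storage> callingSequence(List<Class<?>> carriers) {
        List<Storage> sequence = new ArrayList<>();
        int ints = 0;
        int vectors = 0;
        for (Class<?> c : carriers) {
            if (c == long.class) {
                sequence.add(ints++ < MAX_INTEGER_REGS ? Storage.INTEGER_REGISTER : Storage.STACK);
            } else if (c == double.class) {
                sequence.add(vectors++ < MAX_VECTOR_REGS ? Storage.VECTOR_REGISTER : Storage.STACK);
            } else {
                throw new IllegalArgumentException("unsupported carrier: " + c);
            }
        }
        return sequence;
    }

    /** In this sketch the fast path applies only if nothing is passed on the stack. */
    static boolean isDirectApplicable(List<Storage> sequence) {
        return !sequence.contains(Storage.STACK);
    }

    /** Builds the calling sequence once and picks a strategy from it; no exception is used for control flow. */
    static String chooseStrategy(List<Class<?>> carriers) {
        List<Storage> sequence = callingSequence(carriers); // computed exactly once
        return isDirectApplicable(sequence) ? "direct" : "universal";
    }
}
```

The point of the sketch is only that the decision is taken by inspecting an
already-built sequence, rather than by attempting the fast path and catching
a failure.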
>>> I'd really like to move ahead with this (as this RFR has been out
>>> for quite a while now) - if there are no other comments I'll go ahead.
>>>
>>> Maurizio
>>>
>>>
>>> On 14/09/18 19:04, Maurizio Cimadamore wrote:
>>>> Hi,
>>>> as mentioned in [1], this patch adds binder support for the
>>>> so-called 'direct' invocation scheme, which allows for better native
>>>> downcall/upcall performance by means of specialized adapters. The
>>>> core idea, also described in [1], is to define adapters of the kind:
>>>>
>>>> invokeNative_V_DDDDD
>>>> invokeNative_V_JDDDD
>>>> invokeNative_V_JJDDD
>>>> invokeNative_V_JJJDD
>>>> invokeNative_V_JJJJD
>>>> invokeNative_V_JJJJJ
>>>>
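>>>> where long arguments come before double arguments (and to do this
>>>> for each arity, e.g. <= 5).
>>>>
>>>> If all arguments are passed in registers, then this reordering
>>>> doesn't affect behavior (under System V, integer and floating-point
>>>> arguments go into separate register sets, so their relative order
>>>> does not matter), and it greatly limits the number of permutations
>>>> to be supported/generated.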
>>>>
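The naming scheme above can be illustrated with a small Java sketch that
derives a shape name from a method type by counting longs and doubles. The
class and method names (ShapeNames, shapeOf) are hypothetical, and the code
only assumes the invokeNative_<ret>_<args> pattern visible in the list above;
it is not the binder's actual implementation.

```java
import java.lang.invoke.MethodType;

final class ShapeNames {
    private ShapeNames() {}

    /** Maps a carrier type to its one-letter code; only long, double and void are handled. */
    static char code(Class<?> c) {
        if (c == long.class) return 'J';
        if (c == double.class) return 'D';
        if (c == void.class) return 'V';
        throw new IllegalArgumentException("unsupported carrier: " + c);
    }

    /** Builds the adapter name for a type, listing all longs first and then all doubles. */
    static String shapeOf(MethodType type) {
        int longs = 0;
        int doubles = 0;
        for (Class<?> p : type.parameterList()) {
            char k = code(p);
            if (k == 'J') {
                longs++;
            } else if (k == 'D') {
                doubles++;
            } else {
                throw new IllegalArgumentException("unsupported parameter: " + p);
            }
        }
        StringBuilder sb = new StringBuilder("invokeNative_");
        sb.append(code(type.returnType())).append('_');
        for (int i = 0; i < longs; i++) sb.append('J');
        for (int i = 0; i < doubles; i++) sb.append('D');
        return sb.toString();
    }
}
```

Because only the counts matter, a given arity n needs n + 1 argument shapes
per return type instead of 2^n orderings.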
>>>> The downcall part (Java to native) is relatively straightforward:
>>>> the directNativeInvoker.cpp file defines a bunch of native entry
>>>> points, one per shape, which cast the input address to a function
>>>> pointer of the desired shape, and then call it:
>>>>
>>>> jlong NI_invokeNative_J_JD(JNIEnv *env, jobject _unused,
>>>>                            jlong addr, jlong arg0, jdouble arg1) {
>>>>     return ((jlong (*)(jlong, jdouble))addr)(arg0, arg1);
>>>> }
>>>>
>>>> The upcall business is a little trickier: first, if we are only to 
>>>> optimize upcalls where argument passing happens in registers, then 
>>>> it's crucial to note that by the time we get into the assembly 
>>>> stub, all the registers will have been populated by the native code 
>>>> to contain the right arguments in the right places. So we can avoid 
>>>> all the shuffling in the assembly adapter and simply jump onto a C 
>>>> function that looks like this:
>>>>
>>>> long specialized_upcall_helper_J(long l0, long l1, long l2, long l3,
>>>>                                  double d0, double d1, double d2, double d3,
>>>>                                  unsigned int mask, jobject rec) { ... }
>>>>
>>>> Note here that the first 8 arguments are just longs and doubles,
>>>> and those will be expected to be in registers, according to the
>>>> System V ABI. (On Windows, the situation will be a bit different, as
>>>> fewer integer registers are available, so this will need some work
>>>> there).
>>>>
>>>> So, to recap, the assembly upcall stub simply 'appends' the receiver
>>>> object and a 'signature mask' in the last two available C registers
>>>> and then jumps onto the helper function. The helper function will
>>>> find all the desired arguments in the right places - there will be,
>>>> in the general case, some unused arguments, but that's fine, after
>>>> all it didn't cost us anything to load them in the first place!
>>>>
>>>> Note that we have three helper variants, one for each return type
>>>> { long, double, void }. This is required because the C helper must
>>>> return a value of the right type, so that the compiler generates
>>>> the right assembly sequence to store the result in the right
>>>> register (either integer or XMM).
>>>>
>>>> So, with three helpers we can support all the shapes with up to 8 
>>>> arguments. On the Java side we have, of course, to define a 
>>>> specialized entry point for each shape.
>>>>
>>>> All the magic for adapting method handles to and from the
>>>> specialized adapters happens in the DirectSignatureShuffler class;
>>>> this class is responsible for adapting each argument e.g. from Java
>>>> to native value, and then reordering the adapted method handle to
>>>> match the order in which arguments are expected by the adapter
>>>> (e.g. move all longs in front). The challenge was in making
>>>> DirectSignatureShuffler fully symmetric - e.g. I did not want
>>>> to have different code paths for upcalls and downcalls, so the code
>>>> tries quite hard to be parametric in the shuffling direction
>>>> (java->native or native->java) - which means that adapters will be
>>>> applied in one way or in the inverse way depending on the shuffling
>>>> direction (and on whether we are adapting an argument or a
>>>> return). Since method handle filters are composable, it all works
>>>> out quite beautifully.
>>>>
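To make the adapt-then-reorder idea concrete, here is a small Java sketch
built only on the standard java.lang.invoke combinators
(MethodHandles.permuteArguments and MethodHandles.filterArguments). It is not
the DirectSignatureShuffler code: ShuffleSketch, adapterD_JDD, intToLong and
longsFirst are hypothetical names, the "adapter" is a plain Java method
standing in for a native entry point, and only the java->native direction is
shown.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class ShuffleSketch {
    /** Stand-in for a specialized adapter of shape D_JDD (longs first, then doubles). */
    static double adapterD_JDD(long l0, double d0, double d1) {
        return l0 + 10 * d0 + 100 * d1;
    }

    /** Example per-argument adaptation: widen a Java int to a native-sized long. */
    static long intToLong(int i) {
        return i;
    }

    /** For each adapter position, the index of the source argument that feeds it. */
    static int[] longsFirst(MethodType sourceType) {
        int n = sourceType.parameterCount();
        int[] reorder = new int[n];
        int next = 0;
        for (int i = 0; i < n; i++) {
            if (sourceType.parameterType(i) == long.class) reorder[next++] = i;
        }
        for (int i = 0; i < n; i++) {
            if (sourceType.parameterType(i) == double.class) reorder[next++] = i;
        }
        if (next != n) throw new IllegalArgumentException("only long/double carriers are handled");
        return reorder;
    }

    /** Builds a (double, int, double)double handle on top of the longs-first adapter. */
    static MethodHandle build() throws ReflectiveOperationException {
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        MethodHandle adapter = lookup.findStatic(ShuffleSketch.class, "adapterD_JDD",
                MethodType.methodType(double.class, long.class, double.class, double.class));
        MethodHandle widen = lookup.findStatic(ShuffleSketch.class, "intToLong",
                MethodType.methodType(long.class, int.class));
        // Source-order view of the adapter: (double, long, double)double.
        MethodType sourceType = MethodType.methodType(double.class, double.class, long.class, double.class);
        MethodHandle reordered = MethodHandles.permuteArguments(adapter, sourceType, longsFirst(sourceType));
        // At call time the filter runs first, then the reordering, then the adapter.
        return MethodHandles.filterArguments(reordered, 1, widen);
    }

    /** Calls the composed handle with (2.0, 3, 4.0), i.e. adapterD_JDD(3L, 2.0, 4.0). */
    static double demo() {
        try {
            return (double) build().invokeExact(2.0, 3, 4.0);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```

The handle is assembled from the adapter outwards, so the combinators appear
in the opposite order to the one in which they run when the handle is invoked.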
>>>> Note that the resulting, adapted MH is stored in a @Stable field to 
>>>> tell the JIT to optimize the heck out of it (as if it were a static 
>>>> constant).
>>>>
>>>> This patch contains several other changes - which I discuss briefly 
>>>> below:
>>>>
>>>> * we need to set up a framework in which new invocation strategies
>>>> can be plugged in - note that we now have essentially 4 cases:
>>>>
>>>> { NativeInvoker, UpcallHandler } x { Universal, Direct }
>>>>
>>>> When the code wants e.g. a NativeInvoker, it asks the
>>>> NativeInvoker::of factory for one (UpcallHandler works in a similar
>>>> way); this factory will attempt to go down the fast path - if an
>>>> error occurs when computing the fast path, the call will fall back
>>>> to the universal (slow) path.
>>>>
>>>> Most of the changes you see in the Java code are associated with
>>>> this refactoring - e.g. all clients of NativeInvoker/UpcallHandler
>>>> should now go through the factory.
>>>>
>>>> * CallbackImplGenerator had a major issue, since the new factory for
>>>> NativeInvoker wants to bind an address eagerly (this is required
>>>> e.g. to be forward compatible with the linkToNative backend). This
>>>> means that at construction time we have to get the address of the
>>>> callback, call the NativeInvoker factory and then stash the target
>>>> method handle into a field of the anon callback class. Vlad tells
>>>> me that fields of anon classes are always 'trusted' by the JIT,
>>>> which means they should be treated as '@Stable' (note that I can't
>>>> put a @Stable annotation there, since this code will be spun in
>>>> user-land).
>>>>
>>>> * There are a bunch of properties that can be set to either force
>>>> the slow path or force the 'direct' path; in the latter case, if an
>>>> error occurs when instantiating the direct wrapper, an exception is
>>>> thrown. This mode is very useful for testing, and I have indeed
>>>> tried to run all our tests with this flag enabled, to see how many
>>>> places could not be optimized.
>>>>
>>>> * I've also reorganized all the native code in hotspot/prims so 
>>>> that we have a separate file for each scheme (and so that native 
>>>> Java methods could be added where they really belong). This should 
>>>> also help in the long run as it should make adding/removing a given 
>>>> scheme easier.
>>>>
>>>> * I've also added a small test which tries to pass structs of
>>>> different sizes, but I will also work on a more complex test which
>>>> will stress-test all invocation modes in a more complete fashion.
>>>> With respect to testing, I've also done a fastdebug build and ran
>>>> all tests with that (as fastdebug catches many more HotSpot
>>>> assertions than the product version); everything passed.
>>>>
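The factory behaviour described in the list above (try the direct path, fall
back to the universal one, with a way to force either) can be sketched in
Java as follows. All names are hypothetical: InvokerFactorySketch, Mode,
Invoker and the arity limit of 5 are illustrative only, and the real
NativeInvoker::of signature and the real property names are not shown in
this message.

```java
final class InvokerFactorySketch {
    /** How the strategy is selected; in the patch this is driven by properties. */
    enum Mode { AUTO, FORCE_UNIVERSAL, FORCE_DIRECT }

    /** Minimal stand-in for an invoker; only reports which strategy backs it. */
    interface Invoker {
        String kind();
    }

    // Hypothetical limit on what the direct scheme can handle in this sketch.
    static final int MAX_DIRECT_ARITY = 5;

    /** Direct strategy: fails if the shape is not supported. */
    static Invoker direct(int arity) {
        if (arity > MAX_DIRECT_ARITY) {
            throw new UnsupportedOperationException("no direct adapter for arity " + arity);
        }
        return () -> "direct";
    }

    /** Universal strategy: always available, but slower. */
    static Invoker universal() {
        return () -> "universal";
    }

    /** Factory: honours the forced modes, otherwise tries direct and falls back. */
    static Invoker of(int arity, Mode mode) {
        switch (mode) {
            case FORCE_UNIVERSAL:
                return universal();
            case FORCE_DIRECT:
                return direct(arity); // failure propagates, which is useful in tests
            default:
                try {
                    return direct(arity);
                } catch (UnsupportedOperationException e) {
                    return universal();
                }
        }
    }
}
```

Letting the exception escape in the forced-direct mode is what makes it easy
to count how many call sites cannot be optimized when running a test suite.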
>>>> Webrev:
>>>>
>>>> http://cr.openjdk.java.net/~mcimadamore/panama/8210757/
>>>>
>>>> I'd like to thank Vladimir Ivanov for the prompt support whenever I 
>>>> got stuck down the macro assembler rabbit hole :-)
>>>>
>>>> Cheers
>>>> Maurizio
>>>>
>>>> [1] - 
>>>> http://mail.openjdk.java.net/pipermail/panama-dev/2018-September/002652.html
>>>>
>>>>
>>>>
>>>