[foreign] RFR 8210757: Add binder support for direct native invocation strategy
Maurizio Cimadamore
maurizio.cimadamore at oracle.com
Fri Sep 14 18:04:48 UTC 2018
Hi,
as mentioned in [1], this patch adds binder support for the so-called
'direct' invocation scheme, which improves native downcall/upcall
performance by means of specialized adapters. The core idea, also
described in [1], is to define adapters of the kind:
invokeNative_V_DDDDD
invokeNative_V_JDDDD
invokeNative_V_JJDDD
invokeNative_V_JJJDD
invokeNative_V_JJJJD
invokeNative_V_JJJJJ
Here long arguments come before double arguments, and the same is done
for each arity (e.g. <= 5).
If all arguments are passed in registers, then this reordering doesn't
affect behavior (integer and floating-point arguments are assigned to
separate register sets), and it greatly limits the number of
permutations to be supported/generated.
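To illustrate the naming scheme above, here is a minimal Java sketch (not
code from the patch) that maps a method type onto the name of the adapter
it would use, by emitting all the longs first and then all the doubles.
The class and method names (ShapeNames, shapeName) are made up for this
example, and it assumes that only long, double and void carriers occur.

```java
import java.lang.invoke.MethodType;

public class ShapeNames {
    // Map a carrier type to the one-letter code used in the adapter names.
    static char code(Class<?> c) {
        if (c == long.class) return 'J';
        if (c == double.class) return 'D';
        if (c == void.class) return 'V';
        throw new IllegalArgumentException("unsupported carrier: " + c);
    }

    // Adapter name: return code, then every long, then every double.
    // The original argument order is deliberately ignored.
    static String shapeName(MethodType type) {
        StringBuilder longs = new StringBuilder();
        StringBuilder doubles = new StringBuilder();
        for (Class<?> p : type.parameterList()) {
            char k = code(p);
            (k == 'J' ? longs : doubles).append(k);
        }
        return "invokeNative_" + code(type.returnType()) + "_" + longs + doubles;
    }

    public static void main(String[] args) {
        // Two different argument orders collapse onto the same adapter.
        MethodType a = MethodType.methodType(void.class,
                double.class, long.class, double.class, long.class, double.class);
        MethodType b = MethodType.methodType(void.class,
                long.class, long.class, double.class, double.class, double.class);
        System.out.println(shapeName(a)); // invokeNative_V_JJDDD
        System.out.println(shapeName(b)); // invokeNative_V_JJDDD
    }
}
```

Since only the number of longs and doubles matters, an arity-n signature
needs n + 1 adapters per return type rather than 2^n.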
The downcall part (Java to native) is relatively straightforward: the
directNativeInvoker.cpp file defines a bunch of native entry points, one
per shape, which cast the input address to a function pointer of the
desired shape, and then call it:
    jlong NI_invokeNative_J_JD(JNIEnv *env, jobject _unused, jlong addr,
                               jlong arg0, jdouble arg1) {
        return ((jlong (*)(jlong, jdouble))addr)(arg0, arg1);
    }
The upcall business is a little trickier: first, if we are only to
optimize upcalls where argument passing happens in registers, then it's
crucial to note that by the time we get into the assembly stub, all the
registers will have been populated by the native code to contain the
right arguments in the right places. So we can avoid all the shuffling
in the assembly adapter and simply jump onto a C function that looks
like this:
    long specialized_upcall_helper_J(long l0, long l1, long l2, long l3,
                                     double d0, double d1, double d2,
                                     double d3,
                                     unsigned int mask, jobject rec)
    { ... }
Note here that the first 8 arguments are just longs and doubles, and
those are expected to be in registers, according to the System V
ABI. (On Windows the situation is a bit different, as fewer integer
registers are available, so this will need some work there.)
So, to recap, the assembly upcall stub simply appends the receiver
object and a 'signature mask' in the last two available C registers and
then jumps onto the helper function. The helper function will find all
the desired arguments in the right places - there will be, in the
general case, some unused arguments, but that's fine; after all, it
didn't cost us anything to load them in the first place!
Note that we have three helper variants, one for each return type {
long, double, void }. This is required as we need the C helper to return
a value of the right type, so that the compiler generates the right
assembly sequence to store the result in the right register (either
integer or XMM).
So, with three helpers we can support all the shapes with up to 8
arguments. On the Java side we have, of course, to define a specialized
entry point for each shape.
All the magic for adapting method handles to and from the specialized
adapters happens in the DirectSignatureShuffler class; this class is
responsible for adapting each argument (e.g. from Java to native value)
and then reordering the adapted method handle to match the order in
which arguments are expected by the adapter (e.g. move all longs in
front). The challenge was in making DirectSignatureShuffler fully
symmetric - i.e. I did not want to have different code paths for upcalls
and downcalls, so the code tries quite hard to be parametric in the
shuffling direction (java->native or native->java) - which means that
adapters will be applied in one way or in the inverse way depending on
the shuffling direction (and as to whether we are adapting an argument
or a return). Since method handle filters are composable, it all works
out quite beautifully.
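The reordering step can be sketched with the standard
MethodHandles.permuteArguments combinator. This is only an illustration
under simplifying assumptions, not the DirectSignatureShuffler code: the
class LongsFirst and its methods are hypothetical, the 'adapter' is a
plain Java method standing in for a specialized native entry point, and
per-argument value adaptation (which the real class also performs) is
left out.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.ArrayList;
import java.util.List;

public class LongsFirst {
    // Stand-in for a specialized adapter: longs first, then doubles.
    static String adapter(long l0, double d0, double d1) {
        return l0 + "," + d0 + "," + d1;
    }

    // reorder[i] is the index, in the caller-visible type, of the value
    // that becomes the adapter's i-th argument (longs first, then doubles).
    static int[] longsFirstOrder(MethodType callerType) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < callerType.parameterCount(); i++) {
            if (callerType.parameterType(i) == long.class) order.add(i);
        }
        for (int i = 0; i < callerType.parameterCount(); i++) {
            if (callerType.parameterType(i) == double.class) order.add(i);
        }
        return order.stream().mapToInt(Integer::intValue).toArray();
    }

    // Expose a longs-first adapter under the caller's original argument order.
    static MethodHandle adapt(MethodHandle adapter, MethodType callerType) {
        return MethodHandles.permuteArguments(
                adapter, callerType, longsFirstOrder(callerType));
    }

    static String demo() {
        try {
            MethodHandle adapter = MethodHandles.lookup().findStatic(
                    LongsFirst.class, "adapter",
                    MethodType.methodType(String.class,
                            long.class, double.class, double.class));
            // The caller declares (double, long, double).
            MethodType callerType = MethodType.methodType(String.class,
                    double.class, long.class, double.class);
            MethodHandle mh = adapt(adapter, callerType);
            return (String) mh.invokeExact(1.5, 2L, 3.5);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 2,1.5,3.5
    }
}
```

For the opposite direction one would apply the inverse permutation, so
that a handle declared in the caller's order is presented in longs-first
order to the adapter.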
Note that the resulting, adapted MH is stored in a @Stable field to tell
the JIT to optimize the heck out of it (as if it were a static constant).
This patch contains several other changes - which I discuss briefly below:
* we need to set up a framework in which new invocation strategies can be
plugged in - note that we now have essentially 4 cases:
{ NativeInvoker, UpcallHandler } x { Universal, Direct }
When the code wants e.g. a NativeInvoker, it asks the NativeInvoker::of
factory for one (UpcallHandler works in a similar way); this
factory will attempt to go down the fast path - if an error occurs when
computing the fast path, the call will fall back to the universal (slow)
path.
Most of the changes you see in the Java code are associated with this
refactoring - e.g. all clients of NativeInvoker/UpcallHandler should now
go through the factory.
* CallbackImplGenerator had a major issue, since the new factory for
NativeInvoker wants to bind an address eagerly (this is required e.g. to
be forward compatible with the linkToNative backend). This means that at
construction time we have to get the address of the callback, call the
NativeInvoker factory and then stash the target method handle into a
field of the anon callback class. Vlad tells me that fields of anon
classes are always 'trusted' by the JIT, which means they should be
treated as '@Stable' (note that I can't put a @Stable annotation there,
since this code will be spun in user-land).
* There are a bunch of properties that can be set to either force the
slow path or force the 'direct' path; in the latter case, if an error
occurs when instantiating the direct wrapper, an exception is thrown.
This mode is very useful for testing, and I have indeed tried to run all
our tests with this flag enabled, to see how many places could not be
optimized.
* I've also reorganized all the native code in hotspot/prims so that we
have a separate file for each scheme (and so that native Java methods
could be added where they really belong). This should also help in the
long run as it should make adding/removing a given scheme easier.
* I've also added a small test which tries to pass structs of different
sizes, but I will also work on a more complex test which will
stress-test all invocation modes in a more complete fashion. With
respect to testing, I've also done a fastdebug build and run all tests
with that (as fastdebug catches many more HotSpot assertions than the
product version); everything passed.
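The factory behavior described in the list above (try the direct path,
fall back to the universal one, unless a property forces one of the two)
can be sketched as follows. This is a simplified illustration, not the
patch's NativeInvoker::of: the InvokerFactory class, the Mode enum and the
property name "example.invoker.mode" are all hypothetical, and suppliers
stand in for the real invoker constructors.

```java
import java.util.function.Supplier;

public class InvokerFactory {
    enum Mode { AUTO, FORCE_UNIVERSAL, FORCE_DIRECT }

    // Hypothetical property name; the real property names are not given here.
    static Mode modeFromProperty() {
        String v = System.getProperty("example.invoker.mode", "auto");
        switch (v) {
            case "universal": return Mode.FORCE_UNIVERSAL;
            case "direct":    return Mode.FORCE_DIRECT;
            default:          return Mode.AUTO;
        }
    }

    // Try the fast path first; fall back to the slow path on failure,
    // unless the direct path is forced, in which case the error surfaces.
    static <T> T of(Mode mode, Supplier<T> direct, Supplier<T> universal) {
        if (mode == Mode.FORCE_UNIVERSAL) {
            return universal.get();
        }
        try {
            return direct.get();
        } catch (RuntimeException ex) {
            if (mode == Mode.FORCE_DIRECT) {
                throw ex; // useful in tests: shows what could not be optimized
            }
            return universal.get();
        }
    }

    // Small driver: reports which path was taken, or "error" if forcing failed.
    static String describe(Mode mode, boolean directSupported) {
        Supplier<String> direct = () -> {
            if (!directSupported) {
                throw new UnsupportedOperationException("shape not specializable");
            }
            return "direct";
        };
        try {
            return InvokerFactory.<String>of(mode, direct, () -> "universal");
        } catch (UnsupportedOperationException ex) {
            return "error";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(modeFromProperty(), false));
    }
}
```

Running all tests in the forced-direct mode then turns every silent
fallback into a visible failure, which is what makes that mode handy for
finding signatures that could not be specialized.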
Webrev:
http://cr.openjdk.java.net/~mcimadamore/panama/8210757/
I'd like to thank Vladimir Ivanov for the prompt support whenever I got
stuck down the macro assembler rabbit hole :-)
Cheers
Maurizio
[1] -
http://mail.openjdk.java.net/pipermail/panama-dev/2018-September/002652.html