[foreign-abi] On invokers

Mon Sep 23 11:43:30 UTC 2019

Hi,

I've been looking into the current set of invokers we have on the 
foreign-abi branch for the past few weeks. There is still work to be 
done in this area, both in terms of performance, and in terms of 
programmability. In this email I will focus on the latter.

The UniversalNativeInvoker (UNI) API is currently the most programmable 
invoker that we have, so if we want to increase the programmability of 
our backend to cover more and more ABIs, this seems like a good place to 
start. UNI already goes some way towards being programmable: the 
CallingSequence, ShuffleRecipe and ArgumentBinding APIs let us select in 
which registers to pass values. But there are still some aspects that 
could be polished:

1.) If you look into the VM code that processes the shuffle recipe, 
you'll notice that the eventual argument buffer that's being fed to the 
stub has a fixed set of registers it can work with on a given platform 
[1], namely the ones that are used by the C ABI. This works when we have 
only one ABI (C), but for different ABIs we'd probably want a different 
set of registers. We can change the stub generation code to take an 
'ABIDescriptor' from which we derive the stub and argument buffer layout 
instead. This will also provide a place to put other ABI details that 
need to be customized, like stack alignment, and argument shadow space 
(Windows), as well as a set of volatile registers, which will be a 
superset of the argument registers. We would end up generating one 
generic downcall stub for each ABI. Also, note that we would need to create 
architecture definitions on the Java side to be able to specify the 
ABIDescriptors there (since ABIs are defined in terms of architecture).
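
To make point 1 a bit more concrete, here is a rough sketch of what such 
a descriptor could hold. This is not the prototype code: the class name, 
the field names and the use of plain strings for registers are all 
assumptions made for illustration (a real version would use the 
architecture definitions mentioned above). The register sets, alignment 
and shadow space follow the published Windows x64 and SysV x86-64 
calling conventions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of an ABI descriptor; names are illustrative only.
final class AbiDescriptorSketch {
    final List<String> intArgRegs;     // integer argument registers, in order
    final List<String> vectorArgRegs;  // vector argument registers, in order
    final List<String> volatileRegs;   // caller-saved registers (superset of the above)
    final int stackAlignment;          // required stack alignment at the call, in bytes
    final int shadowSpace;             // bytes reserved by the caller for the callee

    AbiDescriptorSketch(List<String> intArgRegs, List<String> vectorArgRegs,
                        List<String> extraVolatileRegs,
                        int stackAlignment, int shadowSpace) {
        this.intArgRegs = intArgRegs;
        this.vectorArgRegs = vectorArgRegs;
        // Building the volatile set from the argument registers guarantees
        // that it is a superset of them.
        List<String> v = new ArrayList<>(intArgRegs);
        v.addAll(vectorArgRegs);
        v.addAll(extraVolatileRegs);
        this.volatileRegs = v;
        this.stackAlignment = stackAlignment;
        this.shadowSpace = shadowSpace;
    }

    static AbiDescriptorSketch windowsX64() {
        return new AbiDescriptorSketch(
            Arrays.asList("rcx", "rdx", "r8", "r9"),
            Arrays.asList("xmm0", "xmm1", "xmm2", "xmm3"),
            Arrays.asList("rax", "r10", "r11", "xmm4", "xmm5"),
            16, 32);
    }

    static AbiDescriptorSketch sysV() {
        return new AbiDescriptorSketch(
            Arrays.asList("rdi", "rsi", "rdx", "rcx", "r8", "r9"),
            Arrays.asList("xmm0", "xmm1", "xmm2", "xmm3",
                          "xmm4", "xmm5", "xmm6", "xmm7"),
            Arrays.asList("rax", "r10", "r11",
                          "xmm8", "xmm9", "xmm10", "xmm11",
                          "xmm12", "xmm13", "xmm14", "xmm15"),
            16, 0);
    }
}
```

The stub generator would then take one of these descriptors as input 
instead of hard-coding the C ABI registers for the platform.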

2.) Sometimes we need to pass meta arguments to a function. For 
instance, we need to pass in a pointer to a return buffer for 
in-memory returns, and e.g. on SysV we need to pass in the number of 
float arguments in RAX (or rather AL) for variadic functions. The former 
is handled automatically by CallingSequenceBuilder, and the latter is 
hard-coded in the VM code. Since these are both ABI details, I believe 
they should be handled by the ABI implementations. Ideally we'd have an 
invoker API that lets us say: "add a Java argument with this carrier 
type, and this MemoryLayout, and then shuffle it into this register", 
and then the ABI implementation can handle the further adaptation from 
the ABI-level signature (e.g. an additional MemoryAddress passed in as 
first argument), to the C-level signature (allocate a buffer as first 
argument and also return it). This is mostly a refactoring move in 
UNI::invoke and CallingSequenceBuilder that removes the special handling 
of in-memory returns and replaces it with a more general way of passing 
those kinds of arguments.
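
As an illustration of point 2, the sketch below shows how a SysV ABI 
implementation might add the two meta arguments itself when lowering a 
signature. Everything here is hypothetical (MetaArgSketch, BoundArg and 
lower are made-up names, long.class stands in for a MemoryAddress 
carrier, and stack-passed arguments are left out). The only ABI facts it 
relies on are that the return buffer pointer goes in RDI as a hidden 
first argument, and that AL carries the number of vector registers used 
by a variadic call.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a SysV ABI implementation adding meta arguments.
final class MetaArgSketch {
    static final String[] INT_REGS = {"rdi", "rsi", "rdx", "rcx", "r8", "r9"};
    static final String[] VEC_REGS = {"xmm0", "xmm1", "xmm2", "xmm3",
                                      "xmm4", "xmm5", "xmm6", "xmm7"};

    // One argument of the ABI-level signature, bound to a register.
    static final class BoundArg {
        final Class<?> carrier;
        final String register;
        final boolean meta;   // true if added by the ABI, not by the user
        final Long constant;  // fixed value, or null if supplied per call

        BoundArg(Class<?> carrier, String register, boolean meta, Long constant) {
            this.carrier = carrier;
            this.register = register;
            this.meta = meta;
            this.constant = constant;
        }
    }

    static List<BoundArg> lower(List<Class<?>> params,
                                boolean inMemoryReturn, boolean variadic) {
        List<BoundArg> out = new ArrayList<>();
        int nInt = 0, nVec = 0;
        if (inMemoryReturn) {
            // Hidden pointer to the return buffer takes the first integer register.
            out.add(new BoundArg(long.class, INT_REGS[nInt++], true, null));
        }
        for (Class<?> p : params) {
            boolean vec = p == float.class || p == double.class;
            if (vec ? nVec == VEC_REGS.length : nInt == INT_REGS.length) {
                throw new UnsupportedOperationException(
                    "stack arguments are not covered by this sketch");
            }
            out.add(new BoundArg(p, vec ? VEC_REGS[nVec++] : INT_REGS[nInt++],
                                 false, null));
        }
        if (variadic) {
            // Number of vector registers used, passed in RAX (AL).
            out.add(new BoundArg(long.class, "rax", true, (long) nVec));
        }
        return out;
    }
}
```

With something like this, neither CallingSequenceBuilder nor the VM code 
needs to know about return buffers or AL; they only see ordinary 
arguments bound to registers.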

3.) The unboxing/boxing is currently handled by calling into the various 
ABI implementations. We can make this code shared by extending the 
current ArgumentBinding 'recipe' to include other operations, besides 
moving from a pointer to a register, that cover the things that are 
currently handled by the ABI boxing/unboxing implementations. The 
various CallingSequenceBuilder implementations can then specify these 
additional binding operations when generating bindings. This means that 
we only need one shared piece of code that interprets this 'binding 
recipe'. The other advantage of doing this is that we would eventually 
be able to use these binding recipes + ABIDescriptor to generate a 
specialized stub for a particular call site.
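
A minimal sketch of the shared interpreter from point 3 could look like 
the following. The two binding kinds, the Binding class and the use of a 
ByteBuffer as the struct and a Map as the register slots are assumptions 
for illustration, not the operations the prototype defines.

```java
import java.nio.ByteBuffer;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a shared 'binding recipe' interpreter.
final class BindingRecipeSketch {
    enum Kind {
        MOVE_PRIMITIVE,    // move a primitive Java value into a register
        LOAD_FROM_STRUCT   // load 8 bytes at an offset in a struct into a register
    }

    static final class Binding {
        final Kind kind;
        final int offset;       // only used by LOAD_FROM_STRUCT
        final String register;

        Binding(Kind kind, int offset, String register) {
            this.kind = kind;
            this.offset = offset;
            this.register = register;
        }
    }

    // Interprets the recipe for one Java argument and returns the register
    // values it produces. The ABI only supplies the recipe; this code is shared.
    static Map<String, Long> interpret(List<Binding> recipe, Object arg) {
        Map<String, Long> regs = new LinkedHashMap<>();
        for (Binding b : recipe) {
            switch (b.kind) {
                case MOVE_PRIMITIVE:
                    regs.put(b.register, ((Number) arg).longValue());
                    break;
                case LOAD_FROM_STRUCT:
                    regs.put(b.register, ((ByteBuffer) arg).getLong(b.offset));
                    break;
            }
        }
        return regs;
    }
}
```

For example, a 16-byte struct of two longs passed in registers on SysV 
would get the recipe [LOAD_FROM_STRUCT(0, rdi), LOAD_FROM_STRUCT(8, rsi)], 
while an int argument would get [MOVE_PRIMITIVE(rdi)].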

4.) We are currently shuffling the arguments for a downcall into a 
long[], and then in the VM we shuffle the arguments from the long[] into 
an argument buffer (ShuffleDowncallContext). We can merge these steps 
together, by directly shuffling the arguments into an argument buffer on 
the Java side (since we have an off-heap API). This decreases the 
overall complexity of the invoker implementation significantly, since we 
can drop all the code relating to shuffle recipes.
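
To illustrate point 4: if the argument buffer layout is known on the 
Java side, each value can be written straight to the slot of its target 
register. The sketch below uses a direct ByteBuffer as a stand-in for 
the off-heap API, and assumes a made-up layout of one 8-byte slot per 
Windows integer argument register; the real buffer layout would be 
derived from the ABIDescriptor.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch: writing arguments directly into an off-heap buffer.
final class ArgBufferSketch {
    // Assumed layout: one 8-byte slot per register, in this order.
    static final String[] INT_REGS = {"rcx", "rdx", "r8", "r9"};
    static final int SLOT_SIZE = 8;

    static int offsetOf(String register) {
        for (int i = 0; i < INT_REGS.length; i++) {
            if (INT_REGS[i].equals(register)) {
                return i * SLOT_SIZE;
            }
        }
        throw new IllegalArgumentException("not an argument register: " + register);
    }

    static ByteBuffer newBuffer() {
        // Off-heap memory in native byte order, so a stub could read it as-is.
        return ByteBuffer.allocateDirect(INT_REGS.length * SLOT_SIZE)
                         .order(ByteOrder.nativeOrder());
    }

    // Stores the value in the slot of its target register; there is no
    // intermediate long[] and no shuffle recipe.
    static void store(ByteBuffer buffer, String register, long value) {
        buffer.putLong(offsetOf(register), value);
    }
}
```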

I've been experimenting with these ideas, and have a prototype for 
downcalls on Windows [2]. For this I copied the relevant UNI classes to 
a separate `programmable` package and made the relevant changes there, 
since some of the code was shared with UniversalUpcallHandler. I've also 
preemptively removed the old UNI code (for x86) to show roughly how much 
code would be removed by switching to the new invoker API. I want to 
continue the experiment for upcalls as well, after which more old code 
could be removed; namely Argument, ArgumentBinding, CallingSequence 
(old), CallingSequenceBuilder (old), Storage, StorageClass, SharedUtils 
(mostly) and UniversalAdapter.

How do these ideas sound? I'm mostly interested in whether this is 
flexible enough to support AArch64 and SysV. After the upcall support, I 
can look into porting the other two ABIs as well.

Thanks,
Jorn

[1] : 
https://github.com/openjdk/panama/blob/foreign-abi/src/hotspot/cpu/x86/universalNativeInvoker_x86.cpp#L73
[2] : 
https://github.com/openjdk/panama/compare/foreign-abi...JornVernee:prog-back-no-old