Towards finalizing the linker implementation
Jorn Vernee
jorn.vernee at oracle.com
Tue Oct 5 11:17:53 UTC 2021
Hi,
Now that we are talking about finalizing the linker and memory access
APIs, I thought it would be good to talk about what I think still needs
to be done to finalize the implementation of the linker, mostly so that
potential porters know what to expect.
The linker has 2 main parts: downcall and upcall support. For both of
these there are currently 4 flavors. Both the downcall and upcall
support consist of a low-level and a high-level part. The low-level
part takes care of shuffling primitives back and forth between registers
in VM code, and the high-level part takes care of boxing and unboxing
these primitives into, for instance, MemoryAddresses or MemorySegments.
The low-level part has 2 modes: 'buffered' and 'optimized'. The
'optimized' mode is faster than the 'buffered' mode, but currently
doesn't support all kinds of functions. The high-level part also has 2
modes: 'interpreted' and 'specialized'. The 2 modes for each part make
2 x 2 = 4 flavors. One important note is that the low-level part
requires implementations in the VM for each architecture, while the
high-level part is implemented completely in Java.
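
To make the 2 x 2 combination concrete, here is a minimal sketch that
just enumerates the flavors. The class and enum names are hypothetical
and chosen for illustration; they are not the names used in the actual
implementation.

```java
import java.util.ArrayList;
import java.util.List;

class LinkerFlavors {
    // Low-level part: needs a VM implementation per architecture.
    enum LowLevelMode { BUFFERED, OPTIMIZED }

    // High-level part: implemented completely in Java.
    enum HighLevelMode { INTERPRETED, SPECIALIZED }

    // Every low-level mode can be paired with every high-level mode.
    static List<String> allFlavors() {
        List<String> flavors = new ArrayList<>();
        for (LowLevelMode low : LowLevelMode.values()) {
            for (HighLevelMode high : HighLevelMode.values()) {
                flavors.add(low + "/" + high);
            }
        }
        return flavors;
    }

    public static void main(String[] args) {
        // Prints one line per flavor, 4 lines in total.
        allFlavors().forEach(System.out::println);
    }
}
```

The same enumeration applies separately to downcalls and to upcalls.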
Two years ago, we thought that we only needed the buffered invocation
strategy for the low-level part, and C2 could handle the heavy-lifting
as far as the optimization was concerned, but this turned out to be
harder than expected, partly because of instruction scheduling issues,
and partly because current VM and GC code expect calls to native code to
go through an intermediate 'native wrapper' which has its own frame. As
a result of this, the current implementation ended up with 2 pretty much
separate implementations of the low-level part of downcalls. This makes
maintenance and porting efforts harder, so I think we should ultimately
get rid of the 'buffered' invocation strategy, by adding the missing
support for certain function types (namely those that pass arguments on
the stack, and return values in multiple registers) to the
'optimized' mode, and then removing the buffered invocation strategy.
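
As a rough illustration of which functions "pass arguments on the
stack", here is a simplified sketch based on the System V x86-64
calling convention, which has 6 integer and 8 vector argument
registers. The class and method names are hypothetical, and the check
deliberately ignores structs, varargs and other details that the real
ABI code in the linker has to handle.

```java
import java.util.List;

class StackArgSketch {
    // Simplified argument classes; real ABIs have more (e.g. for structs).
    enum ArgClass { INTEGER, FLOAT }

    // System V x86-64: rdi, rsi, rdx, rcx, r8, r9 and xmm0..xmm7.
    static final int INT_ARG_REGS = 6;
    static final int FLOAT_ARG_REGS = 8;

    // Returns true if at least one argument does not fit in a register
    // and would therefore have to be passed on the stack.
    static boolean needsStackArgs(List<ArgClass> args) {
        int ints = 0;
        int floats = 0;
        for (ArgClass arg : args) {
            if (arg == ArgClass.INTEGER) {
                ints++;
            } else {
                floats++;
            }
        }
        return ints > INT_ARG_REGS || floats > FLOAT_ARG_REGS;
    }

    public static void main(String[] args) {
        List<ArgClass> sevenInts =
                java.util.Collections.nCopies(7, ArgClass.INTEGER);
        System.out.println(needsStackArgs(sevenInts));
    }
}
```

Under these assumptions, a function taking 7 integer arguments is one
of the cases that currently falls back to the 'buffered' mode.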
I have been working on a patch towards this goal. It makes some of the
work that C2 does for downcalls more eager (namely spinning the
mentioned 'native wrapper'), so that the 'optimized' mode for downcalls
can in the future replace the 'buffered' mode completely. As a side
effect of this, 'virtual calls', calls where the address of the target
function is passed in as an argument, become a lot faster, and support
for passing stack arguments is almost a matter of 'just turning it on'.
The 'specialized' mode of the high-level part for up/downcalls is
currently implemented completely using method handle combinators. I have
discussed this with Maurizio and, while using method handle combinators
can work, the code has reached a level of complexity where it has become
hard to maintain. We think switching from using method handle
combinators to using bytecode spinning with ASM will make this code
easier to maintain (also because a lot more people are familiar with
ASM). The 'interpreted' mode is really simple, so there is probably no
need to remove that, for now.
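
For readers less familiar with method handle combinators, here is a
minimal sketch of the general technique, using only java.lang.invoke.
It adapts a handle that takes a raw long so that it accepts a boxed
carrier instead. The Address class and the lowLevelCall method are
hypothetical stand-ins, not the real linker code, which chains many
more of these adaptations together.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class CombinatorSketch {
    // Hypothetical stand-in for a boxed carrier such as MemoryAddress.
    static final class Address {
        final long raw;
        Address(long raw) { this.raw = raw; }
        long toRawLongValue() { return raw; }
    }

    // Hypothetical stand-in for the low-level part, which only sees
    // primitives.
    static long lowLevelCall(long rawAddress, int value) {
        return rawAddress + value;
    }

    // Builds a handle of type (Address, int) -> long by unboxing the
    // first argument before it reaches lowLevelCall.
    static MethodHandle specialize() throws ReflectiveOperationException {
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        MethodHandle target = lookup.findStatic(
                CombinatorSketch.class, "lowLevelCall",
                MethodType.methodType(long.class, long.class, int.class));
        MethodHandle unbox = lookup.findVirtual(
                Address.class, "toRawLongValue",
                MethodType.methodType(long.class));
        return MethodHandles.filterArguments(target, 0, unbox);
    }

    // Convenience wrapper so callers do not have to handle Throwable.
    static long callBoxed(long raw, int value) {
        try {
            return (long) specialize().invokeExact(new Address(raw), value);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(callBoxed(40L, 2)); // prints 42
    }
}
```

One adaptation like this is easy to follow. The maintenance concern
arises when many such combinators are nested for every argument and
return value.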
I have the following timeline in mind:
1. Finish the patch I'm working on right now: moving the 'native
wrapper' generation to be more eager, and decoupled from C2.
2. Switch from method handle combinators to ASM for the 'specialized' mode.
3. Implement stack argument and multi-register return support for both
downcalls and upcalls in the 'optimized' mode.
4. Bring the AArch64 port up to the same level (currently missing the
'optimized' mode for upcalls).
5. Remove the buffered invocation strategy.
After #3 is implemented, I think it would be a good time for porters
working on other platforms to start looking at this as well. They would
only need to implement the 'optimized' modes of up/downcalls. Nick
Gasson from ARM has already done a stellar job so far with the AArch64
port, but at the time that they started, I don't think we anticipated
the amount of changes still needed to the VM code.
As for the timeline, none of these things are blockers for finalizing
the API, and could be implemented afterwards as well.
Cheers,
Jorn
More information about the panama-dev
mailing list