Limitations of the Calling Convention Optimization
Brian Goetz
brian.goetz at oracle.com
Wed Oct 21 13:48:10 UTC 2020
Tobias;
Thanks for these great notes. This project is, in many ways, like
boring the Channel Tunnel; there's a certain amount of "start drilling
from each side, and pray that you meet in the middle." The calling
convention optimizations represent one direction of effort; the attempt
to use Valhalla in libraries like Panama represents the other. But,
unlike our tunnel-boring brethren, we didn't expect that we'd meet on
our first attempt, and that's OK.
What I recommend here is that we do a similar analysis from the other
side -- the API idioms we are trying to use -- and see if there is
something missing to bridge the gap. It may be that we can identify a
restricted set of idioms that meets the needs of libraries and can be
optimized better.
My hope was that sealing could be part of the story for (3), since if
the classfile tells you "permits X", you don't have to speculate on X --
you can bank on it. But it sounds like that's only part of the story --
we also need to wrangle nullity constraints.
On 10/21/2020 8:02 AM, Tobias Hartmann wrote:
> Hi,
>
> After a discussion with Maurizio, who observed some unexpected allocations when using
> inline types in the Foreign-Memory Access API (Panama), I've decided to summarize the limitations of
> the calling convention optimization. This is to make sure we share the same expectations.
>
> Inline types are passed/returned in a scalarized (flat) representation. That means that instead of
> passing/returning a pointer to a buffer, each field of an inline type is passed/returned
> individually in registers or on the stack. Only C2-compiled code uses this scalarized calling
> convention because C1 and the interpreter always use buffers to access inline types. This adds
> some major complexity to the implementation because we need to handle mismatches in the calling
> convention between the interpreter, C1 and C2. The technical details are explained here [1].
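>
> As an illustration (the 'Point' class and 'translate' method below are made up for this mail, they
> are not from Panama): when both caller and callee are C2-compiled and the argument and return types
> are the inline type itself, the fields can be passed and returned individually, with no buffer:
>
>     inline class Point {
>         int x;
>         int y;
>         Point(int x, int y) { this.x = x; this.y = y; }
>     }
>
>     // C2-to-C2: p.x, p.y and dx are passed in registers/stack slots and the
>     // result's fields are returned the same way, with no heap allocation
>     static Point translate(Point p, int dx) {
>         return new Point(p.x + dx, p.y);
>     }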
>
> Now this optimization only applies to "sharp" inline type arguments and returns and does *not* apply
> to the reference projection or interface types (even if they are sealed and/or only implemented by
> one inline type). For example, the 'MyInline' return value of 'method' is buffered and returned as a
> pointer to that buffer because the return type is 'MyInterface':
>
> interface MyInterface {
>     [...]
> }
>
> inline class MyInline implements MyInterface {
>     [...]
> }
>
> static MyInterface method() {
>     return new MyInline();
> }
>
> Now it's important to understand that "buffering" an inline type means a full-blown Java heap
> allocation with all the expected impact on the GC and footprint. We often referred to these as
> "lightweight boxes" which might be a bit misleading because the impact on performance and footprint
> is the same as for a "heavyweight box". The difference is that we don't need to keep track of
> identity (they are not Java objects) and can therefore create/destroy such "boxes" on the fly. Of
> course, this is a limitation of the HotSpot implementation. Other implementations might choose to
> allocate on the stack or in a thread local buffer (see below).
>
> Also, buffering is not specific to the calling convention optimization but is also required in other
> cases (for example, when storing an inline type into a non-flattened container).
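>
> For example (illustrative code only, reusing 'MyInline' and 'MyInterface' from the example above),
> the store below forces a buffer allocation because 'Object[]' elements are not flattened, and
> widening to the interface type hits the same boundary:
>
>     static void boundary(MyInline v) {
>         Object[] arr = new Object[1];   // Object[] elements are not flattened
>         arr[0] = v;                     // this store buffers v on the Java heap
>         MyInterface i = v;              // widening to the interface also goes through a buffer
>         System.out.println(i);
>     }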
>
> It's also important to understand that the calling convention optimization is highly dependent on
> inlining decisions. If we successfully inline a call, no buffering will be required. Unfortunately,
> for a Java developer it's very hard to track and control inlining.
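>
> (Inlining decisions can at least be observed, if not controlled, with HotSpot's diagnostic flags,
> for example:
>
>     java -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining ...
>
> but that is a debugging aid rather than something library authors can rely on.)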
>
> Possible solutions/mitigations:
> 1) Improve inlining.
> 2) Make buffering more light-weight.
> 3) Speculate that an interface is only implemented by a single inline type.
>
> 1) Improving the inlining heuristic is a long-standing issue with C2 that is independent of inline
> types and an entire project on its own. Of course, we could tweak the current implementation such
> that problematic calls are more likely to be inlined, but that would still be limited and might have
> side effects.
>
> 2) We've already evaluated options to make buffering more light-weight in the past. For example,
> thread-local value buffering [2] turned out to not improve performance as expected while adding lots
> of complexity and requiring costly runtime checks. And even if we buffer inline types outside of the
> Java heap, the GC still needs to know about object fields.
>
> 3) In the above example, we could speculate that 'MyInterface' is only implemented by 'MyInline' (or
> maybe use the fact that 'MyInterface' is sealed). However, even in that case we would still need to
> handle a 'null' value. I.e., we are back to the discussion of flattening "nullable" inline types.
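>
> To make that concrete (sketch only, using the sealed-types preview syntax): the classfile would then
> guarantee that 'MyInline' is the only implementation, but 'null' is still a legal value of type
> 'MyInterface' and has to be representable in the calling convention:
>
>     sealed interface MyInterface permits MyInline {
>         [...]
>     }
>
>     static MyInterface method(boolean flag) {
>         return flag ? new MyInline() : null;   // null must still fit through the return
>     }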
>
> One option to scalarize nullable inline types in the calling convention would be to pass an
> additional, artificial field that can be used to check if the inline type is null. Compiled code
> would then "null-check" before using the fields. However, this solution is far from trivial to
> implement and the overhead of the additional fields and especially the runtime checks might cancel
> out the improvements of scalarization.
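>
> Hand-modelled in plain Java (illustrative names only, this is not how the VM code looks), the idea
> is that the "return value" is passed as individual values -- the fields of 'MyInline' plus an
> artificial marker -- and the receiver null-checks the marker before touching the fields:
>
>     static void receiveScalarized(boolean nonNull, int field0) {  // field0 stands in for MyInline's fields
>         if (!nonNull) {
>             System.out.println("null");      // the extra runtime check compiled code would have to do
>         } else {
>             System.out.println(field0);      // fields used directly, no buffer for MyInline
>         }
>     }
>
>     static void scalarizedMethod(boolean flag) {
>         if (flag) {
>             receiveScalarized(true, 42);     // a real MyInline value, fields passed individually
>         } else {
>             receiveScalarized(false, 0);     // 'null' encoded as (false, don't-care)
>         }
>     }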
>
> It's even worse if the interface is then suddenly implemented by a different, non-inline type. We
> would need to re-compile all dependent code, resulting in a deoptimization storm, and re-compute the
> calling convention (something that is not easily possible with the current implementation).
>
> Also, the VM would need to ensure that the argument/return type is eagerly loaded when the adapters
> are created at method link time (how do we even know eager loading is required for these L-types?).
>
> Of course, we can push these limits as far as possible, but the reality is that when mixing inline
> types with objects or interfaces, there will always be "boundaries" at which we have to buffer and
> this will lead to "unexpected" drops in performance.
>
> Hope that helps.
>
> Best regards,
> Tobias
>
> [1] https://mail.openjdk.java.net/pipermail/valhalla-dev/2019-December/006668.html
> [2] https://bugs.openjdk.java.net/browse/JDK-8212245