Limitations of the Calling Convention Optimization

Fri Oct 23 13:40:37 UTC 2020

----- Original Message -----
> From: "Tobias Hartmann" <tobias.hartmann at oracle.com>
> To: "Remi Forax" <forax at univ-mlv.fr>
> Cc: "valhalla-dev" <valhalla-dev at openjdk.java.net>
> Sent: Friday, October 23, 2020 08:51:11
> Subject: Re: Limitations of the Calling Convention Optimization

> On 22.10.20 20:02, forax at univ-mlv.fr wrote:
>>> Sure but my point is that tweaking the inlining heuristic is far from trivial
>>> and even inline type
>>> specific tweaks will have unforeseeable side effects on code not using inline
>>> types.
>> 
>> ??,
>> I fail to see how it can affect code that has not been written yet.
> 
> You suggested to prefer Q-typed methods in the inlining heuristic which will in
> turn have a negative
> effect on inlining of other methods that don't use Q-types (given that we only
> have a limited
> inlining budget). I.e., just having one little Q-typed method somewhere in the
> call stack could in
> the worst case negatively affect the performance because suddenly that method is
> preferred over
> another method.
> 
> But let's not go too much into detail here, my point is that inlining should not
> be improved as part
> of this project and certainly not by adding tweaks for every new feature that
> would benefit from
> special treatment by the heuristic.

Ok !

> 
>> c1 doesn't have to buffer because it can do on-stack allocation (at least for
>> small inline objects).
>> With that, you may be able to scalarize method calls in c1.
> 
> No, C1 can't do stack allocation because we don't support stack allocation
> anywhere in the VM. It
> would be a completely new feature. What we do for inline types in C2 is
> *scalarization* (pass field
> values individually in registers or on the stack) but C1 does not support that
> either.

Technically, if you look one layer below, the register allocator of c1 is able to do register spilling, which is a kind of stack allocation with more constraints.
Hence my proposal to have a special type that represents an inline object as a value that always ends up on the stack.
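
For readers following along, here is a rough source-level picture of the scalarization described in the quoted reply above (field values passed individually instead of one reference). It is only a sketch: the class Point and both method names are hypothetical, and the JIT performs this at the calling-convention level, not by rewriting Java source.

```java
// Hypothetical value-like class, used only for illustration.
final class Point {
    final int x;
    final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }
}

final class ScalarizationSketch {
    // Reference-based signature: each argument is a single pointer to an object.
    static int dot(Point a, Point b) {
        return a.x * b.x + a.y * b.y;
    }

    // Conceptual scalarized signature: the fields travel individually
    // (in registers or stack slots), so no Point object is needed for the call.
    static int dotScalarized(int ax, int ay, int bx, int by) {
        return ax * bx + ay * by;
    }

    public static void main(String[] args) {
        Point a = new Point(1, 2);
        Point b = new Point(3, 4);
        System.out.println(dot(a, b));
        System.out.println(dotScalarized(a.x, a.y, b.x, b.y));
    }
}
```

Both methods compute the same result; the only difference is how the arguments reach the callee.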

> 
> Microsoft recently proposed to add this feature but only for the C2 compiler:
> https://mail.openjdk.java.net/pipermail/hotspot-dev/2020-July/042419.html
> 
> There are no plans to ever add stack allocation support to C1.
> 
>> I was suggesting to see inline objects as part of the stack (like doubles
>> using two slots) for small inline objects.
>> This can be done by c1 and maybe by the interpreter if you can rewrite the
>> bytecode.
> 
> I think you are confusing the Java stack with the native stack. There is no 1:1
> mapping between the
> two and the layout of the Java stack does not directly affect the layout of the
> native stack as used
> by C1/C2 compiled code. I.e., a bytecode transformation to load/store inline
> types as "vectors" does
> not magically implement stack allocation.

Yes, that's more or less what I'm trying to say: what is needed to put inline objects on the stack for c1 is a restricted form of stack allocation.
It's like stack allocation where you never pass around the address of the inline object on the stack.

So it's quite different from what Microsoft proposes for c2.

> 
>>> First, it's not necessary to do any kind of bytecode transformations here to
>>> help the JIT.
>> 
>> yes, for c2, but it can help for c1 and the interpreter. And if small inline
>> objects are stack allocated, they can use a scalarized calling convention.
> 
> Yes, assuming that we have support for stack allocation but even then, similar
> to a thread local
> buffer, we would still need additional runtime checks (see below).

If you never pass the address of the stack, or the address of a page that grows as the stack does, there is no additional runtime check.

> 
>> I believe c1 can use two different types to represent stack allocated inline
>> objects and heap allocated inline objects,
>> avoiding heap allocation when it is not necessary, but also avoiding stack
>> allocation of something that will escape into a reference.
>> 
>> If you are using two different types, you should not have to do a runtime check,
>> unlike when the runtime was doing buffering.
> 
> I was referring to the runtime check that is required to check if a reference is
> referring to a
> stack allocated object. If these are passed/returned over method boundaries, we
> need to check before
> storing them into any container (because stack allocated objects would require
> re-allocation on the
> heap). For example, the callee does not know if the caller passes a stack or
> heap allocated object.

If you cannot pass the address of a stack allocated object, then there is no runtime check.

> 
>>> Sure but then we still have the issue of not being able to represent 'null' in
>>> the flat representation (i.e. we are back to that discussion).
>> 
>> Null check speculation may help here.
>> It's what I'm doing for primitive speculation (for a dynamically typed
>> language), but it's a speculative optimization and it requires at least
>> two calling conventions:
>> the boxed one, where you pass a reference, and the stack allocated one, where you
>> copy the primitive object.
> 
> We can speculate but the question is how do we handle null if it still shows
> up? The easiest way
> would be to just deoptimize but then performance will suck whenever null is
> passed.

Yep,
in Maurizio's case, the interface acts as an opaque type because, as a user, you cannot directly access the implementation of the interface,
so it's safe to assume that if a user passes null, it will hit a requireNonNull soon.

> 
> For handling null, it's not sufficient to have two calling conventions (which we
> already have). You
> would need to have two complete versions of the method. For example:
> 
> int foo(MyInterface bar) {
>    if (bar == null) return 42;
>    return bar.getX();
> }
> 
> If we now speculate that MyInterface is only implemented by MyValue and 'bar' is
> never null, this
> will be optimized by C2 to:
> 
> int foo(int X) {
>    return X;
> }
> 
> Now for that method we already have two calling conventions:
> 1) bar is passed as reference and then "unpacked" because the method body only
> works on X
> 2) bar is passed as field X (i.e. only an integer is passed)
> 
> But above code does not support a 'null' being passed. To handle null, we would
> need to have a
> second version of the same method where the method body handles null:
> 
> int foo(MyValue bar) {
>    if (bar == null) return 42;
>    return bar.x;
> }
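
To make the quoted distinction concrete, the following is a minimal Java sketch of two entry points sharing one speculated body, next to a separate null-aware version. MyInterface, MyValue and the field x come from the example above; the class CallingConventionSketch and the method names fooBody, fooReferenceEntry, fooScalarizedEntry and fooNullAware are hypothetical, and real compiled code does this at the machine level, not as separate Java methods.

```java
final class CallingConventionSketch {
    interface MyInterface {
        int getX();
    }

    static final class MyValue implements MyInterface {
        final int x;

        MyValue(int x) {
            this.x = x;
        }

        public int getX() {
            return x;
        }
    }

    // Body compiled under the speculation "bar is a non-null MyValue":
    // it only ever sees the field value.
    static int fooBody(int x) {
        return x;
    }

    // Calling convention 1: a reference is passed and unpacked at entry.
    // A null argument fails here, because the body has no null handling.
    static int fooReferenceEntry(MyValue bar) {
        return fooBody(bar.x);
    }

    // Calling convention 2: the caller already passes the field value.
    static int fooScalarizedEntry(int x) {
        return fooBody(x);
    }

    // Second complete version of the method: its body handles null itself.
    static int fooNullAware(MyValue bar) {
        if (bar == null) return 42;
        return bar.x;
    }
}
```

The two entry points differ only in how the argument arrives, whereas fooNullAware differs in what the body does, which is why a second calling convention alone is not enough to support null.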

I believe the code will be more like
int foo(MyInterface bar) {
   requireNonNull(bar);
   return bar.getX();
}

so it is worth aggressively speculating that bar is never null and deoptimizing if necessary.
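
As a rough illustration of that idea, here is a sketch that simulates the speculation in plain Java: a guard selects a fast path that assumes a non-null MyValue, and anything else falls back to the generic version. The names NullSpeculationSketch, fooSpeculated, fooGeneric, callFoo and the deoptCount counter are hypothetical; an actual deoptimization happens inside the VM and is only mimicked by the fallback branch here.

```java
final class NullSpeculationSketch {
    interface MyInterface {
        int getX();
    }

    static final class MyValue implements MyInterface {
        final int x;

        MyValue(int x) {
            this.x = x;
        }

        public int getX() {
            return x;
        }
    }

    // Counts how often the speculation failed (stand-in for a deoptimization).
    static int deoptCount = 0;

    // Generic version with the original semantics, including the null check.
    static int fooGeneric(MyInterface bar) {
        java.util.Objects.requireNonNull(bar);
        return bar.getX();
    }

    // Speculated version: assumes a non-null MyValue, so only the field is passed.
    static int fooSpeculated(int x) {
        return x;
    }

    // Guard: instanceof is false for null, so null takes the slow path.
    static int callFoo(MyInterface bar) {
        if (bar instanceof MyValue) {
            return fooSpeculated(((MyValue) bar).x);
        }
        deoptCount++;
        return fooGeneric(bar);
    }
}
```

If null only ever leads to an exception from requireNonNull, the slow path is taken rarely and its cost matters little.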

[...]

> 
> Best regards,
> Tobias

Rémi