The last miles

Mon Aug 21 18:39:23 UTC 2023

----- Original Message -----
> From: "daniel smith" <daniel.smith at oracle.com>
> To: "John Rose" <john.r.rose at oracle.com>
> Cc: "Remi Forax" <forax at univ-mlv.fr>, "valhalla-spec-experts" <valhalla-spec-experts at openjdk.java.net>
> Sent: Monday, August 21, 2023 7:39:03 PM
> Subject: Re: The last miles

>> On Aug 18, 2023, at 9:15 PM, John Rose <john.r.rose at oracle.com> wrote:
>> 
>> I’ve written up in detail how I think Remi’s suggestion can work.
>> 
>> https://cr.openjdk.org/~jrose/values/larval-values.html
>> 
>> While this is a rough note, I think all the details are present.
> 
> The compatibility wins of this strategy do seem nice. But let me scrutinize a
> few details, because I think there are some trade-offs:
> 
> 1) A larval value object is an identity object. This means, in the hand-off
> between the <init> method and the caller, the object must be heap allocated:
> the caller and the <init> method need an agreed-upon memory location where
> state will be set up.
> 
> I can see this being optimized away if the <init> method can be inlined. But if
> not (e.g., the constructor logic is sufficiently complex/large), that's a new
> cost for value object creation: every 'new' needs a heap allocation. (For
> <vnew>, we are able to optimize the return value calling convention without
> needing inlining.)

A larval value object is a value object whose larval bit is set to "on"; it is not an identity object.
At the end of the call to <init>, the larval bit is set to "off".
The larval bit controls whether putfield is allowed.

In the interpreter, a larval value object is buffered, so there is a heap allocation.
But in JITed code, if everything is inlined, the larval bit does not even need to be set, because the JIT can prove that the value object is in the larval state whenever a putfield occurs. If not everything is inlined, for example if <init> is compiled as a separate method, the JITed code has to check the larval bit.
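
To make the rule concrete, here is a toy model in plain Java. It is only a sketch of the semantics, not how the VM would implement it: the names LarvalBitSketch, Point, putX/putY, and finishInit are hypothetical, the larval bit is modeled as an ordinary boolean field, and putfield is modeled as a method that checks that bit.

```java
public class LarvalBitSketch {

    /** Toy stand-in for a buffered value object that carries a larval bit. */
    static final class Point {
        private int x;
        private int y;
        private boolean larval = true; // "on" while <init> is still running

        // Models putfield on x: only allowed while the larval bit is on.
        void putX(int value) {
            checkLarval();
            x = value;
        }

        // Models putfield on y: same check.
        void putY(int value) {
            checkLarval();
            y = value;
        }

        // Models the end of <init>: the larval bit is switched off.
        void finishInit() {
            larval = false;
        }

        boolean isLarval() { return larval; }
        int x() { return x; }
        int y() { return y; }

        private void checkLarval() {
            if (!larval) {
                throw new IllegalStateException("putfield outside the larval state");
            }
        }
    }

    // Models 'new Point(x, y)': create in larval state, run <init>, clear the bit.
    static Point newPoint(int x, int y) {
        Point p = new Point();
        p.putX(x);
        p.putY(y);
        p.finishInit();
        return p;
    }

    public static void main(String[] args) {
        Point p = newPoint(1, 2);
        System.out.println(p.x() + "," + p.y() + " larval=" + p.isLarval());
        try {
            p.putX(42); // the bit is off, so the write is rejected
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

In this model the explicit check in putX/putY corresponds to the non-inlined case; when everything is inlined, the JIT can prove the check always succeeds and drop both the check and the bit.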

I will let the others answer the other questions.

Rémi

> 
> Am I understanding this correctly? How concerned should we be about the extra
> allocation cost? (Our working principle to this point has been that optimal
> compiled code should have zero heap allocations.)
> 
> 2) If we *do* inline the <init> call, then at the call site, there can be any
> number of references from locals/stack to the larval value, and at the end of
> the call, there's this unusual operation where all of those locals/stack get
> transformed into the value object. I *think* this all just falls out cleanly
> (locals become compiler metadata that bottoms out at the same registers, no
> matter how many references there are), but it's something to think carefully
> about.
> 
> 3) The <vnew> approach doesn't have any constraints about leaking 'this', and in
> particular the javac rule we were envisioning is that the constructor can't
> leak 'this' until all fields are provably set, but afterwards it's fair game.
> This <init> strategy is stricter: the verifier disallows leaking 'this' at all
> from any point in the constructor.
> 
> Are we okay with these restrictions? In practice, this is most likely to trip up
> people trying to do instance method calls, plus those who are doing things like
> keeping track of constructed objects. (Even printf logging seems tricky, since
> 'toString' is off limits.)
> 
> 4) I'm not sure the prohibition on 'super' calls is actually necessary. What if,
> instead, all non-'identity' <init> methods are understood to be working on
> larval objects, and prohibited from any leaking of 'this'? Instead of
> disallowing 'super' calls, the verifier would only transition from
> 'uninitializedThis' to 'LFoo;' in an identity class constructor. Does that make
> sense or am I missing something? (If it works, does this mean we get support
> for super fields "for free"?)
> 
> 5) Do we really need a header state for larval objects? We don't do anything
> like that to distinguish between uninitialized identity objects (post-'new')
> and valid identity objects (post-'super()'). We just let the verifier handle
> it. Same principle here perhaps?