The last miles

Dan Smith daniel.smith at oracle.com
Tue Aug 22 18:03:11 UTC 2023



> On Aug 22, 2023, at 8:31 AM, John Rose <john.r.rose at oracle.com> wrote:
> 
> On 21 Aug 2023, at 10:39, Dan Smith wrote:
> 
>>> On Aug 18, 2023, at 9:15 PM, John Rose <john.r.rose at oracle.com> wrote:
>>> 
>>> I’ve written up in detail how I think Remi’s suggestion can work.
>>> 
>>> https://cr.openjdk.org/~jrose/values/larval-values.html
>>> 
>>> While this is a rough note, I think all the details are present.
>> 
>> The compatibility wins of this strategy do seem nice. But let me scrutinize a few details, because I think there are some trade-offs:
>> 
>> 1) A larval value object is an identity object. This means, in the hand-off between the <init> method and the caller, the object must be heap allocated: the caller and the <init> method need an agreed-upon memory location where state will be set up.
>> 
>> I can see this being optimized away if the <init> method can be inlined. But if not (e.g., the constructor logic is sufficiently complex/large), that's a new cost for value object creation: every 'new' needs a heap allocation. (For <vnew>, we are able to optimize the return value calling convention without needing inlining.)
>> 
>> Am I understanding this correctly? How concerned should we be about the extra allocation cost? (Our working principle to this point has been that optimal compiled code should have zero heap allocations.)
> 
> Optimal compiled code can still have this feature, if we choose.
> 
> We can direct the compiled version of <init> (for a value class only)
> to alter its calling sequence, dropping the input and returning the
> value.  Compiled calls to this guy would omit the input.  The
> interpreter adapter for it would adjust the discrepancy.  There are
> a number of ways to do this, in detail.

I would worry about the complexity of such an optimization. The optimized calling convention bears little resemblance to the original, and the call site would need some novel encoding of 'uninitialized' to express the promise of a value object that will be computed later and stored in n different locals/stack positions.

Another thing that could be done is to have a lightweight on-stack encoding of "larval value object" that could be passed by reference and mutated by an <init> method, but without the overhead of a full heap object. New encodings mean new complexity, but maybe this one would be worth it.

Or maybe you're right, no need to worry about this corner case, inlining will be fine...

> (But, also, if the <init> method is complex enough to fail to inline,
> we probably won’t notice the extra cost of a buffered input.

Yeah, that's fair. I guess the worrying case would be where the existing <vnew> has a high computation time cost but zero memory impact; the <init> strategy would be bad on both dimensions.

> Note that all of these worries only apply to value classes
> with non-deprecated constructors.  New code will use factory methods,
> which doesn’t need to suffer from failed inlines, again because of
> an adjusted heuristic, if we need it.

There's no particular reason that new code would favor factories instead. (At least, there doesn't need to be. This compilation strategy makes it even easier for us to say in the language "almost nothing about constructors has changed, carry on as you have before.")

But it's true that, in a performance-sensitive application, an expensive constructor could be rewritten as a static factory with a private constructor that just sets the fields. And the calling convention for that factory will support scalarization. Such a refactoring shrinks the lifespan of the problematic larval object to the point that inlining & eliminating it should be trivial.
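A minimal sketch of that refactoring, under stated assumptions: `Rational` is a hypothetical class made up for illustration, gcd normalization stands in for the "expensive" constructor logic, and the class is declared as an ordinary `final class` so it compiles on a standard JDK (under Valhalla it would be a value class). The public factory holds the work; the private constructor only sets fields, which is the shape that should leave a larval object with a very short lifespan.

```java
/**
 * Hypothetical example class. Under Valhalla this would be declared as a
 * value class; it is an ordinary final class here so it compiles on any JDK.
 */
final class Rational {
    private final long num;
    private final long den;

    // Private constructor that only sets fields: small enough that inlining
    // it (and eliminating the larval object) should be easy.
    private Rational(long num, long den) {
        this.num = num;
        this.den = den;
    }

    /** Public factory holding the "expensive" logic (here, normalization). */
    static Rational of(long num, long den) {
        if (den == 0) {
            throw new ArithmeticException("zero denominator");
        }
        long g = gcd(Math.abs(num), Math.abs(den));
        if (den < 0) { // keep the sign on the numerator
            num = -num;
            den = -den;
        }
        return new Rational(num / g, den / g);
    }

    // Euclid's algorithm; gcd(0, b) == b, so g >= 1 whenever den != 0.
    private static long gcd(long a, long b) {
        while (b != 0) {
            long t = a % b;
            a = b;
            b = t;
        }
        return a;
    }

    long num() { return num; }
    long den() { return den; }
}
```

Callers write `Rational.of(6, -8)` instead of `new Rational(...)`, and get the normalized value -3/4.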

My takeaway is just that we should be cautious here: where before we had a guarantee of no new allocations from value class constructors (modulo some size threshold), now we're in the fuzzy territory of "if everything shakes out okay, you shouldn't notice any impact". This may be fine, but we'll want to keep an eye on it.

>> 3) The <vnew> approach doesn't have any constraints about leaking 'this', and in particular the javac rule we were envisioning is that the constructor can't leak 'this' until all fields are provably set, but afterwards it's fair game. This <init> strategy is stricter: the verifier disallows leaking 'this' at all from any point in the constructor.
> 
> Yes, easy leaking is a feature of <vnew>; the value is always ready.
> (This also means the interpreter has to create a new buffer on every
> state change.  I don’t care much about interpreter performance, but
> I think the <init> version of things performs fewer allocations.)
> 
>> Are we okay with these restrictions? In practice, this is most likely to trip up people trying to do instance method calls, plus those who are doing things like keeping track of constructed objects. (Even printf logging seems tricky, since 'toString' is off limits.)
> 
> If we wish to allow the super call after all, it can serve as the freeze
> point within the constructor.  It is still the case that the freeze must
> be performed before the value is usable as an adult, and there is no way
> to perform “late” putfields after the freeze.

Yeah, you've got me thinking that maybe a rule that says you can set fields before 'super()' but not after would be good enough. (With a language change that says in a value class, the implicit 'super()' call happens at the end rather than the start. If you want to write any post-super() code, you'll need an explicit super call.)

That sort of bottom-to-top initialization strategy is a change from tradition, but maybe we're mostly equipped to handle it already? (Thanks, JEP 447!)
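To make the freeze point concrete, here is a toy runtime model in Java. It is only an analogy: `LarvalModel`, `putField`, and `freeze` are hypothetical names, fields are modeled as a map, and the "adult" is an unmodifiable snapshot; real larval objects are a JVM/verifier concept, not a library class. It illustrates the two properties discussed above: writes are accepted only before the freeze, and nothing can read the adult value until the freeze has happened.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Toy model (not JVM code) of the larval/adult hand-off: the larval state
 * accepts writes until the freeze point, and the adult is an immutable
 * snapshot taken at that point.
 */
final class LarvalModel {
    private final Map<String, Object> fields = new LinkedHashMap<>();
    private boolean frozen = false;

    /** Models putfield on the larval object; rejected after the freeze. */
    void putField(String name, Object value) {
        if (frozen) {
            throw new IllegalStateException("late putfield after freeze: " + name);
        }
        fields.put(name, value);
    }

    /**
     * Models the freeze point (the super() call, in the variant where
     * fields are set before super()). Returns the adult value as an
     * unmodifiable copy; values must be non-null for Map.copyOf.
     */
    Map<String, Object> freeze() {
        if (frozen) {
            throw new IllegalStateException("already frozen");
        }
        frozen = true;
        return Map.copyOf(fields);
    }
}
```

A late `putField("x", 3)` after `freeze()` throws, and the snapshot returned by `freeze()` still reads `x == 1`.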

> If the language wishes to fully implement “late” putfields

No. I don't think publishing 'this' before all value object fields are set is on the table.

>> 4) I'm not sure the prohibition on 'super' calls is actually necessary.
> 
> No, but it’s a move of economy.  Defining the meaning of super for
> values would be extra work.  We could do that; I’d prefer not to.
> Remember that super-constructors for values are already very special
> animals:  They must be empty in a special sense.  Forbidding calls to
> them seems like the clean move.

The rule is that super constructors must be empty because we had no concept of mutable state to communicate changes from parent to child. But now that we have larval objects...

Concretely, what if:

- putfield is a verifier error on non-identity class types; it only works on uninitializedThis
- as usual, every <init> method (for all kinds of classes) must do a super-<init> invokespecial (or this-<init>? still thinking about that)

Then:

- value objects get built bottom-to-top, with fields set before a super() call, and freedom to use 'this' afterwards
- abstract classes can participate too, following the same code shape
- identity classes (abstract and concrete) have a little more freedom, because they can follow the same pattern *or* set their fields after the super() call

I need to think more about this, but it seems to me at the moment that everything falls out cleanly...
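The rule set above can be modeled as a small state machine over a constructor's straight-line instruction sequence. This is a hypothetical illustration, not the JVM verifier: `CtorShapeChecker`, the `Op` enum, and the `valueClass` flag are invented names, branches and this-<init> delegation are ignored, and `USE_THIS` stands for any use of 'this' other than putfield (passing it as an argument, calling an instance method on it, and so on).

```java
import java.util.List;

/**
 * Toy checker (not the real verifier) for the proposed constructor shapes.
 * Tracks whether 'this' is still uninitializedThis while walking the ops.
 */
final class CtorShapeChecker {
    enum Op { PUTFIELD, SUPER_INIT, USE_THIS }

    /** Returns null if the sequence is accepted, otherwise a reason. */
    static String check(List<Op> ops, boolean valueClass) {
        boolean initialized = false; // false means 'this' is uninitializedThis
        for (Op op : ops) {
            switch (op) {
                case PUTFIELD -> {
                    // Value classes: putfield only on uninitializedThis.
                    // Identity classes may also set fields after super().
                    if (initialized && valueClass) {
                        return "putfield after super-<init> in a value class";
                    }
                }
                case SUPER_INIT -> {
                    if (initialized) {
                        return "second super-<init> call";
                    }
                    initialized = true;
                }
                case USE_THIS -> {
                    if (!initialized) {
                        return "'this' used before super-<init>";
                    }
                }
            }
        }
        // Every <init> must invoke a super-<init> before returning.
        return initialized ? null : "missing super-<init> call";
    }
}
```

Under this model, `PUTFIELD, SUPER_INIT, USE_THIS` is accepted for both kinds of class, while `SUPER_INIT, PUTFIELD` is accepted only when `valueClass` is false, matching the extra freedom identity classes keep.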

>> (If it works, does this mean we get support for super fields "for free"?)
> 
> That is probably true.  Do we care?

I'd be happy to get rid of special rules that have to do with super fields. (Replacing them with a rule that says certain shapes of abstract class constructors imply identity.) Not so much because of particular use cases, but because it makes the language more regular.



More information about the valhalla-spec-observers mailing list