The last miles
forax at univ-mlv.fr
Thu Jul 13 22:05:09 UTC 2023
----- Original Message -----
> From: "John Rose" <john.r.rose at oracle.com>
> To: "Brian Goetz" <brian.goetz at oracle.com>
> Cc: "Remi Forax" <forax at univ-mlv.fr>, "valhalla-spec-experts" <valhalla-spec-experts at openjdk.java.net>
> Sent: Thursday, July 13, 2023 10:52:38 PM
> Subject: Re: The last miles
> On 13 Jul 2023, at 7:24, Brian Goetz wrote:
>
>> This is a good thought; we split the initialization protocol and it's a fair
>> question to ask whether we can go back to a lump.
>>
>> In this case, I suspect John is about to say “Please let’s not give the verifier
>> any more jobs to do.”
>
> It is that, and even worse. If you work the details, you’ll quickly run into
> the fact that the <init> protocol (for Java constructors) builds an object but
> does not return the new object, it takes the new object from the caller in a
> tabula rasa (blank) state, and pokes values into it. Worse, the new object is
> supplied (by a new opcode) from an untrusted (even hostile) client. That means
> that the verifier needs complex rules (>10% of the total complexity) to track
> these untrusted-but-trusted blank objects and make sure they are handed to
> <init> before being used. That’s bad. We have a steady bug stream from this
> very delicate machinery. Maybe it’s done after a quarter century but I
> wouldn’t bet the farm on that.
>
> Worse still, for values, there is no architecturally defined state
> which corresponds to the “tabula rasa” state of the receiver of an <init> call.
> We know something of that state; it is called a “larval object”, but the
> Valhalla JVMS does not define or rely on it. The proposed “unification” would
> require us to somehow simulate larval objects in terms of today’s blank
> identity objects, and define how the larval-to-adult state transition works, or
> it would have to build new verifier rules for larval objects (mutable while
> <init> runs, then pure values after that). Either option seems much worse than
> what we have chosen to do so far.
>
> What we have chosen to do so far is have a functionally clean model for value
> objects that does not require mutability, either temporary (larval-only) or
> permanent (I shudder at that thought). This functionally clean model uses
> withfield instead of putfield, and aconst_init instead of the “new” opcode. I
> think that is a great trade, because it lets us off the hook from defining
> mutability into values, at any stage of their lifetimes.
>
> Yes, serialization smuggles larval mutability back in, but that’s a private
> matter of optimization, between the VM and JDK. I really don’t want to see
> that in the JVMS, because it would be just as hairy and complex and bug-prone
> as today’s new/<init> dance. Yes, we should use the old mechanisms when we
> can, and we do! But the new/<init> dance is, IMO, hopelessly entangled with a
> presupposition of object identity, and also hopelessly buggy; so I don’t think
> it can help us, and I wouldn’t touch to extend it even if I thought it might
> help.
>
> How’s that? :-)
Here, your analysis assumes that neither the call site nor the declaration site of <init> will change.
We are less constrained than that: the call site cannot be changed, but the declaration of <init> can,
since recompiling the value class is something users will have to do anyway.
So the new + dup + invokespecial <init> dance has to stay the same, but the semantics of each individual opcode can be adjusted for value classes (it's a lump move), and the content of <init> and even its descriptor can be different.
Here is what I propose:
- inside the value class,
<init> should return the instance, so the descriptor should be <init>(LComplex;)LComplex; instead of <init>()V for a constructor with no parameters.
Inside the constructor, either "this" or the first parameter is ignored, withfield is used instead of putfield,
and the fully initialized instance is returned by <init>.
The verifier is updated to understand the opcode "withfield".
- outside the value class, the semantics of the opcode "new" are changed to those of "aconst_init" if the class is a value class.
The semantics of invokespecial Complex <init>()V are changed if Complex is a value class: it takes the two instances plus the arguments from the stack and calls <init>(LComplex;)LComplex;
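
To make the two halves of the proposal concrete, here is a rough plain-Java emulation. It is only a sketch: withfield and aconst_init cannot be written in Java source, so they are imitated with ordinary methods. The names ValueInitSketch, defaultInstance, withRe, withIm and construct are hypothetical, and so are the two double parameters (which would correspond to a descriptor like <init>(LComplex;DD)LComplex;). That the returned instance is what ends up on the caller's stack is also an assumption of the sketch.

```java
// Plain-Java emulation of the proposed protocol; this is NOT real bytecode.
class ValueInitSketch {

    // Stand-in for a value class: fields are final and never mutated.
    static final class Complex {
        final double re;
        final double im;

        private Complex(double re, double im) {
            this.re = re;
            this.im = im;
        }

        // Emulates "new Complex" reinterpreted as aconst_init:
        // it yields the all-zero default instance (hypothetical helper).
        static Complex defaultInstance() {
            return new Complex(0.0, 0.0);
        }

        // Emulate withfield: return a copy with one field replaced.
        Complex withRe(double v) { return new Complex(v, im); }
        Complex withIm(double v) { return new Complex(re, v); }

        // Emulates the recompiled <init>: "this" is ignored, the extra
        // Complex parameter is the starting value, and the fully
        // initialized instance is returned instead of void.
        Complex init(Complex blank, double re, double im) {
            return blank.withRe(re).withIm(im);
        }
    }

    // Emulates the unchanged call site: new + dup + invokespecial.
    static Complex construct(double re, double im) {
        Complex fromNew = Complex.defaultInstance(); // "new" acting as aconst_init
        Complex fromDup = fromNew;                   // "dup"
        // "invokespecial" consumes both instances plus the arguments
        // and calls the value-returning <init>.
        return fromNew.init(fromDup, re, im);
    }

    public static void main(String[] args) {
        Complex c = construct(1.0, 2.0);
        System.out.println(c.re + " + " + c.im + "i"); // prints 1.0 + 2.0i
    }
}
```

Note that nothing in the sketch ever mutates a Complex, so no larval state is needed: every intermediate value is itself a complete instance.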
It's not beautiful, it's a hack; as Brian said, it's a lump move. But it's not as bad as you seem to think :)
Rémi