The last miles

forax at univ-mlv.fr forax at univ-mlv.fr
Thu Jul 13 22:05:09 UTC 2023


----- Original Message -----
> From: "John Rose" <john.r.rose at oracle.com>
> To: "Brian Goetz" <brian.goetz at oracle.com>
> Cc: "Remi Forax" <forax at univ-mlv.fr>, "valhalla-spec-experts" <valhalla-spec-experts at openjdk.java.net>
> Sent: Thursday, July 13, 2023 10:52:38 PM
> Subject: Re: The last miles

> On 13 Jul 2023, at 7:24, Brian Goetz wrote:
> 
>> This is a good thought; we split the initialization protocol and it's a fair
>> question to ask whether we can go back to a lump.
>>
>> In this case, I suspect John is about to say “Please let’s not give the verifier
>> any more jobs to do.”
> 
> It is that, and even worse.  If you work the details, you’ll quickly run into
> the fact that the <init> protocol (for Java constructors) builds an object but
> does not return the new object, it takes the new object from the caller in a
> tabula rasa (blank) state, and pokes values into it.  Worse, the new object is
> supplied (by a new opcode) from an untrusted (even hostile) client.  That means
> that the verifier needs complex rules (>10% of the total complexity) to track
> these untrusted-but-trusted blank objects and make sure they are handed to
> <init> before being used.  That’s bad.  We have a steady bug stream from this
> very delicate machinery.  Maybe it’s done after a quarter century but I
> wouldn’t bet the farm on that.
> 
> Worse still, for values, there is no architecturally defined state
> which corresponds to the “tabula rasa” state of the receiver of an <init> call.
> We know something of that state; it is called a “larval object”, but the
> Valhalla JVMS does not define or rely on it.  The proposed “unification” would
> require us to somehow simulate larval objects in terms of today’s blank
> identity objects, and define how the larval-to-adult state transition works, or
> it would have to build new verifier rules for larval objects (mutable while
> <init> runs, then pure values after that).  Either option seems much worse than
> what we have chosen to do so far.
> 
> What we have chosen to do so far is have a functionally clean model for value
> objects that does not require mutability, either temporary (larval-only) or
> permanent (I shudder at that thought). This functionally clean model uses
> withfield instead of putfield, and aconst_init instead of the “new” opcode.  I
> think that is a great trade, because it lets us off the hook from defining
> mutability into values, at any stage of their lifetimes.
> 
> Yes, serialization smuggles larval mutability back in, but that’s a private
> matter of optimization, between the VM and JDK.  I really don’t want to see
> that in the JVMS, because it would be just as hairy and complex and bug-prone
> as today’s new/<init> dance.  Yes, we should use the old mechanisms when we
> can, and we do!  But the new/<init> dance is, IMO, hopelessly entangled with a
> presupposition of object identity, and also hopelessly buggy; so I don’t think
> it can help us, and I wouldn’t touch to extend it even if I thought it might
> help.
> 
> How’s that?  :-)

Here, your analysis assumes that neither the call site nor the declaration site of <init> will change.
We are less constrained than that: the call site cannot change, but the declaration of <init> can,
since recompiling the value class is something users will have to do anyway.

So the new + dup + invokespecial <init> dance has to stay the same, but the semantics of each individual opcode can be adjusted for value classes (it's a lump move), and the content of <init>, and even its descriptor, can be different.

Here is what I propose:
- inside the value class,
  <init> should return the instance, so the descriptor should be <init>(LComplex;)LComplex; instead of <init>()V for a constructor with no parameters.
  Inside the constructor, either "this" or the first parameter is ignored, and withfield is used instead of putfield;
  the fully initialized instance is returned by <init>.
  The verifier is updated to understand the opcode "withfield".

- outside the value class, the semantics of the opcode "new" are changed to those of "aconst_init" if the class is a value class.
  The semantics of invokespecial Complex <init>()V are changed if Complex is a value class: it takes two instances (the two references left by new + dup) plus the parameters from the stack and calls <init>(LComplex;)LComplex;
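
To make the two halves of the proposal concrete, here is a rough sketch in plain Java that only emulates them. Java source cannot express withfield or aconst_init, so defaultValue(), withRe() and withIm() are hypothetical stand-ins for those opcodes, init() stands in for the rewritten <init>, and construct() mimics what an unchanged new + dup + invokespecial call site would end up doing. The descriptor <init>(LComplex;DD)LComplex; for a two-argument constructor is my extrapolation from the no-parameter case above, not something already specified.

```java
// Sketch only: a plain-Java emulation of the proposed protocol.
// LumpMoveSketch, defaultValue, withRe, withIm, init and construct are
// hypothetical names; withfield and aconst_init exist only as bytecodes.
final class LumpMoveSketch {

    static final class Complex {
        final double re;
        final double im;

        private Complex(double re, double im) {
            this.re = re;
            this.im = im;
        }

        // Stands in for aconst_init (and for "new" on a value class):
        // the all-zero default instance.
        static Complex defaultValue() {
            return new Complex(0.0, 0.0);
        }

        // Stand-ins for withfield: return a copy with one field replaced.
        Complex withRe(double newRe) { return new Complex(newRe, im); }
        Complex withIm(double newIm) { return new Complex(re, newIm); }

        // Stands in for a constructor compiled as
        // <init>(LComplex;DD)LComplex; (assumed descriptor).
        // The extra Complex parameter is ignored, nothing is mutated,
        // and the fully initialized instance is returned.
        Complex init(Complex ignored, double re, double im) {
            return this.withRe(re).withIm(im);
        }
    }

    // Emulates an unchanged call site: new; dup; <push args>; invokespecial.
    static Complex construct(double re, double im) {
        Complex first = Complex.defaultValue(); // "new Complex" acting as aconst_init
        Complex second = first;                 // dup
        return first.init(second, re, im);      // invokespecial consumes both, pushes result
    }

    public static void main(String[] args) {
        Complex c = construct(1.0, 2.0);
        System.out.println(c.re + " + " + c.im + "i");
    }
}
```

The point of the sketch is that no blank-but-mutable object is ever visible: the instance produced by "new" is an ordinary default value, and <init> only derives new values from it.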

It's not beautiful, it's a hack; as Brian said, it's a lump move. But it's not as bad as you seem to think :)


Rémi


   







