Target of opportunity: remove method <vnew> and aconst_init / withfield opcodes

Tue Aug 15 03:05:28 UTC 2023

On 12 Aug 2023, at 13:39, Remi Forax wrote:

> John and I and several others had several interresting discussions yesterday about Valhalla,
> one of them about the removal of <vnew> and aconst_init/withfield, something i've called in the past, solving the last mile issue.

Yes, you’ve mentioned this, but I wasn’t convinced it was possible until Friday.

> We may want to modify the verifier to verify that "this" does not escape the constructor in case of a value class, but I do not thing it's a requirement, so it's maybe better to not do that :)

Actually, for me the whole proposal collapses unless the buffer containing the not-yet-constructed value can be rigorously prevented from escaping.  (I call it a buffer, because it not yet a value.)  The only valid operations on such a buffer are a. calling <init> (at the end of which it may be treated as a real value), and b. during the course of that <init> method calling putfield on it.

If the buffer is allowed to escape in any other way, then the transition between (larval) buffer and (adult, frozen) value becomes ill-defined.  In particular, if the escaped buffer is stored in a heap variable, then race conditions can affect the contents of the finished value, but there is no way to “find and fix” the escaped buffer and promote it to be a real value, unless you allow a racy side effect for the state change.  I think requiring such “flexibility” would impede optimiziation.

But there are four lucky breaks that I realized on Friday:  1. The uninitializedThis state already defined by the verifier is sufficient (as well as necessary) for preventing escape of (larval) buffer references.  2. The existing rules for using uninitializedThis, including the permission to use putfield, are sufficient (as well as necessary) for initializing value buffers, 3. the JMM freeze operation at the end of <init> can be interpreted (for values) as the operation which transitions a mutable (larval) value buffer into a true immutable (adult) value, and finally 4. a special rule that forbids a value constructor from calling a super-constructor (or this-constructor) is sufficient to protect the (larval) value buffer from escaping, all the way up to the freeze.

That last point 4 is the only adjustment necessary to the verifier rules, in order to make the whole scheme work.  I am against all but the most necessary verifier changes in general, and I like the fact that I get rid of two bytecodes (and their verifier rules) in exchange for a simple modification to the access rules for a constructor calling another constructor on its uninitalizedThis value.

> The end of constructor is already a point where all VMs/JITs are able to emit codes (for store/store barrier or finalizer registration), so we are piggy backing on an existing concept. The only main drawback I see is that the header of a buffered value type has to have one bit to indicate the larval state and those header bits are really precious.

Not a problem, I hope.  Basically, the states of a value-class heap node header are either larval or adult, while the states of an identity-class heap node header are either locked or unlocked.  Value types are not allowed to store complex object monitor states, so it is likely (even with Lilliput) that no new bits are needed.  Perhaps the larval value buffer or the adult value itself can be given special inflated states as needed; they would be singleton states, not heap-allocated.

Why explicitly mark the larval state at all?  Maybe it’s not necessary for bytecode-based value construction, if the reference is so rigidly constrained, but it might gate future some GC optimizations such a de-duplication (or NUMA splitting OTOH).  You can’t do that stuff to a larval value buffer.  Also, Unsafe can use it as an error check (though we often dispense with error checks for Unsafe).

Most importantly, though, there has to be some way to prevent putfield from being accidentally applied to an adult value.  The verifier could be asked to help with this, but I don’t like adding subtle new verifier rules.  It’s better just to have a dynamic check, that says “putfield looks for the larval buffer state and throws if it finds a properly frozen value”.  The dynamic check is relevant only in the interpreter; the JIT can usually elide the whole larval state completely.  One case where it cannot:  If the <init> method is compiled for an out of line call, from a caller who has allocated the larval buffer.

— John

P.S. It is an optional move, but I think we should discuss keeping <vnew>.  The <vnew> would be demoted from its current draft role as a constructor, to a simple static factory, which then does the new-dup-init dance.  The VM could supply it automagically, or it could be explicit bytecode declarations.  It could be extended to identity classes, so all kinds of “new” expressions could link through <vnew> as a static factory.

Armed with <vnew> as the canonical way to build values (outside of the value class itself), we could mandate that, after a certain classfile version, the “new” bytecode on a value class is a private operation (like we treat “withfield”).  And maybe the same for identity classes as well, so we retain migration compatibility (in both directions).  Old classfiles would be exempt, and would still contain new-dup-init, but when recompiled they would call <vnew>.

Why do this?  It would reduce proliferation of the new-dup-init dance, in favor of a simpler and safer vnew dance.  There have been many serious bugs in HotSpot from complicated interactions of the uninitializedThis produced by a “new” with other edge cases.  There would be many fewer occasions for bugs if only nestmates of a class were able to make actual raw new instances of that class.  Then all the parts of the verifier that deal with uninitializedThis would be confined to just a few places, in the class definition itself.  The author of that class would have more exact control over what was done with these raw new instances.  It would not be necessary to trust the verifier to protect partially constructed instances, which is good because it has not always done a perfect job of that.

I realize folks will say, “we just rehabilitated <init> for values, so why add a new <vnew> to wrap around it?”.  So, full disclosure, I wish we had already shifted (in newer classfiles) to a static factory like <vnew> instead of new-dup-init. That’s how it seems to me, as I look back on years of verifier bugs.