Field initialization before 'super'

Wed Dec 13 22:36:24 UTC 2023

I have been wanting this feature for literally decades!  I’m
glad to see that its time is coming.  My reasons:

  - users can prove their code is free of certain init-races and NPEs
  - specifically, safe publication is safer
  - JIT no longer needs to do escape analysis to trust finals
  - stuff we had to do for ICs and lambdas is now generally available
  - we get closer to fixing certain race conditions in deserialization
  - we get closer to dealing with tech. debt from setAccessible

(The cost of this feature is cultural.  Now users have two ways
to do one task, initialize a final field: The improved way, and
the usually-OK-but-sometimes-not way from literally Java 1.1.
Which I defined.  Which is why I’ve been wanting a fix for years.)
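
For readers who have not hit the “sometimes-not” case, here is a minimal sketch of it in today's Java (Base and Sub are hypothetical names).  A superclass constructor calls an overridable method, and that method reads a subclass final field before the subclass constructor has assigned it:

```java
// Today's construction order: a subclass final field is assigned only after
// super() returns, so the superclass constructor can observe it unset.
class Base {
    final String greeting;

    Base() {
        // Calls an overridable method before the subclass part is initialized.
        greeting = describe();
    }

    String describe() {
        return "base";
    }
}

class Sub extends Base {
    private final String name;

    Sub(String name) {
        super();           // Base() runs first and calls describe()
        this.name = name;  // too late for that call
    }

    @Override
    String describe() {
        // During Base(), 'name' still holds its default value, null.
        return "sub:" + name;
    }

    // With flexible constructor bodies (JEP 513, final in JDK 25) the
    // assignment can precede the call instead:
    //     Sub(String name) { this.name = name; super(); }
    // and describe() would then see "alice" even during Base().
}
```

Here new Sub("alice").greeting is "sub:null", while describe() returns "sub:alice" once construction finishes.  Had describe() called name.length(), the result would have been an NPE.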

I think it all works as you have framed it, Dan, with phase 1 quietly
inferring strictness from constructor behavior.  The user must opt into
the changed field semantics by changing constructor code.  There is no
“nicer” way to opt into strictness, say on a whole-class basis, but
that is clearly something that can follow later.

The ACC_STRICT bit reflects the internal implementation change in the
order of construction.  Note: This change is not an API change, yet.
Remember that reflection can observe private details like ACC_BRIDGE.

(It may become an API change (with more explicit declaration) when
we start interacting with Valhalla value classes.  Spoiler:  A
value class, and all its supers, must have only strict final fields.)

Even if we didn’t have ACC_STRICT in the classfile, the JVM could
infer its setting by examining all relevant constructors.  But it
seems fair to put the bit into the classfile if the user goes to
the trouble of refactoring construction order.  (It cannot be
inferred from legacy code, since the putfields are in the wrong
places.)  And for Valhalla we really do need ACC_STRICT, along
with the related verifier rule to give it teeth.

> ACC_STRICT implies ACC_FINAL and !ACC_STATIC. Verification ensures 
> that a 'putfield' for an ACC_STRICT field of the current class never 
> occurs after the 'super()' call. (Specifically, the receiver type for 
> the putfield must be 'uninitializedThis', not a class type.)
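
To make the rule concrete, here is a toy model of the check.  It is not HotSpot's verifier, and all names are hypothetical.  Instructions are reduced to two kinds, a putfield of a field in the current class and the invokespecial of the super <init>, and only straight-line code is handled; the real verifier tracks the receiver type along every path using stack maps:

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Toy model of the proposed rule: a putfield to a strict field of the current
// class must see the receiver as 'uninitializedThis', i.e. precede super().
final class StrictCheck {
    interface Insn {}
    record PutField(String field) implements Insn {}   // putfield on 'this'
    record InvokeSuperInit() implements Insn {}        // invokespecial super.<init>

    /** Returns an error message if the straight-line constructor body breaks the rule. */
    static Optional<String> verify(List<Insn> ctor, Set<String> strictFields) {
        boolean receiverIsUninitializedThis = true;
        for (Insn insn : ctor) {
            if (insn instanceof InvokeSuperInit) {
                // After super() returns, the receiver has the class type.
                receiverIsUninitializedThis = false;
            } else if (insn instanceof PutField p
                    && strictFields.contains(p.field())
                    && !receiverIsUninitializedThis) {
                return Optional.of("putfield of strict field '" + p.field() + "' after super()");
            }
        }
        return Optional.empty();
    }
}
```

A non-strict field may still be written after super(); only the fields named in strictFields are constrained.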

Mentioning this entanglement between mode bits is a nerd-snipe.  Let’s
be careful here, for the sake of the future, that we don’t rule out
strict statics, and strict non-finals.  What I think we can and should
do is roll out strictness ONLY for final non-statics, but reserve
judgement about the other combinations.
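
A phase-1 class-file check that rolls out strictness only for final instance fields, rejecting the other combinations for now instead of giving them a meaning, might look like this sketch.  ACC_STATIC (0x0008) and ACC_FINAL (0x0010) are the JVMS values; using 0x0800, the existing method-level ACC_STRICT value, for fields is an assumption here:

```java
// Hypothetical phase-1 field-flag check.
final class StrictFlags {
    static final int ACC_STATIC = 0x0008;  // JVMS value
    static final int ACC_FINAL  = 0x0010;  // JVMS value
    static final int ACC_STRICT = 0x0800;  // assumed field encoding (method ACC_STRICT value)

    /** True if the flags are legal for a field under the phase-1 rule. */
    static boolean isLegalPhase1(int fieldFlags) {
        if ((fieldFlags & ACC_STRICT) == 0) {
            return true; // non-strict fields are unaffected
        }
        // Phase 1 accepts strictness only on final instance fields.
        // Strict statics and strict non-finals are rejected for now, which
        // leaves those combinations free to be defined later.
        return (fieldFlags & ACC_FINAL) != 0 && (fieldFlags & ACC_STATIC) == 0;
    }
}
```

Rejecting rather than defining the other combinations is what keeps judgement reserved: a later release can start accepting them without changing the meaning of any class file that is legal today.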

The combination of guarantees from (a) the existing language rules
for finals and from (b) the new rules for strictness makes the following
statement true for strict finals:

>> NO-READ-BEFORE-INIT It is impossible for the JVM to issue a read of
>> a strict field until after the first write.

This is a broader statement than the fiddly details of strict finals.
But it is equivalent, given (a) and (b), for non-static finals.
(Prove me wrong!)
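
As an analogy only (the JVM enforces this statically, through the verifier and the language rules, not with runtime checks), the contract behaves like a hypothetical write-once cell that refuses any read before its single write:

```java
// Runtime analogy for NO-READ-BEFORE-INIT: one write, and no reads before it.
final class StrictCell<T> {
    private T value;
    private boolean written;

    /** The single permitted write, like the unique initialization of a final. */
    void init(T v) {
        if (written) {
            throw new IllegalStateException("already initialized");
        }
        value = v;
        written = true;
    }

    /** Reads are legal only after the write has happened. */
    T get() {
        if (!written) {
            throw new IllegalStateException("read before init");
        }
        return value;
    }
}
```

The cell is single-threaded and purely illustrative; it models the contract, not how the JVM implements it.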

Curiously, this statement could, in the future, be a contract for other
kinds of fields besides non-static finals.  And that is how nerd-snipes
go viral.

(And to my fellow nerds:  PLEASE do not burden this JEP with all of the
possible implications of that formulation of strictness.  All in good 
time.)

> 3) Immutability of strict finals is a strong guarantee. JVM internals 
> may treat strict final fields as truly immutable, without supporting 
> any deopt paths when unexpected mutation occurs.
>
> The 'Field.setAccessible' method, which provides a standard API 
> mechanism for mutating final fields, considers strict finals to be 
> "non-modifiable", and will not enable reflective writes. (It already 
> does the same for record fields.)
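
The record behavior mentioned above can be observed today.  In this sketch (class names hypothetical), setAccessible(true) succeeds on both fields, but Field.setInt throws IllegalAccessException for the record's final field, while the ordinary class's already-initialized final field is silently overwritten:

```java
import java.lang.reflect.Field;

// Compares reflective writes to a final field of an ordinary class and of a record.
final class FinalWriteDemo {
    static final class Plain {
        final int x;
        Plain(int x) { this.x = x; }
    }

    record Point(int x) {}

    /** Returns true if a reflective write to the final field 'x' succeeded. */
    static boolean tryWrite(Object target) {
        try {
            Field f = target.getClass().getDeclaredField("x");
            f.setAccessible(true);   // succeeds for both classes here
            f.setInt(target, 42);    // rejected for the record's final field
            return true;
        } catch (IllegalAccessException e) {
            return false;            // record fields are "non-modifiable"
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```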

There is some debt here to pay, because some framework authors will
ask what is the alternative to the setAccessible we have taken away.
But I think that can come later.

One way to soften the blow would be to let the framework authors down
more gently, and give some story for “well we allowed it up to now,
and we will throw warnings and then errors at you soon, and you should
learn the alternative tactics real soon”.  The Java module system
played these games, as does Panama in its native method access rules.

But saying “it is turned off” is a start.  The start of a 
conversation.

What happens if (somehow) a framework author starts monkeying with
final fields?  This breaks the unique initialization condition for
finals, since any final smashed by a framework presumably already
had a perfectly legitimate value assigned by its constructor.
(Or if it was Unsafe.allocateInstance, the legitimate value was
the default value.)  Really, such frameworks create potential
violations of the NO-READ-BEFORE-INIT rule I stated above.

By the way, the optimizing JIT secretly performs reads of fields
when it inspects live data and decides to constant-fold it
or draw other conclusions from it.  So even if
the source code does not apparently commit read-before-init
faults, the JIT might.  It’s the JIT’s responsibility to
be careful with such reads.

One thing we might want to do is to add logic to setAccessible
that throws away code that might have been optimized too
confidently, since setAccessible(true) means there may be
read-before-init hazards coming, on a particular field.
This sort of thing has been prototyped in the past more
than once, I think.  It’s just hard to put the whole story
together without the early-init of fields given by this JEP.
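
The idea of throwing away over-confident code resembles dependency-based deoptimization.  Here is a toy model of it; every name is hypothetical and none of it reflects HotSpot's actual data structures.  Compiled code records the final fields whose values it folded, and making such a field writable invalidates the dependents:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of invalidating code that constant-folded a final field.
final class FoldDependencies {
    /** Stand-in for a compiled method that may have folded field values. */
    static final class CompiledCode {
        final String name;
        boolean valid = true;
        CompiledCode(String name) { this.name = name; }
    }

    private final Map<String, List<CompiledCode>> dependents = new HashMap<>();

    /** Called by the (hypothetical) JIT when it folds the value of a final field. */
    void recordFold(String field, CompiledCode code) {
        dependents.computeIfAbsent(field, k -> new ArrayList<>()).add(code);
    }

    /** Called when setAccessible(true) makes a final field writable. */
    int onSetAccessible(String field) {
        List<CompiledCode> codes = dependents.remove(field);
        if (codes == null) {
            return 0;
        }
        for (CompiledCode c : codes) {
            c.valid = false; // a real VM would deoptimize this code
        }
        return codes.size();
    }
}
```

Only code that depended on the affected field is discarded; code that folded other fields stays valid.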

But the easiest and best starting point is to turn off setAccessible
as much as possible.

> Standard deserialization ensures strict finals are set, and so their 
> values deserialized, before the object under construction is leaked to 
> any user code. This probably means back references to an object from 
> its own strict final fields are unsupported, and deserialize to 
> 'null'. (Records already behave in this way.)
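
Records already show the predicted behavior.  In this sketch (names hypothetical), a record holds an array whose only element points back to the record itself.  A record is created only after its components have been read, so the back reference has nothing to resolve to yet and comes back as null:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Round-trips a record whose own component refers back to the record.
final class RecordCycleDemo {
    record Node(Object[] slots) implements Serializable {}

    /** Builds a cycle: node.slots()[0] == node. */
    static Node selfReferencing() {
        Object[] slots = new Object[1];
        Node node = new Node(slots);
        slots[0] = node;
        return node;
    }

    /** Serializes and deserializes with standard Java serialization. */
    static Node roundTrip(Node node) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(node);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Node) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The original node's slot held the node itself; the copy's slot holds null.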

My nightmare about this has always been:  Can I prove that the
JIT will NEVER peek into an unfinished object, and draw conclusions
from the uninitialized field contents?  Such a proof seems tricky,
given that the deserialization code makes no clear distinction
(that the JIT can see) between under-construction objects and
safe-to-publish objects.  This is a long-standing technical debt
in deserialization (all such frameworks).

> Unsafe and JNI are capable of performing arbitrary, type-unsafe 
> modifications to field storage. Clients who modify strict finals do so 
> at their own risk, and JVM optimizations won't try to account for such 
> usage.

Yep!  If you use Unsafe or JNI to hack objects, you become a VM
implementor.  Don’t want that responsibility?  Use a higher level
API.  (Such as record canonical constructors or TBD for strict
fields or Valhalla values.)

> That covers "phase 1" for this feature. Eventually, we'll want to 
> address questions like
> - What about fields with initializers?

(Choices there include “move the field initializers always”,
“move initializers which are simple enough”, and “don’t use
field initializers”.  Dunno which combination wins.)
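
For context on the first choice: today javac emits a field initializer as an ordinary assignment placed right after the super() call, so the field is still unset while the superclass constructor runs.  A small sketch with hypothetical class names:

```java
// Field initializers run right after super() returns, not before it.
class Parent {
    final String seen;

    Parent() {
        seen = peek(); // runs before Child's field initializer
    }

    String peek() {
        return "parent";
    }
}

class Child extends Parent {
    // Non-constant initializer: javac emits it as a putfield placed
    // immediately after the implicit super() call in Child().
    final StringBuilder log = new StringBuilder("child");

    @Override
    String peek() {
        return log == null ? "unset" : log.toString();
    }
}
```

Here new Child().seen is "unset", even though peek() returns "child" once construction completes.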

> - Can I have my implicit 'super()' call go at the end of my 
> constructor?

(Choices include, “sure, why not? we’ll pretend the super doesn’t do
side effects”, or “we can somehow tell the super is safe”, or “nope,
only for Valhalla value classes”, or “only if you opt in with a
class-level keyword”.  Again, who knows which one wins.)

> - Can javac check for me that my fields are strict?

(Choices include, “request checks with a class-level keyword”,
or “only for Valhalla values”, or “use an annotation”.  Today
I find it kind of charming that an annotation might do the
trick, at least before Valhalla.  Something like @Override
that merely observes and comments.  It might even comment
that you could put your super in a better place.)

— John