Loosening requirements for super() invocation

Fri Nov 4 16:27:08 UTC 2022

I'm not sure "final fields" is the "most basic", but its surely a 
reasonable problem to work on.

A key aspect of this is identifying what code might be run during 
specific windows of time.  For the window between "super call" and "last 
final field assigned", we are worried about final field reads; for the 
window covering the entire ctor, we are worried about broader uses of 
`this`.

We can then analyze such code (or if it is potentially in another 
compilation unit, issue a warning.)

To your point about "the superclass might downcast", this is something 
you can check for as something questionable to do with `this`.

On 11/3/2022 6:50 PM, Archie Cobbs wrote:
> On Thu, Nov 3, 2022 at 8:07 AM Brian Goetz <brian.goetz at oracle.com> wrote:
>
>     While we could do more warnings as ordinary RFEs, there's
>     potentially still value in grouping these all under a JEP, for the
>     reasons you hint at and more.  Let's work through the design,
>     figure out if we can actually provide any new safety guarantees
>     (as opposed to just more warnings), and then figure out the best
>     vehicle.
>
>
> Sounds good.
>
> I'll try to throw out a starting point...
>
> Meta-comment: it's probably prudent to start with something relatively 
> narrow and well-defined. Once that idea is fully understood we can 
> expand from there.
>
> OK let's define the problem. IMHO the most basic problem relates to 
> final fields. Obviously, Java is advertising them as special by 
> guaranteeing immutability and safe publication after construction. Yet 
> it's still easy for a final field to be read incorrectly before it's 
> been properly initialized. The FilteredSet example demonstrates this, 
> and also shows how non-obvious it can be. So having compiler 
> automation that would help you avoid this problem makes sense.
>
> Ideally we would like to have some warning X for which we can say "If 
> a class A has a constructor that (a) compiles without generating any X 
> warnings, and (b) has no @SuppressWarnings("X") annotation, then the 
> compiler guarantees that no final field in class A can ever be read 
> before it's initialized".
>
> Note the narrowness of this: we are only talking about final fields in 
> class A. Not fields in A's superclass or subclasses. We're not talking 
> about "object initialization", we're just talking about individual 
> fields. From a developer point of view, this is a feature: you can 
> assess (and address) the situation one class and one field at a time.
>
> Even so, from a practical point of view, solving this problem is 
> likely to cover the vast majority of the real-world cases where "this 
> escape" causes trouble, at least in my experience. The "this escape" 
> problems I've seen invariably can be solved by moving some final field 
> initialization to before the super() call (which will now be 
> possible), as in the FilteredSet example.
>
> There is another subtlety behind this idea: the warning needs to be 
> emitted at the point of offense, so we need to define where that is.
>
> Consider the FilteredSet example: which of FilteredSet, HashSet, or 
> AbstractCollection is "at fault"??
>
> Is the problem the fault of the HashSet() constructor, for invoking 
> addAll()? That's a superclass method, not a subclass method, so that 
> seems reasonable. Is it the fault of AbstractCollection.addAll(), 
> which calls add(), which is very likely to be overridden in a 
> subclass? That seems perfectly normal - after all addAll() is a 
> regular method, not a constructor. Or is it the fault of FilteredSet 
> for not initializing the filter field prior to invoking super()?
>
> My thought on this is sort of akin to the old saying that "good fences 
> make for good neighbors". In other words, don't rely on others to not 
> overstep when you have a simple way to prevent that yourself. Now that 
> we're giving classes the ability to protect themselves by doing final 
> field initializations prior to super() calls, let's put the burden of 
> protection on them, not on some unrelated superclass. In other words, 
> FilteredSet is "to blame" here.
>
> OK so now let's analyze when a final field myFinalField in class 
> MyClass could possibly be read prior to initialization...
>
> Consider any path of execution through a constructor. It will invoke 
> either this() or super() exactly once. If it invokes super(), it will 
> also initialize myFinalField at some point, either explicitly, or 
> implicitly - immediately after the super() call if the field has an 
> initializer or is assigned in an initialization block.
>
> Observation #1: Prior to invoking this() or super(), it's impossible 
> for any field to be read (JVM guarantee), so clearly it's impossible 
> for myFinalField to be read prior to initialization. No need to check 
> here.
>
> Observation #2: If the constructor invokes this(), we assume by 
> induction that the other constructor has already been checked and any 
> warnings generated. Upon its return, myFinalField must be initialized 
> already so we're done here too.
>
> So we only need to be concerned with the time during and after a 
> super() call.
>
> If myFinalField is explicitly assigned prior to invoking super(), then 
> we're done - it's impossible for there to be any reference prior to 
> initialization by Observation #1.
>
> Therefore, we only need to be concerned with the case where 
> myFinalField is initialized after the super() call.
>
> Let's call the time from the super() invocation to the point where 
> myFinalField gets initialized the "vulnerability window for 
> myFinalField" and any access to myFinalField within this window an 
> "early access".
>
> We want the compiler to generate a warning unless it can prove to 
> itself that there are no early accesses to myFinalField.
>
> There are two cases: early accesses during the super() call, and early 
> accesses after the super() call.
>
> Case #1 - Early accesses during the super() call
>
> We can make a special case here for classes that extend 
> java.lang.Object, where we can assume the superclass constructor does 
> nothing. Done.
>
> Otherwise, we observe the following: a superclass constructor could 
> downcast itself and, if myFinalFieldis public (or if in the same 
> package, any non-private), access it directly. It could also access 
> any public (or if in the same package, any non-private) method of 
> MyClass without necessarily having to downcast - and one of those 
> methods is very likely to be able to access myFinalFieldbecause that's 
> very common.
>
> So it seems pretty hopeless trying to prove there are no possible 
> early accesses to myFinalField when the superclass is not Object and 
> we can't inspect its constructor. It might be possible if 
> myFinalFieldwere private, and none of MyClass's non-private methods 
> accessed it, even indirectly. Even then it would take work to prove, 
> and this would only help in a few uncommon cases.
>
> Hmm... moving on...
>
> Case #2 - Early accesses after the super() call
>
> The only way myFinalField could be accessed is if the constructor 
> itself does so directly, passes off this to somewhere we can't follow 
> it, or if it (perhaps indirectly) invokes a non-static method of 
> MyClass (or a subclass of MyClass if myFinalFieldis not private) which 
> contains an early access.
>
> Here there may be a little more hope.
>
> Let's say an AST subtree is "nested" if it appears inside a lambda or 
> inner class.
>
> Let's define these fairly broad/conservative categories of AST 
> subtress that could possibly lead to an early access of myFinalField:
>
>  1. Non-nested expressions of the form myFinalField,
>     this.myFinalField, or MyClass.this.myFinalField
>  2. Non-nested non-static method invocations explicitly targeting
>     this, MyClass.this, super or SomeType.super, or implicitly
>     targeting this instance, e.g., hashCode()
>      1. Except for private or final methods declared in MyClass that
>         themselves have no early access
>  3. Non-nested expressions this or MyClass.thisthat are not part of a
>     field or method access (e.g., method parameters)
>  4. Non-nested instantiations of any inner class for which MyClass is
>     the outer class
>  5. Non-nested lambda expressions
>
> Notes:
>
>   * We can define these modulo extraneous parentheses, so for example
>     ((this)).method()would be in category #2
>   * The "Except for" methods in #2 are defined recursively, but they
>     should still be well-defined.
>   * In constructors, category #1 is already taken care of for us -
>     these expressions already generate a compiler error (once pr#10956
>     <https://urldefense.com/v3/__https://github.com/openjdk/jdk/pull/10956__;!!ACWV5N9M2RV99hQ!IKqkDsZzCLtmAkDK3gx8yuZQ4Yp0bknNR9rDIONIH3g7nx38l1jXirjiE0lP4o3WruZ2hHKKRpX2MVJViUQKyiQ$>
>     is accepted :)
>       o But not in regular methods, so we'd need checks in that case
>
> I *think* every early access after the super() call would have to 
> result from an AST matching one of the above categories.
>
> I'll stop here for now... thoughts?
>
> -Archie
>
> -- 
> Archie L. Cobbs
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/amber-dev/attachments/20221104/76d035f5/attachment-0001.htm>