RFR: 8015831: Add lint check for calling overridable methods from a constructor [v5]

Thu Jan 12 10:00:21 UTC 2023

On Thu, 12 Jan 2023 02:14:10 GMT, Archie L. Cobbs <duke at openjdk.org> wrote:

>>> 
>>> D'oh, you're right. But if you made `returnMe()` static or private then the argument would still hold.
>>> 
>>> > And, if the method is static, same story - you are passing ref3 somewhere else, and ref3 potentially contains this.
>>> 
>>> Not true... static methods are safe to "invoke" because they can't be overridden. When the analyzer "invokes" the method, it will see that all it does with its parameter is return it.
>>> 
>> 
>> So, for static methods, it could go down two ways: either we don't even look at referenced method bodies, give up and just declare "sorry, escaping". Or, if we look into method bodies, and see that the relationship between inner and outer parameter is as simple, it's just like assignment again:
>> 
>> ref1 -> ref2 -> ref3 -> x -> ref4
>> 
>>> But I don't really have an analysis of all the trade-offs. Really, at a certain point I stumbled on what I thought was a fairly straightforward way to achieve a good false negative rate and a very low false positive rate, without a lot of work. And frankly at that point I stopped caring about the question you're asking, although I agree it's certainly interesting from an academic point of view (I'm a practical person).
>> 
>> The reason I'm asking these questions is that I'm trying to understand which ingredients went into the analysis and why, in order to try and build a mental problem of what the problem that needs to be solved is, what are the constraints, etc. I'm also a practical person :-) and I often find it easier to understand a piece of code if I know some of the reasoning that went behind it, or what's the behavior supposed to be.
>> 
>> I do not know, off-hands, whether there is a simpler solution - I was mostly probing for war stories of the kind "I tried X and I got Y", and I apologize if that came off the wrong way.
>
>> So, for static methods, it could go down two ways: either we don't even look at referenced method bodies, give up and just declare "sorry, escaping". Or, if we look into method bodies, and see that the relationship between inner and outer parameter is as simple, it's just like assignment again:
> 
> Right - and just to reconfirm, it's only the non-overridable methods in the same compilation unit that are "invoked" (i.e., analyzed & tracked). Any method that (might) exist in another compilation unit is off limits, so we have to assume the worst case and declare a leak if we are passing it any 'this' references.
> 
>>  in order to try and build a mental problem of what the problem that needs to be solved is
> 
> Gotcha.
> 
> Let's be clear about what exactly we're worrying about. It's this situation: You have a class `B extends A`, where `B` and `A` are in different compilation units, and during the `super()` call that constructor `B()` makes to its superclass constructor `A()`, some code in class `B` gets executed (really, it's some field in `B` getting accessed that actually matters) prior to `A()` returning.
> 
> So the basic goal is to analyze `A` constructors and watch 'this' references until if/when they go to someplace where we can no longer watch them. At that point we have to assume that some code in `B` might get invoked and so we generate a warning.
> 
> OK, so where can a 'this' reference go?
> 
> Well, here are all of the possible places a Java reference can exist:
> 1. On the Java stack
>     (a) The current 'this' instance
>     (b) A method parameter
>     (c) A local variable
>     (d) A temporary value that is part of the current expression being evaluated
>     (e) The return value from a method that just returned
>     (f) A caught exception
> 1. In a field of some object...
>     (a) A normal field
>     (b) An outer 'this' instance
>     (c) Other synthetic field (e.g., captured free variable)
> 1. As an element in a reference array
> 1. In native code as a native reference
> 
> Those are the only possibilities AFAIK.
> 
> So one way to locate yourself on the spectrum from "simple" to "complex" is to answer this question: Which of those are you going to try to keep track of, and for each one, how hard are you going to try?
> 
> The `this-escape` analysis being proposed tracks 1(a-e) and 2(b) pretty closely (it tries hard), and it adds a very course tracking of 2(a-c), and 3 using the notion of indirect references (it doesn't try very hard). We do not track 1(f) or 4.
> 
> So to think about the overall problem, imagine how you might or might not address all of those cases.

> * On the Java stack
>   (a) The current 'this' instance
>   (b) A method parameter
>   (c) A local variable
>   (d) A temporary value that is part of the current expression being evaluated
>   (e) The return value from a method that just returned
>   (f) A caught exception
> 
> * In a field of some object...
>   (a) A normal field
>   (b) An outer 'this' instance
>   (c) Other synthetic field (e.g., captured free variable)
> 
> * As an element in a reference array
> 
> * In native code as a native reference

Thanks for the classification. This is helpful.

I'm not sure what you mean by (1f). You mean `this` can be embedded in an exception being thrown? Is that different from (2)?

Also, it seems to me that (3) is a special case of (2) - in the sense that you can imagine a reference array as a big object that has many fields as there are elements - so the same reasoning applies.

When looking at (2d) and (2e) the fact that Javac's AST is not in SSA form means that, yes, we need to track _expressions_ not just variables (e.g. for chains of methods calls, ternary operators and such).

-------------

PR: https://git.openjdk.org/jdk/pull/11874