RFR: 8015831: Add lint check for calling overridable methods from a constructor [v5]

Wed Jan 11 22:43:16 UTC 2023

On Wed, 11 Jan 2023 21:51:45 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote:

> So, in this example though you are calling an instance method before the object is initialized, which would seem to me like a leak

D'oh, you're right. But if you made `returnMe()` static or private then the argument would still hold.

> And, if the method is static, same story - you are passing ref3 somewhere else, and ref3 potentially contains this.

Not true... static methods are safe to "invoke" because they can't be overridden. When the analyzer "invokes" the method, it will see that all it does with its parameter is return it.

In other words, any method (or constructor) in the current compilation unit that can't be overridden is never considered "somewhere else".

> While I know that this is not perfect, and bound to generate false positives, I get the spirit of my question is, really: is there a (much) simpler scheme we can get away with, which has bounded complexity, and which has the property we care about (which seems to be no false negative). I'm less worried about contrived cases emitting false positives, as it's an optional warning that can be shut down - but I'd like perhaps to move the discussion from trying to detect precisely if the leak happens to try to detect if potentially a leak can happen, and see if there's some simpler analysis that can be used to get there (e.g. one which doesn't require flooding). It's possible that you have already considered all these options and the analysis you have here is the best trade off between complexity and precision - but I'd like to have a better understanding of what the trade offs are, and, more importantly, what happens to real code when we tweak the analysis this or that way (as this is a
  problem where I feel it's easy to get in the land of diminishing returns).

I can only report from my own experience. I thought about this a bit and tried a few different things and what I ended up was the best I could come up with in terms of that trade-off. Of course there was a good bit of intuition and SWAG'ing in that and also a lot of thought experiments.

Of course if you have a clever idea for how to do this in a simpler way that achieves basically the same result, I'm all ears :)

But I don't really have an analysis of all the trade-offs. Really, at a certain point I stumbled on what I thought was a fairly straightforward way to achieve a good false negative rate and a very low false positive rate, without a lot of work. And frankly at that point I stopped caring about the question you're asking, although I agree it's certainly interesting from an academic point of view (I'm a practical person).

A lot of my initial ideas were too simple for my taste, because I could easily find real-world false negatives.

An example of "too simple": at first I was not trying to track an outer 'this'. But I found a lot of constructors out there that instantiate nested classes and then do things with them. Without tracking outer 'this' these would all be missed.

To take a random example of that:

public class FileChooserDemo extends JPanel implements ActionListener {
    ...
    public FileChooserDemo() {
        ...
        // create a radio listener to listen to option changes
        OptionListener optionListener = new OptionListener();

         // Create options
        openRadioButton = new JRadioButton("Open");
        openRadioButton.setSelected(true);
        openRadioButton.addActionListener(optionListener);  // <- LEAK HERE
        ...
    }

    private class OptionListener implements ActionListener {
        ...
    }
}

That particular leak is (probably) innocuous, but it still qualifies as a leak and should be reported. Note that failing to track an outer 'this' causes false negatives, which are a bigger problem than false positives.

On the flip side, I started out trying to explicitly track field references, but that seemed like more complexity than needed, at least for phase 1, so that was left out.

The "flooding" aspect didn't really worry me because the reference set is "append only" and there is small, finite set of possible references at each scope, so it can't really get that out of hand. Testing indeed shows it's not a problem. By the way recursion also "floods" until convergence, just like looping.

I would have been pleased to find "a much simpler scheme". But the more I thought about those options, the more I came up with easy misses. And after a certain point, it actually became easier to simply "execute" the code by scanning the AST while carrying around a `Set<Ref>` to track possible references. It's really that simple. Instead of trying to be "smart" we just let the code tell us what happens.

Apologies if this is not a very good answer to your question.

In the big picture, the false positive rate is traded-off against code complexity, and we want the best possible trade-off, right?

But how are you defining "complexity"?

If you really mean performance, then I don't see a problem... the JDK build times are essentially unchanged.

If you mean lines of code or whatever, then it doesn't seem inordinate. And the algorithm is more or less just an AST scan, like lots of other examples in the compiler code. So I don't see a problem there either.

Yes, there may be a much simpler way that's just as good, but if I could have thought of it I would have already. Perhaps you or someone else has better insight.

-------------

PR: https://git.openjdk.org/jdk/pull/11874