RFR: 8336492: Regression in lambda serialization

Fri Jul 26 13:31:45 UTC 2024

This PR consolidates the code for dealing with local captures in both `Lower` and `LambdaToMethod`. It does so by adding a new tree scanner, namely `CaptureScanner`, which is then subclassed by `Lower.FreeVarCollector` and also by the new `LambdaToMethod.LambdaCaptureScanner`.

The main idea behind the new visitor is that we can compute the set of (most) captured locals w/o relying on complex state from `Lower`, and make the logic general enough to work on *any* tree. This can be done by keeping track of all local variable declarations occurring in a given subtree `T`. Then, if `T` contains a reference to a local variable that has not been seen, we can assume that this variable is a captured variable. This simple logic works rather well, and doesn't rely on checking e.g. that the accessed variable has the same owner of the method that owns a local class (which then caused other artifacts such as `Lower::ownerToCopyFreeVarsFrom`, now removed).

The bigger rewrite is the one in `LambdaToMethod`. That class featured a custom visitor called `LambdaAnalyzerPreprocessor` (now removed) which maintains a stack of frames encountered during a visit, and tries to establish which references to local variables needs to be captured by the lambda, as well as whether `this` needs to be referenced by the lambda. Thanks to the new `CaptureScanner` visitor, all this logic can be achieved in a very small subclass of `CaptureScanner`, namely `LambdaCaptureScanner`.

This removes the need to keep track of frames, and other ancillary data structures. `LambdaTranslationContext` is now significantly simpler, and its most important field is a `Map<VarSymbol, VarSymbol>` which maps logical lambda symbols to translated ones (in the generated lambda method). We no longer need to maintain different maps for different kinds of captured symbols.

The code for patching identifiers in a lambda expression also became a lot simpler: we just check if we have pending lambda translation context and, if so, we ask if the identifier symbol is contained in the translation map. Otherwise, we leave the identifier as is.

Fixing the issue referenced by this PR is now very simple: inside `LambdaCaptureScanner`, we just need to make sure that any identifier referring to a captured local field generated by `Lower` is re-captured, and the rest just works. To make the code more robust I've added a new variable flag to denote synthetic captured fields generated by `Lower`.

#### Compatibility

This PR changes the set of fields that are added to local/anonymous classes. Consider the following case:

class Outer {
    void m(int x) {
        class Local extends Serializable {
            void print() {
                System.out.println(x);
            }
        }
    }
}

This used to generate the following bytecode for `Local`:

class Outer$1Local {
  final int val$x;
    descriptor: I
  final Outer this$0;
    descriptor: LOuter;
  Outer$1Local();
    descriptor: (LOuter;I)V

  void print();
    descriptor: ()V
}

With this patch, we only generate this:

class Outer$1Local implements java.io.Serializable {
  final int val$x;
    descriptor: I
  Outer$1Local();
    descriptor: (LOuter;I)V

  void print();
    descriptor: ()V
}

That is, the field for `this$0` is omitted (as that's not used). This affects the serialized form of this local class, as there's now one less field in the serialized form. While this change is desirable (it brings behavior in sync with lambda expressions, and avoiding capture of enclosing instance might be beneficial for GC reachability), we need to make sure this is ok.

Also, it is possible that the _order_ in which local capture fields appear in classes/constructor parameters of local/anon classes changes slightly, due the fact we're using a different visitor.

Another, minor issue in this rewrite is that now generated lambda methods can have different names from before. In the old code, a shared counter for _all_ lambdas in a class was used. As the new code shares the same logic for checking name clashing as the one used in serializable lambdas, we now increment the suffix after `lambda$foo$` if needed. So if two lambdas are defined in two methods `foo` and `bar`, we now get `lambda$foo$0` and `lambda$foo$bar$0` (before, one of them would have had the index set to `1`). While this affects some tests, which name we generate for synthetic lambda methods is a private contract, so I didn't think that preserving 100% compatibility was relevant here.

@cushon perhaps you could help assessing the compatibility impact of this change (as you seem to have a codebase which rely on serialization of local classes/lambdas) ?

-------------

Commit messages:
 - Fix Scoped test
 - Initial push

Changes: https://git.openjdk.org/jdk/pull/20349/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20349&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8336492
  Stats: 1462 lines in 10 files changed: 395 ins; 716 del; 351 mod
  Patch: https://git.openjdk.org/jdk/pull/20349.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20349/head:pull/20349

PR: https://git.openjdk.org/jdk/pull/20349