known translation issues related to JEP 482

Maurizio Cimadamore maurizio.cimadamore at oracle.com
Thu Jun 13 10:50:22 UTC 2024


On 12/06/2024 22:06, Archie Cobbs wrote:
> Hi Maurizio,
>
> Thanks for looking into these issues. I agree this is definitely one 
> of those "we need to rethink and refactor" opportunities. Please let 
> me know if/how I can help. So far I've done some complaining but not 
> much else :)
>
> I'm trying to wrap my head around what's needed and have tried to 
> write this out in detail mainly for my own understanding. Please tell 
> me if the below sounds correct to you.
>
> Observation: this issue doesn't really affect non-static member inner 
> classes: they can never be declared in an early construction context 
> of their directly enclosing class. So they always have an immediate 
> outer instance, and it's always available and usable (or explicitly 
> provided via qualified new). So all the complexity comes with local 
> and anonymous classes, especially when they are declared inside early 
> construction contexts with various levels of nesting. Fortunately, the 
> JEP treats local and anonymous classes basically the same way.
Correct. Btw, I think I like the term "Member inner classes" to mean 
"non-static". We typically used "nested class" to mean the static kind. 
In any case, these classes are *not* problematic. Local and anonymous 
(where the latter is just a special case of the former) are. And the 
first big issue implementation-wise is that javac is attempting to 
translate _all_ kinds of inner classes (whether member, or local) in the 
same way, which leads to issue.
>
> Some terminology (just placeholders for now, I'm mainly trying to nail 
> down the logic):
>
>  1. Every class declaration C has zero or more lexically enclosing
>     class declarations; let's call them E₁, E₂, E₃... ordered from the
>     inside out.
>  2. For ease of terminology, define the degenerate case E₀ = C
>  3. For each Eᵢ, there is possibly an *outer instance* Oᵢ *defined*
>     for C (see below)
>  4. For ease of terminology, define the degenerate case O₀ = C.this.
>  5. When is Oᵢ *defined* for C?
>      1. O₀ is always defined
>      2. Let k ≥ 0 be the smallest value such that Eₖ is static (enum,
>         record, declared in a static context, etc.)
>      3. For all i ≤ k, the outer instance Oᵢ is defined for C and it
>         has type Eₖ
>      4. For all i > k there is no outer instance Oᵢ defined for C
>  6. If Oᵢ is defined, it may also be the case that Oᵢ is *accessible*
>     in C (in non-static contexts):
>      1. Oᵢ is *accessible* in C if and only if C does not appear in an
>         early construction context of Eᵢ
>
Correct. The key observation here is that the set of lexically enclosing 
classes does not map 1:1 with the set of outer instances.
> From the above, you can see that the outer classes for which a 
> corresponding outer instance is defined includes C and is a contiguous 
> range up until you hit the first static class/context. However, the 
> set of outer classes for which the corresponding outer instance is 
> both defined /and accessible/ is an arbitrary subset of that 
> contiguous range. In effect we create an "inaccessibility hole" at any 
> class Eₖ for which C is inside an early construction context of Eₖ.
Yep, we sometimes refer to issues like these as "swiss cheese" problem :-)
>
> Observation: Suppose Oₖ of type Eₖ is defined for C. Then for any h > 
> k, Oₕ is defined (accessible) for Eₖ if and only if it is defined 
> (accessible) for C. In other words, if a class F encloses both C and 
> Eₖ, then C and Eₖ agree on whether F has a defined and/or accessible 
> outer instance. This is simply because C and Eₖ are lexically in the 
> "same place" with respect to F.
Yes.
>
> Define the *compiler outer instance* for C to be that Oₖ for which k > 
> 0, Oₖ is both defined and accessible, and where k is minimal (if such 
> a thing exists).
Yes, that is a possible strategy to tackle that. There is also another 
(see below).
>
> OK now given the above what does the compiler need to do?
>
> Part 0: The compiler needs to be able to calculate, given any class C, 
> whether C has a compiler outer instance (let's call it Oₖ) and its 
> type (let's call it Eₖ)
Yup
>
> Part 1: When compiling some class C for which a compiler outer 
> instance exists:
>
>  1. Calculate the type Eₖ
>  2. Add a synthetic constructor parameter Oₖ of type Eₖ to each
>     constructor of C
>  3. Add a synthetic field of type Eₖ in which to store Oₖ ("this$n")
>  4. Store Oₖ in that field in each constructor of C (very first thing)
>
Yes. Step (3) is done "on-demand", but essence is the same.
> Part 2: Compiling the expression "Foo.this" in the context of some 
> class C:
>
>  1. If C has no compiler outer instance, then error unless Foo = C.
>  2. Otherwise let Oₖ with type Eₖ be the compiler outer instance for C
>  3. For expressions like "Eₕ.this" where 0 < h < k: generate an error
>     ("no instance in scope")
>  4. For expressions like "Eₖ.this": evaluate to Oₖ as follows:
>      1. If expression occurs in an early construction context of C,
>         then read the synthetic constructor parameter of type Eₖ (we
>         can do this because we are necessarily in a C constructor)
>      2. Otherwise. read the synthetic field "this$0"
>  5. For expressions like "Eₕ.this" where h > k as follows:
>      1. Evaluate "Eₖ.this" per previous steps to get Oₖ
>      2. Recurse, i.e., evaluate "Eₕ.this" in the context of class Eₖ.
>         This is valid due to the earlier observation.
>
Yes - note the slight complication in step 4 due to the fact that 
accessing the field might not always be possible.
> Part 3: When compiling "new C()" (or "C::new") where C is some class 
> having a compiler outer instance:
>
>  1. Calculate the type Eₖ - the type of the synthetic parameter Oₖ we
>     need to prepend to the C() constructor invocation
>  2. Let's assume "new C()" is inside some method, constructor, or
>     initializer of some class T
>  3. Evaluate "Eₖ.this" in the context of class T (see Part 2)
>  4. Proceed with construction, prepending the synthetic Oₖ parameter
>
> Hopefully this is close to capturing how it "should" work...?

Yes. This is one way to do it. Another way to do it would be to say that 
local/anonymous class do not get a blessed "compiler outer instance", 
period. Instead, they just capture all the enclosing "this" they need in 
order to get the job done, without the expectation that one enclosing 
this will be reachable from the other.

This is based on the observation that there's really two ways to get to 
the same result:

* Lower has this mechanism, called "outerThisStack" which more or less 
tracks what you have called "compiler outer instance". E.g. at given 
point in the code, what enclosing instances are available to me?
* But there's also another mechanism - that for captured values (Lower 
calls this "proxies"). This is yet another set of values that are 
accessible to the local/anon class via a field (or a constructor 
parameter, if inside a constructor)

The approach you described tries to tweak "outerThisStack" so that the 
stack contains the right bits (and skips over inaccessible enclosing 
instances). The alternate approach is to just ditch "outerThisStack" for 
local/anon classes, and switch to a more capture-oriented translation. 
Note that lambda translation (LambdaToMethod) is more in the latter camp 
- e.g. if a lambda occurs in a pre-construction context, and needs 
enclosing instances O1, O2, O3, then it will accept _all_ of them as 
captured parameters.


There's also a second, more subtle, implementation issue, which I'm in 
the process of evaluating as we speak. It seems to me that the ordering 
of the compiler steps Lower and LambdaToMethod is backwards. E.g. 
LambdaToMethod goes first, then we translate inner classes with Lower. 
But this leads to issues: the lambda translation code doesn't see the 
full picture as local classes have not been translated yet. So it 
basically has to "guess" which variables will need to be captured for 
local classes inside the lambda to work. This is not great, and leads to 
bugs that are very difficult to solve, such as this:

https://bugs.openjdk.org/browse/JDK-8334037

I'm currently doing some experiments to see if we can move the compiler 
phases the "right way" (but this is a big change, and not something we 
can do for 23). My working theory is that if we (a) fix translation of 
local classes as described above (using either approaches) AND (b) move 
LambdaToMethod _after_ Lower, then most of the pesky translation issues 
with pre-construction context should disappear.

Maurizio


>
> -Archie
>
> On Wed, Jun 12, 2024 at 7:07 AM Maurizio Cimadamore 
> <maurizio.cimadamore at oracle.com> wrote:
>
>     Hi,
>     as we discussed through the details of the new JLS rules in JEP
>     482, we
>     have found several issues where javac is emitting incorrect code
>     (e.g.
>     attempting to capture enclosing instances that are still under
>     construction).
>
>     In order to allow us to look at these issues more holistically, I've
>     created a new JBS label, namely "javac-pre-capture". Here's a
>     query to
>     show them all:
>
>     https://bugs.openjdk.org/issues/?jql=labels%20%3D%20javac-pre-capture
>
>     As stated above, this label is meant for issue related to issues that
>     have to do broken translation strategy of enclosing instance
>     references.
>     Such issues can manifest them in different ways:
>
>     * compiler crashes when compiling correct code (e.g.
>     https://bugs.openjdk.org/browse/JDK-8334037)
>     * "late" compiler error  (e.g. in Lower) as the compiler can't
>     resolve
>     an enclosing instance (e.g.
>     https://bugs.openjdk.org/browse/JDK-8334121)
>     * compiler succeds when compiling a correct program, but the compiled
>     program doesn't verify (we don't have an instance of this, yet,
>     but it's
>     something bound to occur)
>
>     The scope of this label is somewhat (deliberately) narrow. As
>     such, the
>     label does NOT cover other issues pertaining to JEP 482, such as
>
>     * compiler attempting to translate a bad program
>     (https://bugs.openjdk.org/browse/JDK-8334043)
>     * compiler generating invalid code for reasons that have nothing
>     to do
>     with capture (https://bugs.openjdk.org/browse/JDK-8332106)
>
>     (if we think that a more general label would be useful to mark all
>     issues that have to do with pre-construction context, we can do
>     that too).
>
>     Cheers
>     Maurizio
>
>
>
> -- 
> Archie L. Cobbs
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/amber-dev/attachments/20240613/84c6e1dd/attachment-0001.htm>


More information about the amber-dev mailing list