RFR: 8341782: Allow lambda capture of basic for() loop variables as with enhanced for() [v2]

Fri Oct 11 00:46:11 UTC 2024

On Thu, 10 Oct 2024 23:42:57 GMT, Archie Cobbs <acobbs at openjdk.org> wrote:

>>> Having special rules for loop variables is sending us down the slippery slope of finding the next stable point between (1) and (2) - assuming one exists. The risk here is that once we start relaxing things in some areas (e.g. loops) you can quickly run into issues where other deficiencies in capture become even more apparent.
>> 
>> I agree with all of that as well. There is one valid counter-argument though (reflected in the JEP draft), which is that we have already taken this current step with enhanced `for()` loops, so we're not really establishing much of a new "frontier" with this feature by adding the same thing to basic `for()`. In other words, enhanced `for()` is an intermediate design point that's already been taken, for better or worse, so we might as well make the most of it. Or put more plaintively, "Please don't blame this feature for all the world's problems" :)
>> 
>> Regarding the bigger picture question, I'm skeptical that there exists any other design point between 1 and 2 that would be preferable to just jumping straight to 2, i.e., capture anything that's DA anywhere, but of course anything's possible.
>
> Thanks for the bug fix suggestion. Should be fixed in 00181f7f1b4.

> It's almost as if the "well-behaved" region "captures" the variables that it uses and makes them temporarily effectively final... which then begs the question, why not just cut out the middle man? That would then put us in a situation where any variable could be captured anywhere (which some have argued for), i.e., everything is effectively final at the point of capture.

I am a long-time requester of this feature, in JDK-8300691.  I also considered whether other loops have a generalization, but gave it up after some thought.  I realized that the distinctive features of old-for loops are their ability to bind variables that are not visible outside of the for loop, and the distinction between the region of the for-loop header and the for-loop body.  Due to the traditional way these loops are used, it is very common (though not 100% universal) that such variables are mutated in the header but not in the body.  So there are two syntactic regions, one where variation is OK and one where variation is absent.  In the latter region, there is no benefit to telling the user "do not capture this because it might be different later on" (the rationale for non-capture of non-finals), because the only "later on" is another execution of the same loop body, in a different trip of the loop.

All this is to suggest why I think this request (and mine) should be _specifically and exactly only for old-for loops_, because they key off of specific and unique syntactic features (variable declaration + two syntax regions).

Therefore, I am pleading that we not confuse ourselves by attempting a generalization here.  Discarding the specifics of old-for loops in quest of some greater generalization is (IMO) a doomed effort, and even if it ultimately pans out, it should not stop us from dealing with the specifics of old-for loops.  If we are smart enough to come up with such a generalization, we will be smart enough to fit it into the current proposal, which is limited to the hardwired specifics of old-for loops.
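For reference, here is a minimal sketch of the status quo the quoted discussion mentions: an enhanced `for()` variable can already be captured, while the equivalent basic `for()` capture is rejected because the header mutates the variable. It assumes Java 8 or later; the class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntSupplier;

class EnhancedForCapture {
    // Enhanced for(): v is a fresh, effectively final variable on each iteration.
    static List<Integer> captureEach(int[] values) {
        List<IntSupplier> suppliers = new ArrayList<>();
        for (int v : values) {
            suppliers.add(() -> v);              // accepted since Java 8
        }
        // Basic for(): the same capture is rejected today, because i++ in the
        // header means i is not effectively final:
        // for (int i = 0; i < values.length; i++) suppliers.add(() -> i);

        // Run the saved lambdas after the loop has finished.
        List<Integer> out = new ArrayList<>();
        for (IntSupplier s : suppliers) {
            out.add(s.getAsInt());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(captureEach(new int[] {10, 20, 30}));  // [10, 20, 30]
    }
}
```

Each lambda sees the value from its own trip of the loop, which is exactly the behavior the proposal would extend to basic `for()` loop variables that are mutated only in the header.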

I am the originator of the rule that forbids capture of non-finals (in Java 1.1).  I can tell you why it got there (if not how people think about it today).

The decision was made (after prototyping) _not to capture_ a full mutable state.  (…Such as a constant pointer to a one-element mutable array.)  There are numerous non-obvious reasons why capturing mutable state is a doomed decision.  It's doomed even though good languages have done so; as a Common Lisp user I saw the very sharp edges from the interaction between mutable variable capture and non-sequential executions that observe the mutable variables.  (Non-sequential includes saving a callback that reads the loop variable, and forking a thread in the loop.)
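To make the "one-element mutable array" parenthetical concrete, here is a minimal sketch of how Java code emulates mutable capture today, and of the saved-callback hazard it produces. The class and method names are illustrative assumptions, not part of any proposal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntSupplier;

class SharedCellCapture {
    static List<Integer> capturedCell(int n) {
        final int[] cell = {0};             // final reference to a mutable one-element array
        List<IntSupplier> callbacks = new ArrayList<>();
        for (cell[0] = 0; cell[0] < n; cell[0]++) {
            callbacks.add(() -> cell[0]);   // each callback reads the shared cell later
        }
        // Run the callbacks only after the loop is done (a non-sequential use).
        List<Integer> seen = new ArrayList<>();
        for (IntSupplier cb : callbacks) {
            seen.add(cb.getAsInt());        // every callback observes the final value, n
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(capturedCell(3));  // [3, 3, 3]
    }
}
```

All callbacks share one mutable cell, so none of them remembers the value it "saw" when it was created; this is the kind of sharp edge that the snapshot-only rule avoids.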

So, if we were not going to save access to the future states of the mutable variable, we had to take a snapshot.  That's nice, but then another problem appears:  How do you explain to the user that if he changes the value of the variable later on, a lambda made before won't see the change, while a lambda made after will see it?  Such a user will rightly think the language is keeping double copies of the variable, which is a bad way to implement a variable.  The way out of the dilemma, in 1.1, was to insist that only final variables can be captured (and they must be definitely assigned with their one and only value).  This way you can have multiple copies under the hood, and there's no state to be shared, and the user's code will see a consistent binding for that variable.

It was not clear we could do better than that, but Java 8 took a step forward with the concept of "effectively final" variables, which are variables that "might as well" have a final declaration; such variables can be captured by lambdas as well.

What we don't cover now is code like either of these two examples:

// Ex 1: multiple assignments before the variable stabilizes
int K = 1;
if (p())  K = 2;
//FIX:  effectively-final int K = p() ? 2 : 1;
return () -> K;  //BAD cannot be effectively final

Here, you might be able to fix the problem by refactoring K into a final variable that is defined by one expression, or by disjoint (nonoverlapping) assignments from a blank state.  (Blank finals were introduced by us in 1.1 as well, to open up more natural usages of final locals and final instance variables.)

final int K;  // blank
if (p())  K = 2;
else  K = 1;
return () -> K;  //OK
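Both fixes for Ex 1 compile in current Java. The following is a minimal runnable sketch; `p(boolean)` is a hypothetical stand-in for the `p()` predicate above, given a parameter only so that both branches can be exercised.

```java
import java.util.function.IntSupplier;

class BlankFinalCapture {
    // Hypothetical stand-in for p() in the example above.
    static boolean p(boolean flag) {
        return flag;
    }

    // Blank final: exactly one assignment on every path, so capture is allowed.
    static IntSupplier make(boolean flag) {
        final int K;
        if (p(flag)) K = 2;
        else K = 1;
        return () -> K;
    }

    // Single-expression initializer: K is effectively final without the keyword.
    static IntSupplier makeTernary(boolean flag) {
        int K = p(flag) ? 2 : 1;
        return () -> K;
    }

    public static void main(String[] args) {
        System.out.println(make(true).getAsInt() + " " + make(false).getAsInt());  // 2 1
    }
}
```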

Second example:

// Ex 2: re-assignment where the user expects two values from one var
int V = 1;
Globals.T = new Thread(() -> V);  //will not end well

//maybe: if (p()) V = 2;
V = 2;  // update V for next lambda
return () -> V;  //BAD cannot be effectively final

//FIX:  split V into V1 and V2

It is clear that V has two disjoint episodes, which could be renamed V1 and V2.  Could there be a predictable, understandable rule which would perform this split automatically (when a lambda needs it)?  Maybe, but it probably becomes difficult to do this automatically in all cases; some code will allow V to split cleanly, and other code won't.  It's not clear there is a viable user model here that makes the splitting predictable and intuitive.

In the end, I doubt that the above examples can be made fit for lambdas without manual intervention (magic "implicitly split" or "implicitly disjointly initialized" features look unlikely to me).  But it might be possible to give users an explicit way to "lock down" a variable which is not yet final, so that (hereafter) it becomes subsequently final.  If the "lock-down" is limited to an enclosing block you might be able to salvage both of the examples above.

The feature I'm talking about (and maybe some of you may remember me talking about this before) is a re-declaration of a previously declared variable as (now) being final.  Let's call it "subsequently-final":

// Ex 1: multiple assignments before the variable stabilizes
int K = 1;
if (p())  K = 2;
subsequently-final K;
// sugar for final var K$ = K; 
// and with remaining K occurrences in current scope rewritten to K$
return () -> K;  //OK, rewritten to () -> K$
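Since `subsequently-final` is only a sketch of a possible feature, here is the hand-written equivalent of its Ex 1 desugaring in today's Java (10 or later, for `final var`). The name `K$` is legal Java, though `$` is conventionally left to generated code; `p(boolean)` is again a hypothetical stand-in for `p()`.

```java
import java.util.function.IntSupplier;

class SubsequentlyFinalEx1 {
    // Hypothetical stand-in for p() in the example above.
    static boolean p(boolean flag) {
        return flag;
    }

    static IntSupplier make(boolean flag) {
        int K = 1;
        if (p(flag)) K = 2;        // K is mutable, so it cannot be captured directly
        final var K$ = K;          // what "subsequently-final K;" would expand to
        return () -> K$;           // later uses of K are rewritten to K$
    }

    public static void main(String[] args) {
        System.out.println(make(true).getAsInt());  // 2
    }
}
```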

// Ex 2: re-assignment where the user expects two values from one var
int V = 1;

{  // the fix requires this bracket (so it cannot be fully automatic)
  subsequently-final V;
  // sugar for final var V$1 = V; 
  Globals.T = new Thread(() -> V);  // () -> V$1 instead
  // at end of scope, V$1 goes away, but V remains
}

//maybe: if (p()) V = 2;
V = 2;  // update V for next lambda
subsequently-final V;
// sugar for final var V$2 = V; 

return () -> V;  //OK, () -> V$2

//FIXED:  split V into V$1 and V$2
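The Ex 2 desugaring can likewise be written by hand today. In this sketch an `IntSupplier` field named `first` stands in for `Globals.T`, so the captured value is observable without starting a thread; that substitution and all names are illustrative assumptions.

```java
import java.util.function.IntSupplier;

class SubsequentlyFinalEx2 {
    static IntSupplier first;                // hypothetical stand-in for Globals.T

    static IntSupplier make() {
        int V = 1;
        {
            final var V$1 = V;               // "subsequently-final V;" inside the block
            first = () -> V$1;               // captures the first episode of V
        }                                    // V$1 goes out of scope; V remains mutable
        V = 2;                               // update V for the next lambda
        final var V$2 = V;                   // second "subsequently-final V;"
        return () -> V$2;                    // captures the second episode
    }

    public static void main(String[] args) {
        IntSupplier second = make();
        System.out.println(first.getAsInt() + " " + second.getAsInt());  // 1 2
    }
}
```

V itself is assigned twice and is never captured, so the compiler accepts it; only the two snapshot variables are captured.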

Such a subsequently-final feature provides an accounting (a retroactive one, I trust) of how the current proposal works:

for (int N = 0, M = A.length; N < M; N++) {
   subsequently-final N;
   // capture N$ at will
}
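Read that way, the proposal behaves roughly like a per-trip copy at the top of the loop body, which is what users write by hand today. This is a minimal sketch under that assumption only; it is not the javac implementation in the PR, and the names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntSupplier;

class ForLoopDesugar {
    static List<Integer> capture(int[] A) {
        List<IntSupplier> suppliers = new ArrayList<>();
        for (int N = 0, M = A.length; N < M; N++) {
            final var N$ = N;                // "subsequently-final N;" at the top of the body
            suppliers.add(() -> A[N$]);      // capture N$ at will
        }
        // Evaluate the lambdas after the loop: each sees its own trip's N$.
        List<Integer> out = new ArrayList<>();
        for (IntSupplier s : suppliers) {
            out.add(s.getAsInt());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(capture(new int[] {5, 6, 7}));  // [5, 6, 7]
    }
}
```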

The proposal here might be thought of as "effectively-subsequently-final for variables declared by old-for loops".  In conclusion, it's a very specific patch to a commonly used syntax, which is reasonable by itself, and is unlikely to interfere with future changes to the language.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21415#discussion_r1796275438