JDK-8300691 - final variables in for loop headers should accept updates

Tue Oct 22 01:12:59 UTC 2024

On 21 Oct 2024, at 10:00, Archie Cobbs wrote:

> On Mon, Oct 21, 2024 at 11:36 AM Maurizio Cimadamore <
> maurizio.cimadamore at oracle.com> wrote:
>
>> What I'm trying to say is that if a developer is confused (by the i++) on
>> what "frozen" means for a loop variable, they won't find any clarity in our
>> explanation that "lexically scoping trumps everything else". E.g. whether a
>> developer reads the STEP part of a loop as part of the body (after the end
>> of the body) or not is, I believe, a very subjective thing - and one which
>> affects how any change we make in the area will be perceived (IMHO).
>>
> Yes, if a developer is mentally cutting and pasting the STEP onto the tail
> of the BODY then you're right, that makes it appear as if the variable is
> not really "frozen".
>
> But the original basic for() proposal has the analogous problem - i.e., if
> a developer is mentally cutting and pasting the STEP onto the tail of the
> BODY, then in the original proposal the variable no longer appears to be
> "effectively final in the body of the loop".
>

I think there might be something worth doing with some concept possibly
denotable as “effectively frozen”, so I read quickly through much of
this thread, but failed to find the definition of “frozen” per se
(that is, explicitly, not effectively, frozen).  There are lots of
references with “you-know-what-I-mean” quotes, but I can’t tell what
“frozen” is supposed to mean here.

(I will thank anyone who points out the definition in this thread of 
“frozen”.  But I will plough forward with my own attempt…)

The discussions of breaking the foundations of the language, making the 
same variable sometimes final and sometimes mutable, as control flows 
through different parts of loop syntax, do not impress me.  Languages 
are usually grown by adding new constructs that desugar to known-good 
old constructs, not by destroying protective mechanisms that happen to 
get in the way.

As with final, there MIGHT be a concept of “frozen” that is worth
making “effectively” present, but first we need to take a moment to
define the non-effective, EXPLICIT version, even if we never ship it,
which I think we won’t.

Also, naming is hard.  This discussion of “frozen” is not the same 
as “freezing” in other related areas of Java (JMM, “frozen 
arrays” proposals).  The problem at hand can only be solved (IMO) if 
you incorporate some sort of shadowing of the affected variables, so 
that the “view” of the affected loop variable is selectively 
shadowed by a read-only duplicate of the loop variable, in the body of 
the for-loop.

Let’s say that some variables (TBD), in some circumstances (TBD), can 
be subject to a special form of shadowing (JLS 6.4.1) called “final 
shadowing”.  If it went into the JLS, it would probably be mentioned 
in 6.4.1 along with all the other instances of shadowing.

PSEUDO-SPEC:

>> [In some circumstances] a local variable x may be final-shadowed on
>> some statement S in part of its scope (JLS 6.3).  When this is done,
>> the effect is as if another declaration of the same name x is
>> introduced, as a final local variable of the same type as the
>> original declaration, which is initialized to the current value (as
>> of S) of the original variable x.  (Such a declaration could not be
>> written explicitly, as it is illegal to explicitly shadow a local by
>> another local of the same name.)  A final-shadowing declaration may
>> only be introduced implicitly.  In addition, it is only introduced
>> for a statement S that performs no assignment to x.  The scope of the
>> shadowing final variable x is the statement S, only.

And also I think we should nail down the use of this tactic:

>> In addition, final-shadowing is only done for a variable x declared
>> by an old-for loop, and for the body S of that same old-for loop.  It
>> is done whenever the body S contains a lambda that captures x.  It is
>> done in no other circumstances.

Thus, the implicit introduction of final-shadowing may only happen when 
it has no effect on the behavior of the affected variable.  
Final-shadowing is thus largely transparent to the user, having no 
effect on occurrences of x in S.  Occurrences of the name x before 
and/or after S are not shadowed.  Within S, all occurrences of the name 
x are scoped to the final-shadowing declaration.

The effect of final-shadowing a variable x in a statement S may be
visualized as if S were rewritten to the block statement “{final var
x$1 = x; S$1;}” where x$1 is a new name not occurring in the program,
and S$1 is simply S with all occurrences of x replaced by x$1.
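To make the rewrite concrete, here is a minimal sketch that applies it by hand to a loop that does not compile today, `for (int x = 0; x < 3; x++) { suppliers.add(() -> x); }`. The class and method names are hypothetical, and the sketch assumes Java 10 or later (for `final var`); it only illustrates the visualization above, not any actual compiler translation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

public class FinalShadowDesugar {
    // Hand-written form of the rewrite for the (currently illegal) loop
    //     for (int x = 0; x < 3; x++) { suppliers.add(() -> x); }
    // where the body S becomes {final var x$1 = x; S$1;}.
    static List<Supplier<Integer>> desugared() {
        List<Supplier<Integer>> suppliers = new ArrayList<>();
        for (int x = 0; x < 3; x++) {
            final var x$1 = x;          // fresh final copy, one per iteration
            suppliers.add(() -> x$1);   // S$1: S with x renamed to x$1
        }
        return suppliers;
    }

    public static void main(String[] args) {
        for (Supplier<Integer> s : desugared()) {
            System.out.println(s.get()); // prints 0, then 1, then 2
        }
    }
}
```

Each lambda sees the value x had when its iteration of the body began, and the header's `x++` never touches the copy.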

Having written all of this down, I realize that there is only a small 
motivation to make any of it explicit, as a user-specifiable syntax.  
(It could be, though: something like “final x;”.)  If the only point of 
final-shadowing is to enhance capture, we can have the discussion about 
where to apply the technique, but we don’t need to bikeshed a syntax, 
unless the user model requires explicitness in some way.

The only place we urgently need such shadowing is to fix the problem 
with old-for loops and lambda-capture, so we can narrow the 
applicability to exactly variables x declared by a for-loop, and applied 
(when possible and useful) to the body of the for-loop.  We could keep 
in our back pocket an option to invent a general and explicit syntax in 
the future, if for some reason we wanted to let users make explicit 
declarations of final shadowing at random points in their code.

The only user-visible effect of final-shadowing is to enable
lambda-capture (for x, in the statement S where it is final-shadowed).

What’s so special about old-for loops?  If final-shadowing is useful 
for old-for loops, why not do it everywhere a capture is attempted?  
Maybe in the future, but there are a number of issues to iron out, once 
you generalize beyond old-for loops.

The special thing about loops (all kinds) is their bodies are executed N 
times, where N can be greater than one, for any single given execution 
of the containing statement or block.  Any variable declared outside of 
the loop is declared once but might be used N times during the N 
executions of the loop body.  This raises a special question of 
something like final-shadowing, for uses in the loop body.  After all, 
you can mark a variable final, but such a variable cannot be given
different values during those N executions, unless it is declared
within the loop body itself.  And the loop header is just outside the
loop body.

The extra-special thing about old-for loops is they can declare a 
variable x (or several) in their header.  Such a variable has, arguably, 
an exactly bipartite scope.  Typically (not always) the for-header has 
all the logic which steps the variable through a range of values, while 
the for-body operates (typically, not always) only on one value of that 
variable.

I think the consensus (almost unanimous) is that something like the 
final-shadowing trick on the loop body would be a fine move, though not 
earth-shaking.
It would make old-for and new-for loops more like each other, which is good.
It would leave while loops out of the party, but… it’s a different
keyword, and also you can refactor a while loop into an old-for loop.
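For comparison, the new-for loop already permits this kind of capture: its loop variable is declared afresh for each iteration and, if the body never assigns it, it is effectively final. The sketch below (hypothetical class and method names, assuming Java 9+ for `List.of`) shows that case next to the hand-written copy an old-for loop needs today.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

public class NewForCapture {
    static List<Supplier<String>> capture(List<String> names) {
        List<Supplier<String>> out = new ArrayList<>();
        // Legal today: the enhanced-for variable is a fresh variable on each
        // iteration and is never assigned in the body, so it is effectively final.
        for (String name : names) {
            out.add(() -> "hello " + name);
        }
        // The old-for equivalent needs a hand-written copy, because i++ assigns i;
        // out.add(() -> names.get(i)) would be rejected by the compiler.
        for (int i = 0; i < names.size(); i++) {
            int index = i;
            out.add(() -> names.get(index));
        }
        return out;
    }

    public static void main(String[] args) {
        for (Supplier<String> s : capture(List.of("a", "b"))) {
            System.out.println(s.get()); // hello a, hello b, a, b
        }
    }
}
```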

What about while loops?  What if I declare my iteration variables before 
my while loop?  Why should I have to declare it in the old-for loop to 
get the extra help from final-shadowing?  Why not apply the trick to all 
loop bodies of all kinds?  That takes us to deeper waters.  Doing this 
would allow a lambda to capture a loop-varying value.  But some loops 
mutate the iteration variable in the loop, so what about the limitation 
that the value should not be modified by the loop body — should we 
relax that?

Along that slippery slope we quickly fall into the question of opening 
the floodgates, and doing final-shadowing on all lambdas everywhere.  
The rules for a stable spot (not just halfway sliding down the slope) 
are hard to foresee.  Do we allow any statement anywhere to be 
(potentially) final-shadowed, if the language wishes it, in order to do 
lambda-capture?  How close do we allow side effects to get to a
final-shadowed place?  The simplest endpoint is just to go with
so-called value-sampling lambdas (C++ has them), which don’t even
pretend to capture the variable; they just internally do something like
final-shadowing, capturing the present value of the up-level variable,
giving it the same name, and allowing the lambda to work with that
snapshot, even while the up-level variable continues to change.

Why didn’t we do this?  There are a number of reasons, but the core
design goal for capture (of up-level variables in both lambdas and inner
classes) was to eliminate the opportunity to observe a divergence between
the behavior of the original up-level variable and the name which
represents it in the lambda.

This is obviously a good goal, since code is easier to read when the 
same name means the same thing everywhere, as opposed to “sometimes 
the live variable and sometimes a snapshot I already took”, which is 
what lambda snapshots give you.
Note that fixing old-for loops cannot add much confusion of this sort, 
since the variable in question already has two clearly distinct zones of 
use, even though it has but one scope.  Random variables outside of the 
loop, even if used in the loop, do not enjoy this clarity of usage.

We certainly didn’t want to “stretch” real JVM locals to somehow 
be simultaneously present in lambdas and their defining blocks.  (In the 
’90s we prototyped sharing a reference to a mutable box, but that was a 
disaster in several ways.)  Thus, the translation strategy needs to copy 
a value, for an efficient JVM implementation and race-free behavior.  
Still, we wanted to hide that copy, rather than make it be an artifact 
of the user experience.  Hence, Java 1.1 says “only capture finals” 
and Java 8 says “and also effective finals”.
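For contrast with copying, the sketch below shows the shared-box idea as users commonly write it by hand today, with a one-element array. It is only loosely in the spirit of the boxed sharing mentioned above, not a reconstruction of that prototype; the class and method names are hypothetical.

```java
import java.util.function.IntSupplier;

public class MutableBoxCapture {
    // Returns {value seen before the mutation, value seen after it}.
    static int[] demo() {
        // A one-element array acts as a shared mutable box. The reference `box`
        // is effectively final, so the lambda may capture it; the contents are
        // shared with the defining block rather than copied.
        int[] box = {1};
        IntSupplier read = () -> box[0];
        int before = read.getAsInt();
        box[0] = 42;                  // mutation in the defining block...
        int after = read.getAsInt();  // ...is visible through the lambda
        return new int[] {before, after};
    }

    public static void main(String[] args) {
        int[] r = demo();
        System.out.println(r[0] + " " + r[1]); // prints "1 42"
    }
}
```

Nothing synchronizes access to `box[0]`, so if the lambda ran on another thread this would be a data race, which is one of the hazards that copying a value avoids.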

(By the way, capturing a field does not copy.  It copies the enclosing 
“this” pointer, and so both the lambda and the original block 
“see” the same variable, with all of its future mutations.  The 
common thread here is that, for both fields and final locals, a capture 
captures 100% of the future history of the variable, equally available 
to both parties.)
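A minimal sketch of the field case (hypothetical class and member names): the lambda captures `this` rather than a copy of the field, so it observes mutations made after it was created.

```java
import java.util.function.IntSupplier;

public class FieldCapture {
    int count = 0; // a field, not a local variable

    IntSupplier reader() {
        // The lambda captures `this`, not a copy of count,
        // so every call reads the field's current value.
        return () -> count;
    }

    public static void main(String[] args) {
        FieldCapture fc = new FieldCapture();
        IntSupplier r = fc.reader();
        fc.count = 5;                     // mutation after the lambda was created
        System.out.println(r.getAsInt()); // prints 5
    }
}
```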

The present suggestion of final-shadowing is really a tool for 
moderating that policy, by nailing down a mutable variable in specific 
places to behave “final enough” to be captured.  And after going 
over it, I think those places are exactly old-for loops, and nowhere 
else, at least not yet.  Any other place beyond an old-for loop must 
either have an explicit marking (syntax TBD) or else be as
paradox-free as old-for loops.

Here’s a paradox to avoid:

```
int x = 1;
Supplier<Integer> last = null;
for (; x < 10; x++) {
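    last = () -> x;      // hypothetical: rejected today, since x is not effectively final
}
assert last.get() == x;  // FAILS if x were final-shadowed: last.get() is 9, but x is 10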
```

The full history of x conflicts with the capture of x: specifically,
the final post-loop value of x (10) differs from the value the lambda
returns (9), which surprises the user of the lambda.
The old-for loop variables do not have this opportunity for conflict.
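The paradox can be reproduced in legal Java by writing the final-shadowing copy by hand, which is what applying the technique to a variable declared outside the for-header would amount to. This is a hypothetical emulation (the class and method names are invented), not proposed behavior.

```java
import java.util.function.Supplier;

public class SnapshotParadox {
    // Hand-written emulation of final-shadowing applied to the loop above:
    // each iteration copies x into a fresh final local that the lambda captures.
    // Returns {last.get(), x} after the loop.
    static int[] run() {
        int x = 1;
        Supplier<Integer> last = null;
        for (; x < 10; x++) {
            final int snapshot = x;   // the value the lambda actually sees
            last = () -> snapshot;
        }
        return new int[] {last.get(), x};
    }

    public static void main(String[] args) {
        int[] r = run();
        System.out.println(r[0] + " vs " + r[1]); // prints "9 vs 10"
    }
}
```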

Here’s a more comprehensive example to examine, where the old-for loop 
makes “one of each kind” of variable:

```
for (int i = 0, j = LIMIT, k = 0; i < j; i++) {
    foo(() -> bar(i));   // i: LEGAL AFTER FINAL-SHADOW
    foo(() -> bar(j));   // j: EFF-FINAL
    //foo(() -> bar(k)); // k: ALWAYS ILLEGAL
    k++;
}
```

That might work, and might be enough to remove a serious rough edge from 
old-for loops.  Beyond that, I don’t see any immediate wins.