JDK-8300691 - final variables in for loop headers should accept updates
John Rose
john.r.rose at oracle.com
Tue Oct 22 01:12:59 UTC 2024
On 21 Oct 2024, at 10:00, Archie Cobbs wrote:
> On Mon, Oct 21, 2024 at 11:36 AM Maurizio Cimadamore <maurizio.cimadamore at oracle.com> wrote:
>
>> What I'm trying to say is that if a developer is confused (by the i++)
>> on what "frozen" means for a loop variable, they won't find any clarity
>> in our explanation that "lexically scoping trumps everything else". E.g.
>> whether a developer reads the STEP part of a loop as part of the body
>> (after the end of the body) or not is, I believe, a very subjective
>> thing - and one which affects how any change we make in the area will
>> be perceived (IMHO).
>>
> Yes, if a developer is mentally cutting and pasting the STEP onto the
> tail of the BODY then you're right, that makes it appear as if the
> variable is not really "frozen".
>
> But the original basic for() proposal has the analogous problem - i.e.,
> if a developer is mentally cutting and pasting the STEP onto the tail of
> the BODY, then in the original proposal the variable no longer appears
> to be "effectively final in the body of the loop".
>
I think there might be something worth doing with some concept possibly
denotable as “effectively frozen”, so I read quickly through much of
this thread but failed to find the definition of “frozen” per se
(that is, explicitly, not effectively, frozen). Lots of references with
“you-know-what-I-mean” quotes, but I can’t tell what “frozen”
is supposed to mean here.
(I will thank anyone who points out the definition in this thread of
“frozen”. But I will plough forward with my own attempt…)
The discussions of breaking the foundations of the language, making the
same variable sometimes final and sometimes mutable, as control flows
through different parts of loop syntax, do not impress me. Languages
are usually grown by adding new constructs that desugar to known-good
old constructs, not by destroying protective mechanisms that happen to
get in the way.
As with final, there MIGHT be a concept of “frozen” that is worth
making “effectively” present, but first we need to take a moment to
define the non-effectively, EXPLICIT version, even if we never ship it,
which I think we won’t.
Also, naming is hard. This discussion of “frozen” is not the same
as “freezing” in other related areas of Java (JMM, “frozen
arrays” proposals). The problem at hand can only be solved (IMO) if
you incorporate some sort of shadowing of the affected variables, so
that the “view” of the affected loop variable is selectively
shadowed by a read-only duplicate of the loop variable, in the body of
the for-loop.
Let’s say that some variables (TBD), in some circumstances (TBD), can
be subject to a special form of shadowing (JLS 6.4.1) called “final
shadowing”. If it went into the JLS, it would probably be mentioned
in 6.4.1 along with all the other instances of shadowing.
PSEUDO-SPEC:
>> [In some circumstances] a local variable x may be final-shadowed on
>> some statement S in part of its scope (JLS 6.3). When this is done,
>> the effect is as if another declaration of the same name x is
>> introduced, as a final local variable of the same type as the
>> original declaration, which is initialized to the current value (as
>> of S) of the original variable x. (Such a declaration could not be
>> written explicitly, as it is illegal to explicitly shadow a
>> local by another local of the same name.) A final-shadowing
>> declaration may only be introduced implicitly. In addition, it is
>> only introduced for a statement S that performs no assignment to x.
>> The scope of the shadowing final variable x is the statement S, only.
And also I think we should nail down the use of this tactic:
>> In addition, final-shadowing is only done for a variable x declared by
>> an old-for loop, and for the body S of that same old-for loop. It is
>> done whenever the body S contains a lambda that captures x. It is done
>> in no other circumstances.
Thus, the implicit introduction of final-shadowing may only happen when
it has no effect on the behavior of the affected variable.
Final-shadowing is thus largely transparent to the user, having no
effect on occurrences of x in S. Occurrences of the name x before
and/or after S are not shadowed. Within S, all occurrences of the name
x are scoped to the final-shadowing declaration.
The effect of final-shadowing a variable x in a statement S may be
visualized as if S is rewritten to the block statement “{final var
x$1 = x; S$1;}” where x$1 is a new name not occurring in the program,
and S$1 is simply S with all occurrences of x replaced by x$1.
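For anyone who wants to try the rewrite today, here is a rough sketch that writes the “{final var x$1 = x; S$1;}” copy by hand in an ordinary old-for loop. The spelling x$1 is legal because $ is a valid Java identifier character, and `final var` needs Java 10 or later; the class and method names are made up for illustration, and this is only what the implicit declaration would amount to, not a compiler implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

class FinalShadowSketch {
    // Hand-written version of "{final var x$1 = x; S$1;}" for a loop body S.
    static List<Integer> capturedValues() {
        List<Supplier<Integer>> suppliers = new ArrayList<>();
        for (int x = 0; x < 3; x++) {
            // Stand-in for the implicit final-shadowing declaration.
            final var x$1 = x;
            // S$1: the original body with x replaced by x$1.
            suppliers.add(() -> x$1);
            // Writing () -> x here instead is rejected today,
            // because x is assigned by the loop's update clause.
        }
        List<Integer> values = new ArrayList<>();
        for (Supplier<Integer> s : suppliers) {
            values.add(s.get());
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(capturedValues()); // prints [0, 1, 2]
    }
}
```

Each lambda sees the value x had on its own iteration, and occurrences of x outside the body (the header's condition and update) are untouched.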
Having written all of this down, I realize that there is only a small
motivation to make any of it explicit, as a user-specifiable syntax.
(But it could be: something like “final x;”.) If the only point of
final-shadowing is to enhance capture, we can have the discussion about
where to apply the technique, but we don’t need to bikeshed a syntax,
unless the user model requires explicitness in some way.
The only place we urgently need such shadowing is to fix the problem
with old-for loops and lambda-capture, so we can narrow the
applicability to exactly variables x declared by a for-loop, and applied
(when possible and useful) to the body of the for-loop. We could keep
in our back pocket an option to invent a general and explicit syntax in
the future, if for some reason we wanted to let users make explicit
declarations of final shadowing at random points in their code.
The only user-visible effect of final-shadowing is to enable
lambda-capture (for x, in the statement S where it is final-shadowed).
What’s so special about old-for loops? If final-shadowing is useful
for old-for loops, why not do it everywhere a capture is attempted?
Maybe in the future, but there are a number of issues to iron out, once
you generalize beyond old-for loops.
The special thing about loops (all kinds) is their bodies are executed N
times, where N can be greater than one, for any single given execution
of the containing statement or block. Any variable declared outside of
the loop is declared once but might be used N times during the N
executions of the loop body. This raises the special question of
whether something like final-shadowing should apply to uses in the loop
body. After all, you can mark a variable final, but such a variable
cannot be given different values during those N executions unless it is
declared within the loop body itself. And the loop header is just
outside the loop body.
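A small sketch of that distinction, using a while loop so that the iteration variable is plainly declared outside the body: i is declared once and mutated on every pass, while snapshot is declared inside the body, so each execution gets its own final variable. The names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntSupplier;

class BodyLocalFinal {
    static int[] sampled() {
        List<IntSupplier> suppliers = new ArrayList<>();
        int i = 0;                      // declared once, outside the loop
        while (i < 3) {
            final int snapshot = i;     // declared anew on each execution of the body
            suppliers.add(() -> snapshot);
            // suppliers.add(() -> i);  // rejected: i is not effectively final
            i++;
        }
        int[] out = new int[suppliers.size()];
        for (int n = 0; n < out.length; n++) {
            out[n] = suppliers.get(n).getAsInt();
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(sampled())); // prints [0, 1, 2]
    }
}
```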
The extra-special thing about old-for loops is they can declare a
variable x (or several) in their header. Such a variable has, arguably,
an exactly bipartite scope. Typically (not always) the for-header has
all the logic which steps the variable through a range of values, while
the for-body operates (typically, not always) only on one value of that
variable.
I think the consensus (almost unanimous) is that something like the
final-shadowing trick on the loop body would be a fine move, though not
earth-shaking.
It would make old-for and new-for loops more like each other, which is good.
It would leave while loops out of the party, but… it’s a different
keyword, and also you can refactor a while to be an old-for.
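For reference, this is the new-for behavior that old-for would be catching up with: the enhanced-for variable is a fresh variable on each iteration and is never assigned, so capturing it is already legal, while the corresponding old-for capture is not. Class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntSupplier;

class NewForVsOldFor {
    static List<Integer> newForCapture(int[] values) {
        List<IntSupplier> suppliers = new ArrayList<>();
        // Legal today: v is effectively final on each iteration.
        for (int v : values) {
            suppliers.add(() -> v);
        }
        // The old-for counterpart does not compile today:
        // for (int i = 0; i < values.length; i++) {
        //     suppliers.add(() -> values[i]);  // error: i is not effectively final
        // }
        List<Integer> out = new ArrayList<>();
        for (IntSupplier s : suppliers) {
            out.add(s.getAsInt());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(newForCapture(new int[] {3, 1, 4})); // prints [3, 1, 4]
    }
}
```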
What about while loops? What if I declare my iteration variables before
my while loop? Why should I have to declare them in the old-for loop to
get the extra help from final-shadowing? Why not apply the trick to all
loop bodies of all kinds? That takes us to deeper waters. Doing this
would allow a lambda to capture a loop-varying value. But some loops
mutate the iteration variable in the loop body, so what about the limitation
that the value should not be modified by the loop body — should we
relax that?
Along that slippery slope we quickly fall into the question of opening
the floodgates, and doing final-shadowing on all lambdas everywhere.
The rules for a stable spot (not just halfway sliding down the slope)
are hard to foresee. Do we allow any statement anywhere to be
(potentially) final-shadowed, if the language wishes it, in order to do
lambda-capture? How close do we allow side effects to get to a
final-shadowed place? The simplest endpoint is just to go with
so-called value-sampling lambdas (C++ has them), which don’t even
pretend to capture the variable; they just internally do something like
final-shadowing, capturing the present value of the up-level variable,
giving it the same name, and allowing the lambda to work with that
snapshot, even while the up-level variable continues to change.
Why didn’t we do this? There are a number of reasons, but the core
design goal for capture (of up-level variables in both lambdas and inner
classes) was to eliminate the opportunity to observe a divergence in the
behavior of the original up-level variable, as opposed to the name which
represents it in the lambda.
This is obviously a good goal, since code is easier to read when the
same name means the same thing everywhere, as opposed to “sometimes
the live variable and sometimes a snapshot I already took”, which is
what lambda snapshots give you.
Note that fixing old-for loops cannot add much confusion of this sort,
since the variable in question already has two clearly distinct zones of
use, even though it has but one scope. Random variables outside of the
loop, even if used in the loop, do not enjoy this clarity of usage.
We certainly didn’t want to “stretch” real JVM locals to somehow
be simultaneously present in lambdas and their defining blocks. (In the
‘90s we prototyped sharing a reference to a mutable box but that was a
disaster in several ways.) Thus, the translation strategy needs to copy
a value, for an efficient JVM implementation and race-free behavior.
Still, we wanted to hide that copy, rather than make it be an artifact
of the user experience. Hence, Java 1.1 says “only capture finals”
and Java 8 says “and also effective finals”.
(By the way, capturing a field does not copy. It copies the enclosing
“this” pointer, and so both the lambda and the original block
“see” the same variable, with all of its future mutations. The
common thread here is that, for both fields and final locals, a capture
captures 100% of the future history of the variable, equally available
to both parties.)
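A quick sketch of the field case: reading a field inside a lambda captures `this`, not a copy of the field, so a mutation made after the lambda is created is visible through it. The class and field names are illustrative.

```java
import java.util.function.IntSupplier;

class FieldCaptureSketch {
    private int count = 0;  // a field, not a local variable

    int[] demo() {
        // The lambda reads this.count, so it shares the field with this method.
        IntSupplier reader = () -> count;
        int before = reader.getAsInt();
        count++;  // fields have no effectively-final requirement
        int after = reader.getAsInt();
        return new int[] {before, after};
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(new FieldCaptureSketch().demo())); // prints [0, 1]
    }
}
```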
The present suggestion of final-shadowing is really a tool for
moderating that policy, by nailing down a mutable variable in specific
places to behave “final enough” to be captured. And after going
over it, I think those places are exactly old-for loops, and nowhere
else, at least not yet. Any other place beyond an old-for loop must
either have an explicit marking (syntax TBD) or else be as
paradox-free as old-for loops.
Here’s a paradox to avoid:
```
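int x = 1;
Supplier<Integer> last = null;
for (; x < 10; x++) {
    last = () -> x;  // hypothetical: rejected today, x is not effectively final
}
assert last.get() == x; // FAILS under snapshot capture: last.get() is 9, x is 10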
```
The full history of x conflicts with the capture of x, and specifically
the final post-loop value of x is surprising to the user of the lambda.
The old-for loop variables do not have this opportunity for conflict.
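The divergence can be reproduced today by writing the snapshot by hand, which is roughly what a value-sampling rule would do implicitly for the loop above. This is only an emulation with illustrative names: the lambda remembers 9, while x itself has moved on to 10.

```java
import java.util.function.IntSupplier;

class SnapshotParadox {
    static int[] demo() {
        int x = 1;
        IntSupplier last = null;
        for (; x < 10; x++) {
            final int sampled = x;  // explicit snapshot of x on this iteration
            last = () -> sampled;
        }
        // {what the lambda reports, what x actually holds}
        return new int[] {last.getAsInt(), x};
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(demo())); // prints [9, 10]
    }
}
```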
Here’s a more comprehensive example to examine, where the old-for loop
makes “one of each kind” of variable:
```
for (int i = 0, j = LIMIT, k = 0; i < j; i++) {
    foo(() -> bar(i));     // i: LEGAL AFTER FINAL-SHADOW
    foo(() -> bar(j));     // j: EFF-FINAL
    //foo(() -> bar(k));   // k: ALWAYS ILLEGAL
    k++;
}
```
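A runnable approximation of that example, with the final-shadow of i written out by hand. foo, bar and LIMIT are replaced by a list of suppliers and a limit parameter, all hypothetical stand-ins. i qualifies because the body never assigns it; j is already effectively final; k is assigned in the body, so it is never shadowed and never capturable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntSupplier;

class ThreeVariableLoop {
    static List<Integer> demo(int limit) {
        List<IntSupplier> captured = new ArrayList<>();
        for (int i = 0, j = limit, k = 0; i < j; i++) {
            final int i$1 = i;           // hand-written final-shadow of i
            captured.add(() -> i$1);
            captured.add(() -> j);       // legal today: j is never assigned
            // captured.add(() -> k);    // error: k is assigned by k++ below
            k++;
        }
        List<Integer> out = new ArrayList<>();
        for (IntSupplier s : captured) {
            out.add(s.getAsInt());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(demo(2)); // prints [0, 2, 1, 2]
    }
}
```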
That might work, and might be enough to remove a serious rough edge from
old-for loops. Beyond that, I don’t see any immediate wins.