Auto indexing improved for() loops

Johannes Spangenberg johannes.spangenberg at hotmail.de
Sat Dec 9 17:02:21 UTC 2023


> One advantage of the current design is that it makes the intent of the 
> developer clear

I am also not in favor of the initial proposal, but I share the general 
concern, and I see the pain point. Regarding the proposal:

> for(int index, String s : strings) {

I think this solution would be too inflexible. Extending the syntax of 
the language for just this one very specific scenario does not seem 
justified to me. I think the focus should rather be on re-introducing 
and simplifying pattern matching in enhanced for-loops. Let's consider 
my previous example of what was possible in Java 20:

    for (ListIndex(int index, String element) : enumerate(strings)) {

The compiler could be updated to infer the Pattern and support the 
following expression:

    for ((int index, String element) : enumerate(strings)) {

Or when using var:

    for ((var index, var element) : enumerate(strings)) {

By keeping the method call on the right-hand side, we greatly improve 
flexibility. Consider, for example, the following functions shipped with 
Python; they could all benefit from this syntax if they were ported to 
Java. (A sketch of a corresponding enumerate(...) helper follows the list.)

  * enumerate(iterable, start=0)
    <https://docs.python.org/3/library/functions.html#enumerate>
  * zip(*iterables, strict=False)
    <https://docs.python.org/3/library/functions.html#zip>
  * pairwise(iterable)
    <https://docs.python.org/3/library/itertools.html#itertools.pairwise>
  * groupby(iterable, key=None)
    <https://docs.python.org/3/library/itertools.html#itertools.groupby>
  * product(*iterables, repeat=1)
    <https://docs.python.org/3/library/itertools.html#itertools.product>
  * combinations(iterable, r)
    <https://docs.python.org/3/library/itertools.html#itertools.combinations>
  * permutations(iterable, r=None)
    <https://docs.python.org/3/library/itertools.html#itertools.permutations>
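
For reference, a minimal sketch of such an enumerate(...) helper could 
look as follows. ListIndex and enumerate are helper names of mine, not 
part of the JDK, and this is just an illustration rather than a polished 
implementation:

    import java.util.Iterator;
    import java.util.List;
    import java.util.NoSuchElementException;

    final class Enumerate {
        record ListIndex<T>(int index, T element) {}

        // Lazily pairs each element of the list with its position.
        static <T> Iterable<ListIndex<T>> enumerate(List<T> list) {
            return () -> new Iterator<>() {
                private int next = 0;

                @Override
                public boolean hasNext() {
                    return next < list.size();
                }

                @Override
                public ListIndex<T> next() {
                    if (!hasNext()) throw new NoSuchElementException();
                    int i = next++;
                    return new ListIndex<>(i, list.get(i));
                }
            };
        }
    }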

Besides, I think inferring the pattern is not only useful in loops, but 
also in switch expressions:

    return switch (pair(state, isWaiting)) {
       case (INITIALIZATION, false) -> "Initializing task";
       case (INITIALIZATION, true ) -> "Waiting for an external process before continuing with the initialization";
       case (IN_PROGRESS   , false) -> "Task in progress";
       case (IN_PROGRESS   , true ) -> "Waiting for an external process";
       case (FINISHED      , _    ) -> "Task finished";
       case (CANCELED      , _    ) -> "Task canceled";
    };

I have to admit that adding `Pair` after `case` might not be that big of 
a deal in this example, but in other cases the name of the type might be 
much longer, significantly increasing the noise.
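
For comparison, here is a rough sketch of how the same switch could be 
written with the record patterns available today (Java 21). Since nested 
constant patterns are not supported, the enum and boolean checks have to 
move into `when` guards, and the `Pair` name is repeated in every case. 
Pair, pair(...), and the State enum are assumed helper definitions for 
this sketch, not JDK types:

    enum State { INITIALIZATION, IN_PROGRESS, FINISHED, CANCELED }

    record Pair<A, B>(A first, B second) {}

    static <A, B> Pair<A, B> pair(A first, B second) {
        return new Pair<>(first, second);
    }

    static String describe(State state, boolean isWaiting) {
        return switch (pair(state, isWaiting)) {
            case Pair(State s, Boolean w) when s == State.INITIALIZATION && !w -> "Initializing task";
            case Pair(State s, Boolean w) when s == State.INITIALIZATION &&  w -> "Waiting for an external process before continuing with the initialization";
            case Pair(State s, Boolean w) when s == State.IN_PROGRESS    && !w -> "Task in progress";
            case Pair(State s, Boolean w) when s == State.IN_PROGRESS    &&  w -> "Waiting for an external process";
            case Pair(State s, Boolean w) when s == State.FINISHED             -> "Task finished";
            case Pair(State s, Boolean w) when s == State.CANCELED             -> "Task canceled";
            // Guarded cases do not count towards exhaustiveness, so a default is required.
            default -> throw new IllegalStateException();
        };
    }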

> I think i would prefer to have to have an indexed stream more than 
> indexed loop

Note that checked exceptions and streams do not work well together, at 
least not in the current state of Java, as sketched below. For the time 
being, I would therefore favor the enhanced for loop. (It might be 
possible to fix the interoperability of checked exceptions and streams 
with union types or varargs in type parameters, but neither is planned 
as far as I know.)
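
As a small illustration of that friction (the names here are made up for 
the example): a method that declares a checked exception cannot be 
passed to Stream.forEach directly, so the exception has to be wrapped 
and unwrapped by hand:

    import java.io.IOException;
    import java.io.UncheckedIOException;
    import java.util.List;

    static void write(String line) throws IOException { /* ... */ }

    static void writeAll(List<String> lines) throws IOException {
        try {
            lines.stream().forEach(line -> {
                try {
                    // Calling write(line) directly would not compile:
                    // Consumer.accept declares no checked exceptions.
                    write(line);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (UncheckedIOException e) {
            throw e.getCause();
        }
    }

With an enhanced for loop, write(element) can simply be called inside a 
method that declares throws IOException.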

> the good news is that it seems something we can do using the gatherer 
> API [2] and Valhalla (to avoid the cost of creating a pair (index, 
> element) for each element).

I was wondering whether the JIT would already optimize the overhead 
away, so I ran some benchmarks using JMH on the enumerate(...) method I 
introduced earlier. As you are the second person mentioning Valhalla out 
of performance concerns, I thought I would share my results.

    fori         (OpenJDK 17) -> enhanced_for (OpenJDK 17) ≈ + 7 %
    fori         (OpenJDK 21) -> enhanced_for (OpenJDK 21) ≈ -34 %
    enhanced_for (OpenJDK 17) -> enhanced_for (OpenJDK 21) ≈ -29 %
    fori         (OpenJDK 21) -> enhanced_for (OpenJDK 17) ≈ - 8 %

With OpenJDK 17, my high-level enumerate(...) method was actually 7 % 
faster than a low-level old-style for-loop. However, in later versions 
of OpenJDK, the high-level code got much slower.

You can find the benchmark implementation on GitHub 
<https://github.com/JojOatXGME/benchmarks-java/blob/d12c441e16e04a6e7971365ee9672056687ad89b/src/jmh/java/benchmark/EnhancedForHelper.java>. 
The benchmark was run within WSL2 with Ubuntu 20.04 on an i7-3770 
from 2012.
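
For readers who don't want to follow the link, the two compared methods 
have roughly the following shape. This is a simplified sketch, not the 
exact code from the repository, and the test data may differ:

    import java.util.List;
    import java.util.stream.IntStream;
    import org.openjdk.jmh.annotations.Benchmark;
    import org.openjdk.jmh.annotations.Scope;
    import org.openjdk.jmh.annotations.State;

    @State(Scope.Benchmark)
    public class EnhancedForSketch {
        // Illustrative test data; the linked benchmark may use different data.
        List<String> strings = IntStream.range(0, 1_000).mapToObj(i -> "item" + i).toList();

        @Benchmark
        public int fori() {
            int sum = 0;
            for (int i = 0; i < strings.size(); i++) {
                sum += i + strings.get(i).length();
            }
            return sum;
        }

        @Benchmark
        public int enhanced_for() {
            int sum = 0;
            // enumerate(...) is the helper sketched earlier in this mail.
            for (var entry : Enumerate.enumerate(strings)) {
                sum += entry.index() + entry.element().length();
            }
            return sum;
        }
    }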

    # VM version: JDK 17.0.7, OpenJDK 64-Bit Server VM, 17.0.7+7-nixos

    Benchmark                        Mode  Cnt       Score      Error  Units
    EnhancedForHelper.enhanced_for  thrpt   10  588852.311 ± 3783.862  ops/s
    EnhancedForHelper.fori          thrpt   10  551406.193 ± 1172.687  ops/s

    # VM version: JDK 21, OpenJDK 64-Bit Server VM, 21+35-nixos

    Benchmark                        Mode  Cnt       Score      Error  Units
    EnhancedForHelper.enhanced_for  thrpt   10  419723.971 ± 8903.577  ops/s
    EnhancedForHelper.fori          thrpt   10  640767.173 ± 2829.187  ops/s

    # VM version: JDK 20, OpenJDK 64-Bit Server VM, 20+36-nixos

    Benchmark                                              Mode  Cnt       Score       Error  Units
    EnhancedForHelper.enhanced_for                        thrpt   10  430022.265 ±  3050.285  ops/s
    EnhancedForHelper.enhanced_for_with_pattern_matching  thrpt   10  325179.547 ±  5206.194  ops/s
    EnhancedForHelper.fori                                thrpt   10  631755.837 ± 20495.694  ops/s

I also ran the benchmark with Azul Zing for Java 21, which uses LLVM for 
its JIT optimizations. It was about 51 % faster than the fastest run I 
have seen with OpenJDK. However, the warm-up time was noticeably longer. 
There was no big difference between the two loops.

    # VM version: JDK 21.0.1, Zing 64-Bit Tiered VM, 21.0.1-zing_23.10.0.0-b3-product-linux-X86_64
    # *** WARNING: JMH support for this VM is experimental. Be extra careful with the produced data.

    Benchmark                        Mode  Cnt       Score       Error  Units
    EnhancedForHelper.enhanced_for  thrpt   10  978782.093 ±  4838.520  ops/s
    EnhancedForHelper.fori          thrpt   10  965482.460 ± 17837.251  ops/s

I have also seen some results with GraalVM for Java 21, but I don't have 
the exact numbers on hand. In general, Native Image was very slow on 
Windows, but competitive with OpenJDK on Linux. The GraalVM JDK (without 
Native Image) was about 40 % faster than OpenJDK 21, and there was no 
measurable difference between fori and enhanced_for on Linux.

Disclaimer: This is just a micro-benchmark. We don't know how any of 
this translates to real-world applications. I still find it interesting 
how different the optimizations are. I am also a bit concerned that 
OpenJDK 21 got noticeably slower with the high-level code compared to 
OpenJDK 17. I am eager to find out whether we see a noticeable 
difference in our end-to-end benchmarks when we move to OpenJDK 21 at my 
workplace.