<!DOCTYPE html><html><head>

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

  </head>

  <body>

    <p>

      <blockquote type="cite">One advantage of the current design is

        that it makes the intent of the developer clear</blockquote>

    </p>

    <p>I am also not in favor of the initial proposal, but I share the

      general concern. I see the pain point. Regarding the initial

      proposal:<br>

    </p>

    <pre><blockquote type="cite">for(int index, String s : strings) {</blockquote></pre>

    <p>I think this solution would be too inflexible. Extending the

      syntax of the language only for this one very specific scenario

      seems not justified to me. I think there should rather be a focus

      on re-introducing and simplifying pattern matching in enhanced <font face="monospace">for</font>-loops. Let's consider my previous

      example of what was possible in Java 20 (as a preview feature):</p>

    <blockquote>

      <pre>for (ListIndex(int index, String element) : enumerate(strings)) {

</pre>

    </blockquote>

    <p>The compiler could be updated to infer the pattern type and support

      the following expression:</p>

    <blockquote>

      <pre>for ((int index, String element) : enumerate(strings)) {

</pre>

    </blockquote>

    <p>Or when using <font face="monospace">var</font>:<br>

    </p>

    <blockquote>

      <pre>for ((var index, var element) : enumerate(strings)) {

</pre>

    </blockquote>
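    <p>For reference, a helper like this can already be written today.
      The following is only a minimal sketch: the names <font face="monospace">EnumerateSketch</font>,
      <font face="monospace">ListIndex</font> and <font face="monospace">enumerate</font>
      are chosen for illustration and it is not necessarily how the
      helper discussed in this thread is implemented. Without patterns
      in the loop header, the components have to be read through the
      record accessors.</p>

```java
import java.util.AbstractList;
import java.util.List;

final class EnumerateSketch {
    // Hypothetical carrier type for an (index, element) pair.
    record ListIndex<E>(int index, E element) {}

    private EnumerateSketch() {}

    // Returns a lazy, read-only view that pairs each element with its position.
    static <E> List<ListIndex<E>> enumerate(List<E> list) {
        return new AbstractList<>() {
            @Override
            public ListIndex<E> get(int i) {
                return new ListIndex<>(i, list.get(i));
            }

            @Override
            public int size() {
                return list.size();
            }
        };
    }

    public static void main(String[] args) {
        List<String> strings = List.of("a", "b", "c");
        // No pattern in the loop header: use the accessors instead.
        for (ListIndex<String> entry : enumerate(strings)) {
            System.out.println(entry.index() + ": " + entry.element());
        }
    }
}
```

    <p>The view creates one small record per iteration step, which is
      exactly the allocation the pattern-based loop would also have to
      deal with.</p>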

    <p>By keeping the method call on the right, we greatly improve the

      flexibility. For example, consider the following functions

      shipped with Python. They could all benefit from this syntax if

      they were ported to Java.</p>

    <ul>

      <li><a href="https://docs.python.org/3/library/functions.html#enumerate"><font face="monospace">enumerate(iterable, start=0)</font></a></li>

      <li><a href="https://docs.python.org/3/library/functions.html#zip"><font face="monospace">zip(*iterables, strict=False)</font></a></li>

      <li><a href="https://docs.python.org/3/library/itertools.html#itertools.pairwise"><font face="monospace">pairwise(iterable)</font></a></li>

      <li><a href="https://docs.python.org/3/library/itertools.html#itertools.groupby"><font face="monospace">groupby(iterable, key=None)</font></a></li>

      <li><a href="https://docs.python.org/3/library/itertools.html#itertools.product"><font face="monospace">product(*iterables, repeat=1)</font></a><br>

      </li>

      <li><a href="https://docs.python.org/3/library/itertools.html#itertools.combinations"><font face="monospace">combinations(iterable, r)</font></a></li>

      <li><a href="https://docs.python.org/3/library/itertools.html#itertools.permutations"><font face="monospace">permutations(iterable, r=None)</font></a><br>

      </li>

    </ul>
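    <p>To illustrate what such ports might look like, here is a rough
      sketch of <font face="monospace">zip</font> and <font face="monospace">pairwise</font>
      for lists. The class <font face="monospace">ZipSketch</font> and
      the record <font face="monospace">Pair</font> are hypothetical and
      not part of the JDK, and the eager list-based approach is just the
      simplest option, not a recommendation.</p>

```java
import java.util.ArrayList;
import java.util.List;

final class ZipSketch {
    // Hypothetical tuple type; the JDK does not ship one.
    record Pair<A, B>(A first, B second) {}

    private ZipSketch() {}

    // Eagerly pairs up elements and stops at the shorter list,
    // similar to Python's zip with strict=False.
    static <A, B> List<Pair<A, B>> zip(List<A> left, List<B> right) {
        int size = Math.min(left.size(), right.size());
        List<Pair<A, B>> result = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            result.add(new Pair<>(left.get(i), right.get(i)));
        }
        return result;
    }

    // Pairs each element with its successor, similar to itertools.pairwise.
    static <T> List<Pair<T, T>> pairwise(List<T> list) {
        List<T> tail = list.subList(Math.min(1, list.size()), list.size());
        return zip(list, tail);
    }

    public static void main(String[] args) {
        for (Pair<String, Integer> p : zip(List.of("a", "b"), List.of(1, 2, 3))) {
            System.out.println(p.first() + " -> " + p.second());
        }
        for (Pair<Integer, Integer> p : pairwise(List.of(1, 2, 3))) {
            System.out.println(p.first() + ", " + p.second());
        }
    }
}
```

    <p>With inferred patterns, the accessor calls in both loops could
      move into the loop header.</p>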

    <p>Besides, I think inferring the pattern is not only useful in

      loops, but also in <font face="monospace">switch</font>

      expressions:</p>

    <blockquote>

      <pre>return switch (pair(state, isWaiting)) {

  case (INITIALIZATION, false) -> "Initializing task";

  case (INITIALIZATION, true ) -> "Waiting for an external process before continuing with the initialization";

  case (IN_PROGRESS   , false) -> "Task in progress";

  case (IN_PROGRESS   , true ) -> "Waiting for an external process";

  case (FINISHED      , _    ) -> "Task finished";

  case (CANCELED      , _    ) -> "Task canceled";

};

</pre>

    </blockquote>

    I have to admit that adding <font face="monospace">Pair</font>

    after <font face="monospace">case</font> would not be a big deal

    in this example, but in other situations the name of the

    type might be much longer, significantly increasing the noise.<br>

    <p>

      <blockquote type="cite">I think i would prefer to have to have an

        indexed stream more than indexed loop</blockquote>

    </p>

    <p>Note that checked exceptions and streams do not work well

      together, at least not in the current state of Java. For the time

      being, I would therefore favor the enhanced <font face="monospace">for</font> loop. (It might be possible to fix

      the interoperability of checked exceptions and streams with union

      types or varargs in type parameters, but neither is planned as far

      as I know.)</p>
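    <p>A small sketch of the friction I mean. The method <font face="monospace">load</font>
      is a made-up example of an operation that declares a checked
      exception. In the loop the exception simply propagates, while the
      lambda passed to <font face="monospace">map</font> has to catch
      and wrap it.</p>

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

final class CheckedExceptionSketch {
    private CheckedExceptionSketch() {}

    // Hypothetical operation that declares a checked exception.
    static String load(String name) throws IOException {
        if (name.isEmpty()) {
            throw new IOException("empty name");
        }
        return "content of " + name;
    }

    // Enhanced for loop: the checked exception propagates to the caller.
    static List<String> loadAllWithLoop(List<String> names) throws IOException {
        List<String> result = new ArrayList<>();
        for (String name : names) {
            result.add(load(name));
        }
        return result;
    }

    // Stream: Function.apply declares no checked exceptions, so the lambda
    // has to catch the IOException and rethrow it as an unchecked one.
    static List<String> loadAllWithStream(List<String> names) {
        return names.stream()
                .map(name -> {
                    try {
                        return load(name);
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                })
                .collect(Collectors.toList());
    }
}
```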

    <p>

      <blockquote type="cite">the good news is that it seems something

        we can do using the gatherer API [2] and Valhalla (to avoid the

        cost of creating a a pair (index, element) for each element).</blockquote>

    </p>

    <p>I was wondering if the JIT would already optimize the overhead

      away. I ran some benchmarks using JMH on the <font face="monospace">enumerate(...)</font> method I introduced

      earlier. As you are the second person mentioning Valhalla out of

      performance concerns, I thought I would share my results.<br>

    </p>

    <blockquote>

      <pre>fori         (OpenJDK 17) -> enhanced_for (OpenJDK 17) ≈ + 7 %

fori         (OpenJDK 21) -> enhanced_for (OpenJDK 21) ≈ -34 %

enhanced_for (OpenJDK 17) -> enhanced_for (OpenJDK 21) ≈ -29 %

fori         (OpenJDK 21) -> enhanced_for (OpenJDK 17) ≈ - 8 %

</pre>

    </blockquote>

    <p>With OpenJDK 17, my high-level <font face="monospace">enumerate(...)</font>

      method was actually 7 % faster than a low-level old-style <font face="monospace">for</font>-loop. However, in later versions of

      OpenJDK, the high-level code got much slower.<br>

    </p>

    <p>You can find the <a href="https://github.com/JojOatXGME/benchmarks-java/blob/d12c441e16e04a6e7971365ee9672056687ad89b/src/jmh/java/benchmark/EnhancedForHelper.java" moz-do-not-send="true">benchmark implementation on GitHub</a>.

      The benchmark ran in WSL2 with Ubuntu 20.04 on an

      i7-3770 from 2012.</p>

    <blockquote>

      <pre># VM version: JDK 17.0.7, OpenJDK 64-Bit Server VM, 17.0.7+7-nixos

Benchmark                        Mode  Cnt       Score      Error  Units

EnhancedForHelper.enhanced_for  thrpt   10  588852.311 ± 3783.862  ops/s

EnhancedForHelper.fori          thrpt   10  551406.193 ± 1172.687  ops/s

# VM version: JDK 21, OpenJDK 64-Bit Server VM, 21+35-nixos

Benchmark                        Mode  Cnt       Score      Error  Units

EnhancedForHelper.enhanced_for  thrpt   10  419723.971 ± 8903.577  ops/s

EnhancedForHelper.fori          thrpt   10  640767.173 ± 2829.187  ops/s

# VM version: JDK 20, OpenJDK 64-Bit Server VM, 20+36-nixos

Benchmark                                              Mode  Cnt       Score       Error  Units

EnhancedForHelper.enhanced_for                        thrpt   10  430022.265 ±  3050.285  ops/s

EnhancedForHelper.enhanced_for_with_pattern_matching  thrpt   10  325179.547 ±  5206.194  ops/s

EnhancedForHelper.fori                                thrpt   10  631755.837 ± 20495.694  ops/s

</pre>

    </blockquote>
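    <p>For anyone who wants to check the percentages, they are plain
      relative changes of the throughput scores. A minimal sketch using
      the OpenJDK 17 and OpenJDK 21 scores from the output above (the
      class and method names are made up for this example):</p>

```java
final class RelativeChange {
    private RelativeChange() {}

    // Percentage change in throughput when going from baseline to candidate.
    static double percentChange(double baseline, double candidate) {
        return (candidate - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) {
        // Scores in ops/s, copied from the JMH output above.
        double fori17 = 551406.193;
        double enhanced17 = 588852.311;
        double fori21 = 640767.173;
        double enhanced21 = 419723.971;

        System.out.printf("fori 17 -> enhanced_for 17: %+.1f %%%n",
                percentChange(fori17, enhanced17));
        System.out.printf("fori 21 -> enhanced_for 21: %+.1f %%%n",
                percentChange(fori21, enhanced21));
        System.out.printf("enhanced_for 17 -> enhanced_for 21: %+.1f %%%n",
                percentChange(enhanced17, enhanced21));
        System.out.printf("fori 21 -> enhanced_for 17: %+.1f %%%n",
                percentChange(fori21, enhanced17));
    }
}
```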

    I also ran the benchmark with Azul Zing for Java 21, which uses LLVM

    for the JIT optimizations. It was about 51 % faster than the fastest

    run I have seen with OpenJDK. However, the warm-up time was

    noticeably longer. There was no big difference between the two loops.

    <blockquote>

      <pre># VM version: JDK 21.0.1, Zing 64-Bit Tiered VM, 21.0.1-zing_23.10.0.0-b3-product-linux-X86_64

# *** WARNING: JMH support for this VM is experimental. Be extra careful with the produced data.

Benchmark                        Mode  Cnt       Score       Error  Units

EnhancedForHelper.enhanced_for  thrpt   10  978782.093 ±  4838.520  ops/s

EnhancedForHelper.fori          thrpt   10  965482.460 ± 17837.251  ops/s

</pre>

    </blockquote>

    <p>I have also seen some results with GraalVM for Java 21, but I

      don't have the exact numbers on hand. In general, Native Image was

      very slow on Windows, but competitive with OpenJDK on Linux. The

      GraalVM JDK (no Native Image) was about 40 % faster than OpenJDK 21

      and there was no measurable difference between <font face="monospace">fori</font> and <font face="monospace">enhanced_for</font>

      on Linux.<br>

    </p>

    <p>Disclaimer: This is just a micro-benchmark. We don't know how all

      of this translates to real-world applications. I still find it

      interesting how different the optimizations are. I am also a bit

      concerned that OpenJDK 21 got noticeably slower with the

      high-level code compared to OpenJDK 17. I am eager to find out if

      we see a noticeable difference in our end-to-end benchmarks when

      we move forward to OpenJDK 21 at my workplace.<br>

    </p>

  </body>

</html>