RFR: 8307084: C2: Vectorized drain loop is not executed for some small trip counts [v4]

Fri Jan 23 10:23:57 UTC 2026

On Thu, 22 Jan 2026 15:18:33 GMT, Fei Gao <fgao at openjdk.org> wrote:

>> I'm asking for more comments because I fear the benchmark is becoming harder to use, with all the extra options and benchmark variants.
>
> Really great suggestions.
> 
> I'll refine the comments as follows:
> 
> // When enabled, run an additional warm-up phase using a large loop iteration
> // count to encourage C2 to generate vectorized and unrolled loop bodies.
> //
> // Rationale:
> // Some benchmarks in this suite use small, fixed trip-count loops. During
> // early profiling, C2 may treat such loops as trivial, avoid vectorization,
> // or optimize them away entirely. In those cases, changes that affect loop
> // vectorization behavior, such as the improvement introduced by JDK-8307084,
> // may not be observable in the generated code.
> //
> // As a result, this benchmark suite contains two main classes of
> // microbenchmarks:
> //   1) bench_xx_computeBound / bench_xx_memoryBound
> //      These measure the performance of C2-generated code for the given
> //      workload without relying on a special warm-up phase.
> //   2) bench03xx_staticTripCount / bench03xx_dynamicTripCount
> //      These benchmarks are sensitive to early profiling. Enabling a
> //      large-loop warm-up forces the optimizer to observe the loop at scale,
> //      making vectorized code generation more likely and allowing such
> //      effects to be measured.
> //
> // Usage guidance:
> // - Enable for microbenchmarks that rely on observing vectorization or
> //   unrolling effects, especially when loop trip counts are small or
> //   constant (e.g., bench03xx_staticTripCount and bench03xx_dynamicTripCount,
> //   introduced by JDK-8307084).
> // - Disable for general regression testing and for other microbenchmarks.
> @Param({"true", "false"})
> public static boolean ENABLE_LARGE_LOOP_WARMUP;
> 
> WDYT?

That sounds quite good, actually :)

But I do wonder: why should we not also have the "large loop warmup" for the other benchmarks? Are we sure that they would not also be affected? Or what exactly is the explanation that we cannot see the impact of this patch on those benchmarks?
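
To make the question concrete, here is a rough plain-Java sketch of what a "large loop warmup" amounts to. It is not the benchmark code from the PR: the class name, the constants, and the repetition count are hypothetical, and it deliberately avoids JMH so it stays self-contained. It only shows the shape of the idea (run the same kernel with a large trip count first, then with the small trip count of interest) and does not verify that C2 actually vectorizes anything.

```java
// Hypothetical sketch, not the benchmark from the PR.
final class LargeLoopWarmupSketch {
    static final int SMALL_TRIP_COUNT = 7;       // assumed "small" trip count
    static final int LARGE_TRIP_COUNT = 10_000;  // assumed "large" trip count
    static final int WARMUP_REPS = 5_000;        // assumed repetition count

    // The kernel under test: the caller chooses the trip count n.
    static void addArrays(int[] a, int[] b, int[] c, int n) {
        for (int i = 0; i < n; i++) {
            c[i] = a[i] + b[i];
        }
    }

    // Optional extra warm-up phase: call the kernel repeatedly with a large
    // trip count so the JIT profiles the loop as long-running before the
    // small-trip-count calls are made.
    static void warmUp(int[] a, int[] b, int[] c, boolean enableLargeLoopWarmup) {
        if (!enableLargeLoopWarmup) {
            return;
        }
        for (int rep = 0; rep < WARMUP_REPS; rep++) {
            addArrays(a, b, c, LARGE_TRIP_COUNT);
        }
    }

    public static void main(String[] args) {
        int[] a = new int[LARGE_TRIP_COUNT];
        int[] b = new int[LARGE_TRIP_COUNT];
        int[] c = new int[LARGE_TRIP_COUNT];
        for (int i = 0; i < LARGE_TRIP_COUNT; i++) {
            a[i] = i;
            b[i] = 2 * i;
        }

        // Pass "false" as the first argument to skip the extra warm-up.
        boolean enable = args.length == 0 || Boolean.parseBoolean(args[0]);
        warmUp(a, b, c, enable);

        // The call pattern the benchmark is actually interested in.
        addArrays(a, b, c, SMALL_TRIP_COUNT);
        System.out.println("c[SMALL_TRIP_COUNT - 1] = " + c[SMALL_TRIP_COUNT - 1]);
    }
}
```

Whether the other benchmark variants would profit from the same extra phase is exactly the open question; the sketch only illustrates the mechanism being discussed.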

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2720609553