RFR: 8307084: C2: Vectorized drain loop is not executed for some small trip counts [v4]

Thu Jan 22 16:33:15 UTC 2026

On Wed, 21 Jan 2026 10:37:24 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> test/micro/org/openjdk/bench/vm/compiler/VectorThroughputForIterationCount.java line 136:
>> 
>>> 134:     // effects of this patch unobservable.
>>> 135:     @Param({"true", "false"})
>>> 136:     public static boolean ENABLE_LARGE_LOOP_WARMUP;
>> 
>> It would be nice to have some more comments here:
>> - for which benchmarks would the effect of "this patch" not be observable? Also: referring to "this patch" will require a future reader to trace things back in the "git blame" history, that's a bit unfortunate.
>> - Generally, it would now be nice to have a summary of which types of benchmarks show what kind of results, and why do we have all the variants.
>
> I'm asking for more comments because I fear the benchmark is becoming harder to use, with all the extra options and benchmark variants.

Really great suggestions.

I'll refine the comment along these lines:

// When enabled, run an additional warm-up phase using a large loop iteration
// count to encourage C2 to generate vectorized and unrolled loop bodies.
//
// Rationale:
// Some benchmarks in this suite use small, fixed trip-count loops. During
// early profiling, C2 may treat such loops as trivial, avoid vectorization,
// or optimize them away entirely. In those cases, changes that affect loop
// vectorization behavior, such as the improvement introduced by JDK-8307084,
// may not be observable in the generated code.
//
// As a result, this benchmark suite contains two main classes of
// microbenchmarks:
//   1) bench_xx_computeBound / bench_xx_memoryBound
//      These measure the performance of C2-generated code for the given
//      workload without relying on a special warm-up phase.
//   2) bench03xx_staticTripCount / bench03xx_dynamicTripCount
//      These benchmarks are sensitive to early profiling. Enabling a
//      large-loop warm-up forces the optimizer to observe the loop at scale,
//      making vectorized code generation more likely and allowing such
//      effects to be measured.
//
// Usage guidance:
// - Enable for microbenchmarks that rely on observing vectorization or
//   unrolling effects, especially when loop trip counts are small or
//   constant (e.g., bench03xx_staticTripCount and bench03xx_dynamicTripCount,
//   introduced by JDK-8307084).
// - Disable for general regression testing and for other microbenchmarks.
@Param({"true", "false"})
public static boolean ENABLE_LARGE_LOOP_WARMUP;
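
To make the intent of the flag concrete, here is a minimal standalone sketch of the warm-up idea. It is not the code in the PR: the class name, `kernel`, `warmup`, the array sizes, and the repetition count are all hypothetical, and it deliberately avoids JMH so it can be run on its own. It only illustrates the shape "optionally run the same kernel with a large trip count first, then measure with a small one"; whether C2 actually vectorizes the loop depends on the JVM and flags.

```java
import java.util.Arrays;

public class LargeLoopWarmupSketch {
    // Hypothetical stand-in for the @Param field in the benchmark.
    static boolean ENABLE_LARGE_LOOP_WARMUP = true;

    // Hypothetical kernel: element-wise add over the first `count` elements.
    static void kernel(int[] a, int[] b, int[] r, int count) {
        for (int i = 0; i < count; i++) {
            r[i] = a[i] + b[i];
        }
    }

    // Optional warm-up: call the kernel repeatedly with a large trip count so
    // the JIT gets to profile the loop at scale. The repetition count is an
    // arbitrary assumption, not a value taken from the benchmark.
    static void warmup(int[] a, int[] b, int[] r) {
        if (!ENABLE_LARGE_LOOP_WARMUP) {
            return;
        }
        for (int rep = 0; rep < 10_000; rep++) {
            kernel(a, b, r, a.length);
        }
    }

    public static void main(String[] args) {
        int[] a = new int[1024];
        int[] b = new int[1024];
        int[] r = new int[1024];
        Arrays.fill(a, 1);
        Arrays.fill(b, 2);

        warmup(a, b, r);

        // The measured phase would then use a small trip count.
        Arrays.fill(r, 0);
        kernel(a, b, r, 7);
        System.out.println(r[6] + " " + r[7]); // prints "3 0"
    }
}
```

In the real benchmark the warm-up would presumably live in a JMH setup method, with the measured methods using the small trip counts.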

WDYT?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2717360633