RFR: 8307084: C2: Vectorized drain loop is not executed for some small trip counts [v3]

Emanuel Peter epeter at openjdk.org
Mon Dec 8 15:34:03 UTC 2025


On Wed, 12 Nov 2025 12:39:52 GMT, Fei Gao <fgao at openjdk.org> wrote:

>> Wait, you are doing some kind of special warmup above. Why? Do you maybe NOT want the methods to inline? Any other reason for the warmup?
>
> If I understand correctly, when `ITERATION_COUNT` is set to a fixed value, all loop optimizations will know the loop iteration count from profiling. Without a special warm-up phase, the main loop is unlikely to be auto-vectorized for these small iteration counts, because [policy_unroll()](https://github.com/openjdk/jdk/blob/400a83da893f5fc285a175b63a266de21e93683c/src/hotspot/share/opto/loopTransform.cpp#L960) in C2 always attempts to generate code that is optimal for the current trip count based on profiling information. It may decide not to auto-vectorize, or even remove the loop entirely and keep only some scalar nodes. As a result, we can’t observe the potential effects of this patch.
> 
> The special warm-up phase would instead trigger auto-vectorization and full unrolling. I suppose this patch takes effect in scenarios where certain Java loops have already been compiled with auto-vectorization and unrolling, and are later used to process data with smaller array sizes. What do you think?

Sorry, I dropped the ball on this one. There is a lot going on with JDK 26 and other larger PRs.

Ah, I see. You are indeed doing some special warmup here. That should be better documented. I also wonder if you want to make this a parameter, so we can see the performance with and without it?
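
To make the "parameter" idea concrete, here is a rough, standalone sketch in plain Java. It is not the benchmark from the patch: the class name `WarmupSketch`, the kernel `addArrays`, and all constants are hypothetical and chosen only for illustration. A real benchmark would more likely expose the flag as a JMH `@Param`. The idea is that the same kernel is either first run with a large trip count, so its profile sees long loops before the small-count measurement, or measured directly. What C2 actually does in each case is not guaranteed by this code and would need to be checked separately.

```java
class WarmupSketch {
    // All constants below are assumptions for illustration, not values from the patch.
    static final int LARGE_COUNT = 2048;        // "large" trip count used only for warm-up
    static final int WARMUP_REPS = 20_000;      // assumed to be enough calls to reach C2
    static final int MEASURE_REPS = 1_000_000;  // calls in the timed region

    // Written after each measurement so the timed loop has an observable result.
    static volatile long sink;

    // Hypothetical kernel: a simple element-wise add, a typical auto-vectorization candidate.
    static void addArrays(int[] a, int[] b, int[] out, int count) {
        for (int i = 0; i < count; i++) {
            out[i] = a[i] + b[i];
        }
    }

    // Times MEASURE_REPS calls with a small trip count (1 <= iterationCount <= LARGE_COUNT).
    // If largeWarmup is true, the kernel is first called with LARGE_COUNT many times.
    static long measure(boolean largeWarmup, int iterationCount) {
        int[] a = new int[LARGE_COUNT];
        int[] b = new int[LARGE_COUNT];
        int[] out = new int[LARGE_COUNT];
        for (int i = 0; i < LARGE_COUNT; i++) {
            a[i] = i;
            b[i] = 2 * i;
        }

        if (largeWarmup) {
            for (int r = 0; r < WARMUP_REPS; r++) {
                addArrays(a, b, out, LARGE_COUNT);
            }
        }

        long localSink = 0;
        long start = System.nanoTime();
        for (int r = 0; r < MEASURE_REPS; r++) {
            addArrays(a, b, out, iterationCount);
            localSink += out[iterationCount - 1]; // consume a result of every call
        }
        long elapsed = System.nanoTime() - start;
        sink = localSink;
        return elapsed;
    }

    public static void main(String[] args) {
        // Run each configuration in a fresh JVM; otherwise the first run's
        // profile and compiled code carry over into the second.
        boolean largeWarmup = args.length > 0 && Boolean.parseBoolean(args[0]);
        int iterationCount = args.length > 1 ? Integer.parseInt(args[1]) : 7;
        long nanos = measure(largeWarmup, iterationCount);
        System.out.println("largeWarmup=" + largeWarmup
                + " iterationCount=" + iterationCount
                + " elapsedNanos=" + nanos);
    }
}
```

Running it once with `true` and once with `false` for the same small `iterationCount`, each in its own JVM, gives the with/without comparison. Note that without the warm-up the timed region also includes interpretation and compilation time, so a proper harness such as JMH would still be needed for trustworthy numbers.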

At some point I need to check out your patch and see what effect it has on the benchmarks I'm presenting here:
https://github.com/openjdk/jdk/pull/27315

Do you think the effect of the patch would really not be measurable for small sizes without the special warmup? If not, we would have to find other methods to make a difference for small iteration counts.

> It may decide not to auto-vectorize, or even remove the loop entirely and keep only some scalar nodes.

It could be worth creating some IR tests to see what exactly happens here.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2599072156


More information about the hotspot-compiler-dev mailing list