RFR: 8307084: C2: Vectorized drain loop is not executed for some small trip counts

Fei Gao fgao at openjdk.org
Fri Jan 9 14:20:00 UTC 2026


On Mon, 10 Nov 2025 16:07:35 GMT, Fei Gao <fgao at openjdk.org> wrote:

>> @fg1417 Are you still working on this?
>
> Hi @eme64, many thanks for your review. It’s really comprehensive and insightful. I’ve given a thumbs-up to all the comments that have been resolved in this commit.
> 
>> I have one concern: we have now changed the branches. With very few iterations, where we only go through the pre and post loops, there is now a long sequence of branches. It would be interesting to see the performance difference between master and patch.
> 
> Regarding this concern, I re-ran the microbenchmarks (now merged into the existing `VectorThroughputForIterationCount.java`), named `bench03*_drain_memoryBound`, and collected data across different platforms, including `128-bit` and `256-bit` `AArch64` machines as well as a `512-bit` `x86` machine.
> 
> To summarize, I observe a minor performance regression for small-iteration loops on the `128-bit` and `256-bit` `AArch64` platforms. For larger-iteration loops, there is either a performance improvement or no noticeable change. The performance data on the `512-bit x86` machine shows a similar trend, though the regression is more significant.
> 
> **The test range of `ITERATION_COUNT` is `0–300`. For larger `ITERATION_COUNT` values, there is either a performance improvement or no noticeable change, so those results are omitted. The following data only shows cases with regressions.**
> 
> 
> (FIXED_OFFSET)  (RANDOMIZE_OFFSETS)  (REPETITIONS)  (seed)  Mode  Cnt
>              0                 TRUE           1024      42  avgt    3
> 
> `Diff = (patch - master) / master`
> 
> On the `128-bit AArch64` platform:
> 
> Benchmark    (ITERATION_COUNT)    Units    Diff
> bench031B_drain_memoryBound    1    ns/op    15.15%
> bench031B_drain_memoryBound    2    ns/op    10.89%
> bench031B_drain_memoryBound    3    ns/op    9.27%
> bench031B_drain_memoryBound    4    ns/op    7.39%
> bench031B_drain_memoryBound    5    ns/op    5.86%
> bench031B_drain_memoryBound    6    ns/op    5.31%
> bench031B_drain_memoryBound    7    ns/op    4.39%
> bench031B_drain_memoryBound    8    ns/op    4.27%
> bench031B_drain_memoryBound    9    ns/op    3.60%
> bench031B_drain_memoryBound    10    ns/op    3.11%
> bench031B_drain_memoryBound    11    ns/op    2.97%
> bench031B_drain_memoryBound    12    ns/op    3.19%
> bench031B_drain_memoryBound    13    ns/op    2.90%
> bench031B_drain_memoryBound    14    ns/op    2.68%
> bench031B_drain_memoryBound    15    ns/op    2.37%
> bench031B_drain_memoryBound    16    ns/op    2.44%
> bench031B_drain_memoryBound    17    ns/op    2.11%
> bench031B_drain_memoryBound    18    ns...

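For readers skimming the quoted table: the `Diff` column is a relative change, and because the benchmarks report average time in `ns/op`, a positive value means the patch is slower than master. A minimal sketch of that arithmetic follows. The class name, method name, and timings are hypothetical and are not taken from the benchmark results.

```java
import java.util.Locale;

public final class RelativeDiff {
    // Relative change of patch versus master, in percent:
    // Diff = (patch - master) / master, scaled by 100.
    public static double diffPercent(double masterNsPerOp, double patchNsPerOp) {
        if (masterNsPerOp == 0.0) {
            throw new IllegalArgumentException("master time must be non-zero");
        }
        return (patchNsPerOp - masterNsPerOp) / masterNsPerOp * 100.0;
    }

    public static void main(String[] args) {
        // Hypothetical timings, for illustration only.
        double master = 200.0; // ns/op on master
        double patch = 230.0;  // ns/op with the patch applied
        // Positive result: the patch takes longer per operation (a regression).
        System.out.println(String.format(Locale.ROOT, "Diff = %.2f%%", diffPercent(master, patch)));
        // prints "Diff = 15.00%"
    }
}
```

With average-time mode, a negative `Diff` would indicate that the patch is faster than master.
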
> @fg1417 I hope you had a good start to the new year.

Hi @eme64, Happy New Year!

> I'd love to make this PR a bit of a priority in the coming weeks. Would you mind syncing with master and fixing merge conflicts?

Yes, absolutely. I’ve rebased it internally, and the new commit is currently under testing. Once the testing is complete, I’ll push it.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/22629#issuecomment-3729090839


More information about the hotspot-compiler-dev mailing list