RFR: 8307084: C2: Vectorized drain loop is not executed for some small trip counts
Fei Gao
fgao at openjdk.org
Fri Jan 9 14:20:00 UTC 2026
On Mon, 10 Nov 2025 16:07:35 GMT, Fei Gao <fgao at openjdk.org> wrote:
>> @fg1417 Are you still working on this?
>
> Hi @eme64, many thanks for your review. It’s really comprehensive and insightful. I’ve given a thumbs-up to all the comments that have been resolved in this commit.
>
>> I have one concern: we have now changed the branches. If we have very few iterations, so that we only go through the pre- and post-loop, there is now a long sequence of branches. It would be interesting to see the performance difference between master and the patch.
>
> Regarding this concern, I re-ran the microbenchmarks (now merged into the existing `VectorThroughputForIterationCount.java` and named `bench03*_drain_memoryBound`) and collected data across different platforms, including `128-bit` and `256-bit` `AArch64` machines as well as a `512-bit` `x86` machine.
>
> To summarize, I observe a minor performance regression for small-iteration loops on the `128-bit` and `256-bit` `AArch64` platforms. For larger-iteration loops, there is either a performance improvement or no noticeable change. The performance data on the `512-bit x86` machine shows a similar trend, though the regression is more significant.
>
> **The test range of `ITERATION_COUNT` is `0–300`. For larger `ITERATION_COUNT` values, there is either a performance improvement or no noticeable change, so those results are omitted. The following data only shows cases with regressions.**
>
>
> (FIXED_OFFSET)  (RANDOMIZE_OFFSETS)  (REPETITIONS)  (seed)  Mode  Cnt
> 0               TRUE                 1024           42      avgt  3
>
> `Diff = (patch - master) / master`
>
> On the `128-bit AArch64` platform:
>
> Benchmark                    (ITERATION_COUNT)  Units  Diff
> bench031B_drain_memoryBound  1                  ns/op  15.15%
> bench031B_drain_memoryBound  2                  ns/op  10.89%
> bench031B_drain_memoryBound  3                  ns/op  9.27%
> bench031B_drain_memoryBound  4                  ns/op  7.39%
> bench031B_drain_memoryBound  5                  ns/op  5.86%
> bench031B_drain_memoryBound  6                  ns/op  5.31%
> bench031B_drain_memoryBound  7                  ns/op  4.39%
> bench031B_drain_memoryBound  8                  ns/op  4.27%
> bench031B_drain_memoryBound  9                  ns/op  3.60%
> bench031B_drain_memoryBound  10                 ns/op  3.11%
> bench031B_drain_memoryBound  11                 ns/op  2.97%
> bench031B_drain_memoryBound  12                 ns/op  3.19%
> bench031B_drain_memoryBound  13                 ns/op  2.90%
> bench031B_drain_memoryBound  14                 ns/op  2.68%
> bench031B_drain_memoryBound  15                 ns/op  2.37%
> bench031B_drain_memoryBound  16                 ns/op  2.44%
> bench031B_drain_memoryBound  17                 ns/op  2.11%
> bench031B_drain_memoryBound  18                 ns...
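The `Diff` column in the table above follows the stated formula `Diff = (patch - master) / master`. A minimal Java sketch of that arithmetic, assuming a hypothetical helper class `RelativeDiff` and made-up input scores (these are not taken from the benchmark runs):

```java
// Hypothetical helper illustrating Diff = (patch - master) / master as a percentage.
// For avgt scores in ns/op (lower is better), a positive Diff means the patch is slower.
public final class RelativeDiff {

    public static double diffPercent(double patchScore, double masterScore) {
        if (masterScore == 0.0) {
            throw new IllegalArgumentException("master score must be non-zero");
        }
        return (patchScore - masterScore) / masterScore * 100.0;
    }

    public static void main(String[] args) {
        // Hypothetical scores for illustration only, not measured results.
        double master = 20.0; // ns/op
        double patch = 21.0;  // ns/op
        System.out.printf("Diff = %.2f%%%n", diffPercent(patch, master));
    }
}
```

Because the mode is `avgt` (ns/op), a positive value such as the `15.15%` reported for `ITERATION_COUNT` 1 corresponds to a regression in the patch.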
> @fg1417 I hope you had a good start into the new year.
Hi @eme64, Happy New Year!
> I'd love to make this PR a bit of a priority in the next weeks. Would you mind syncing with master and fixing the merge conflicts?
Yes, absolutely. I’ve rebased it internally, and the new commit is currently under testing. Once the testing is complete, I’ll push it.
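The branch concern quoted earlier is about how a loop's trip count is distributed across the scalar pre-loop, the vectorized main loop, the vectorized drain loop and the scalar post-loop. The following is a simplified, idealized Java model of that split, not C2's actual implementation: the class `LoopSplitModel` and its parameters (`preIters`, `vlen`, `unroll`) are hypothetical and treated here as given inputs, whereas in the compiler they depend on alignment, vector width and unrolling decisions.

```java
// Hypothetical, idealized model (not C2 source code): how a trip count could be
// split across the scalar pre-loop, the unrolled vector main loop, the
// vectorized drain loop and the scalar post-loop.
public final class LoopSplitModel {

    /** Iterations assigned to each loop; sums to the trip count for valid inputs. */
    public record Split(int pre, int main, int drain, int post) {}

    /**
     * @param tripCount total scalar iterations (assumed >= 0)
     * @param preIters  scalar iterations peeled for alignment (assumed >= 0)
     * @param vlen      elements per vector (assumed >= 1)
     * @param unroll    vectors processed per main-loop iteration (assumed >= 1)
     */
    public static Split split(int tripCount, int preIters, int vlen, int unroll) {
        int pre = Math.min(tripCount, preIters);
        int rest = tripCount - pre;

        // Main loop consumes whole blocks of vlen * unroll elements.
        int mainStride = vlen * unroll;
        int main = (rest / mainStride) * mainStride;
        rest -= main;

        // Drain loop consumes the remaining whole vectors, even if main ran zero times.
        int drain = (rest / vlen) * vlen;
        rest -= drain;

        // Post loop handles the remaining scalar tail.
        return new Split(pre, main, drain, rest);
    }

    public static void main(String[] args) {
        // Hypothetical parameters: 3 peeled iterations, 4 lanes, unroll by 4.
        for (int n : new int[] {2, 10, 40}) {
            System.out.println(n + " -> " + split(n, 3, 4, 4));
        }
    }
}
```

With these hypothetical parameters, a trip count of 10 never enters the main loop, yet the model still routes one full vector through the drain loop (pre=3, main=0, drain=4, post=3). The PR title indicates that, in the current code, the drain loop is skipped for some such small trip counts.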
-------------
PR Comment: https://git.openjdk.org/jdk/pull/22629#issuecomment-3729090839
More information about the hotspot-compiler-dev mailing list