RFR: 8307084: C2: Vectorized drain loop is not executed for some small trip counts

Fei Gao fgao at openjdk.org
Tue Jan 13 15:12:39 UTC 2026


On Fri, 9 Jan 2026 13:54:49 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> Hi @eme64, many thanks for your review. It’s really comprehensive and insightful. I’ve given a thumbs-up to all the comments that have been resolved in this commit.
>> 
>>> I have one concern: We now have changed the branches. There is now a long sequence of branches if we have very few iterations, so that we only go through pre and post loop. It would be interesting to see what the performance difference is between master and patch. 
>> 
>> Regarding this concern, I re-ran the microbenchmarks (now merged into the existing `VectorThroughputForIterationCount.java`), named `bench03*_drain_memoryBound`, and collected data across different platforms, including `128-bit` and `256-bit` `AArch64` machines as well as a `512-bit` `x86` machine.
>> 
>> To summarize, I observe a minor performance regression for small-iteration loops on the `128-bit` and `256-bit` `AArch64` platforms. For larger-iteration loops, there is either a performance improvement or no noticeable change. The performance data on the `512-bit x86` machine shows a similar trend, though the regression is more significant.
>> 
>> **The test range of `ITERATION_COUNT` is `0–300`. For larger `ITERATION_COUNT` values, there is either a performance improvement or no noticeable change, so those results are omitted. The following data only shows cases with regressions.**
>> 
>> 
>> (FIXED_OFFSET)  (RANDOMIZE_OFFSETS)  (REPETITIONS)  (seed)  Mode  Cnt
>>     0                TRUE                1024         42    avgt    3
>> 
>> `Diff = (patch - master) / master`
>> 
>> On the `128-bit` `AArch64` platform:
>> 
>> Benchmark                    (ITERATION_COUNT)  Units    Diff
>> bench031B_drain_memoryBound                  1  ns/op  15.15%
>> bench031B_drain_memoryBound                  2  ns/op  10.89%
>> bench031B_drain_memoryBound                  3  ns/op   9.27%
>> bench031B_drain_memoryBound                  4  ns/op   7.39%
>> bench031B_drain_memoryBound                  5  ns/op   5.86%
>> bench031B_drain_memoryBound                  6  ns/op   5.31%
>> bench031B_drain_memoryBound                  7  ns/op   4.39%
>> bench031B_drain_memoryBound                  8  ns/op   4.27%
>> bench031B_drain_memoryBound                  9  ns/op   3.60%
>> bench031B_drain_memoryBound                 10  ns/op   3.11%
>> bench031B_drain_memoryBound                 11  ns/op   2.97%
>> bench031B_drain_memoryBound                 12  ns/op   3.19%
>> bench031B_drain_memoryBound                 13  ns/op   2.90%
>> bench031B_drain_memoryBound                 14  ns/op   2.68%
>> bench031B_drain_memoryBound                 15  ns/op   2.37%
>> bench031B_drain_memoryBound                 16  ns/op   2.44%
>> bench031B_drain_memo...
>
> @fg1417 I hope you had a good start to the new year. I'd love to make this PR a bit of a priority in the coming weeks. Would you mind syncing with master and fixing the merge conflicts?
> 
> I'd review, run testing and look into running some benchmarks myself.

Hi @eme64, the PR is ready for review and testing. Thanks!
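
For anyone reading the quoted table, here is a minimal sketch of how the `Diff = (patch - master) / master` column could be computed from two `ns/op` scores. The class name `DiffExample`, the method `diff`, and the sample scores are hypothetical and used only for illustration; they are not measurements from this PR. Since the unit is `ns/op`, a positive value would indicate that the patch is slower than master.

```java
public class DiffExample {
    // Relative difference of the patch score against the master score,
    // returned as a fraction (0.10 means 10%).
    static double diff(double masterNsPerOp, double patchNsPerOp) {
        return (patchNsPerOp - masterNsPerOp) / masterNsPerOp;
    }

    public static void main(String[] args) {
        // Hypothetical scores, chosen only to show the arithmetic.
        double master = 100.0;
        double patch = 110.0;
        System.out.printf("Diff = %.2f%%%n", diff(master, patch) * 100.0);
    }
}
```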

-------------

PR Comment: https://git.openjdk.org/jdk/pull/22629#issuecomment-3744877402


More information about the hotspot-compiler-dev mailing list