RFR: 8373026: C2 SuperWord and Vector API: vector algorithms test and benchmark [v4]

Emanuel Peter epeter at openjdk.org
Fri Jan 16 07:52:41 UTC 2026


On Thu, 15 Jan 2026 17:58:41 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:

>> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   add dotProductF
>
> test/hotspot/jtreg/compiler/vectorization/TestVectorAlgorithms.java line 97:
> 
>> 95:         framework.addFlags("--add-modules=jdk.incubator.vector", "-XX:CompileCommand=inline,*VectorAlgorithmsImpl::*");
>> 96:         switch (args[0]) {
>> 97:             case "vanilla"        -> { /* no extra flags */ }
> 
> It would be more flexible to let arbitrary VM flags be appended.

What exactly are you suggesting here? Are you suggesting that instead of doing:

` * @run driver ${test.main.class} noSuperWord`
we could do
` * @run driver ${test.main.class} -XX:-OptimizeFill`

And then just `framework.addFlags(args)`?
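If I understand the suggestion correctly, the driver would keep its fixed base flags and simply forward whatever the `@run` line passes as extra VM flags. A minimal sketch of just that argument handling (the `vmFlags` helper and class name are made up for illustration; the real test would call `framework.addFlags(args)` on the IR framework directly):

```java
import java.util.ArrayList;
import java.util.List;

public class FlagForwarding {
    // Hypothetical helper illustrating the suggested flag handling:
    // the fixed base flags first, then whatever the @run line passed,
    // appended verbatim (e.g. "-XX:-OptimizeFill").
    static List<String> vmFlags(String[] args) {
        List<String> flags = new ArrayList<>(List.of(
                "--add-modules=jdk.incubator.vector",
                "-XX:CompileCommand=inline,*VectorAlgorithmsImpl::*"));
        flags.addAll(List.of(args));
        return flags;
    }
}
```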

> test/micro/org/openjdk/bench/vm/compiler/VectorAlgorithms.java line 95:
> 
>> 93:     }
>> 94: 
>> 95:     @Setup(Level.Iteration)
> 
> Resetting after each iteration may introduce too much noise. Also, it makes it harder to reproduce input dependent variance. Maybe resetting inputs between forks is a good compromise.

I've considered the options here. Maybe I can add some comments in the benchmark later, once we've discussed the arguments.

Let's consider the options:
- `Level.Invocation`: this would definitely lead to too much noise, as we would do about as much work in `Setup` as in the `Benchmark`, if not more.
- `Level.Iteration`: In my case, I set the iteration time to `100ms`, so that is quite a bit of time, and it dwarfs the time needed for `Setup`. So I think noise is not a big deal here.
- `Level.Trial` would be once per fork, which would mean starting up a new VM and re-compiling all the methods. It would also mean that different profiling could lead to different compilations (e.g. unstable-if).

I think `Level.Iteration` strikes a good balance here.

Note: I need to reset the data many times, because some benchmarks like `findI` can have drastically different runtimes depending on the data. `findI` has an early exit: if the match is at the beginning of the array, runtime is low; if it is at the end, runtime is high. The runtime is essentially uniformly distributed over the length of the array. That's why I run `50` iterations that are relatively short (`100ms`), but not so short that the `Setup` time dominates.
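To illustrate the early-exit behavior described above, here is a hypothetical sketch of a `findI`-style loop (the class, method body, and data setup are my own illustration, not the benchmark's actual code): the work done is proportional to the index of the first match, so re-randomizing where the match sits between iterations is what produces the uniform runtime distribution.

```java
import java.util.Random;

public class FindExample {
    // Early-exit linear search: cost is proportional to the index of
    // the first match, so runtime depends on where the match lies.
    static int findI(int[] a, int target) {
        for (int i = 0; i < a.length; i++) {
            if (a[i] == target) {
                return i; // early exit after i + 1 comparisons
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] data = new int[1000];
        Random rng = new Random(42);
        // Fill with values in [0, 1000), so -1 cannot occur by accident.
        for (int i = 0; i < data.length; i++) {
            data[i] = rng.nextInt(1000);
        }
        // Plant the target at a random position; a per-iteration reset
        // would redo this, moving the early exit each time.
        int pos = rng.nextInt(data.length);
        data[pos] = -1;
        System.out.println(findI(data, -1) == pos);
    }
}
```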

@iwanowww What do you think?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28639#discussion_r2697368748
PR Review Comment: https://git.openjdk.org/jdk/pull/28639#discussion_r2697357507


More information about the hotspot-compiler-dev mailing list