RFR: 8373026: C2 SuperWord and Vector API: vector algorithms test and benchmark [v4]

Emanuel Peter epeter at openjdk.org
Fri Jan 16 08:20:29 UTC 2026


On Fri, 16 Jan 2026 07:48:08 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

>> I've considered the options here. Maybe I can add some comments in the benchmark later, once we've discussed the arguments.
>> 
>> Let's consider the options:
>> - `Level.Invocation`: this would definitely lead to too much noise, as we would do about as much work in the `Setup` as in the `Benchmark`, if not more.
>> - `Level.Iteration`: In my case, I set the iteration time to `100ms`, so that is quite a bit of time, and dwarfs the time needed for `Setup`. So I think noise is not a big deal here.
>> - `Level.Trial` would be once per fork, which would mean starting up a new VM, and re-compiling all the methods. It would also mean that we could get different profiling leading to different compilations (e.g. unstable-if).
>> 
>> I think `Level.Iteration` strikes a good balance here.
>> 
>> Also: the variance I see in the results is really not so bad, I think the results are quite sharp:
>> 
>> Benchmark                           (NUM_X_OBJECTS)  (SEED)  (SIZE)  Mode  Cnt        Score       Error  Units
>> VectorAlgorithms.filterI_VectorAPI            10000       0  640000  avgt   50   176488.693 ±  2413.738  ns/op
>> VectorAlgorithms.filterI_loop                 10000       0  640000  avgt   50  2257476.735 ± 75274.757  ns/op
>> 
>> And:
>> 
>> Benchmark                         (NUM_X_OBJECTS)  (SEED)  (SIZE)  Mode  Cnt      Score     Error  Units
>> VectorAlgorithms.findI_VectorAPI            10000       0  640000  avgt   50  42521.340 ± 596.106  ns/op
>> VectorAlgorithms.findI_loop                 10000       0  640000  avgt   50  88227.815 ± 745.721  ns/op
>> 
>> 
>> @iwanowww What do you think?
>
>>Maybe resetting inputs between forks is a good compromise.
> 
> Given the 3 options for `Level`, `Iteration` is in the middle, so that would be the compromise ;)
> 
> If I did go with a per-fork `Setup`, I would have to use `50` forks, which means we would have to do warmup for each of the `50` forks. That would drive up the runtime quite a lot.

> Also, it makes it harder to reproduce input-dependent variance.

I suppose my whole goal was to eliminate input-dependent variance as far as possible. Do you think it would be better to make input-dependent variance measurable at the `Iteration` level? I fear that this would make the variance of the benchmark very large, and that the results of a fork would be quite noisy.
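For context, a minimal sketch of the `Level.Iteration` setup pattern discussed above. The class, field, and method names here are hypothetical illustrations, not taken from the actual benchmark in the PR; the sketch only shows where each `Level` option would hook in (it requires the JMH library to compile and run):

```java
import java.util.Random;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.*;

// Hypothetical sketch: re-seed the inputs once per measurement iteration.
// With an iteration time of 100ms, the setup cost is dwarfed by the
// measured work, so it adds little noise compared to Level.Invocation.
@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class SetupLevelSketch {
    int[] data = new int[640_000];

    // Level.Iteration: runs before every measurement iteration.
    // Level.Invocation would run before every benchmark call (too noisy);
    // Level.Trial would run once per fork (same inputs for the whole fork,
    // and per-fork variance would require many forks to measure).
    @Setup(Level.Iteration)
    public void fillInputs() {
        Random r = new Random(0); // fixed seed, matching the SEED parameter idea
        for (int i = 0; i < data.length; i++) {
            data[i] = r.nextInt();
        }
    }

    @Benchmark
    public long sumLoop() {
        long s = 0;
        for (int v : data) {
            s += v;
        }
        return s;
    }
}
```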

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28639#discussion_r2697439121


More information about the hotspot-compiler-dev mailing list