RFR: 8307795: AArch64: Optimize VectorMask.truecount() on Neon [v3]

Chang Peng duke at openjdk.org
Thu May 18 09:54:54 UTC 2023


On Mon, 15 May 2023 10:59:11 GMT, Andrew Haley <aph at openjdk.org> wrote:

> > > This looks like it might be removed by loop opts. I think you might need a blackhole somewhere.
> > 
> > 
> > `m` is updated in every iteration of this loop, so `m` is not actually loop-invariant. I can see the assembly code of this loop by using JMH perfasm.
> 
> Isn't it? Looks to me like all it does is flip `m` each time. Whether or not this code is optimized today isn't relevant.
> 
> So it's the same as
> 
> ```
>         for (int i = 0; i < LENGTH/2; i++) {
>             res += m.trueCount();
>         }
>         m = m.not();
>         for (int i = 0; i < LENGTH/2; i++) {
>             res += m.trueCount();
>         }
> ```
> 
> ... which is trivially optimizable, no?

Sorry for the delay.

Yes, the two forms are equivalent, although the current C2 compiler does not perform this optimization yet. In any case, I have updated the benchmark to rule out this potential optimization, so that it keeps measuring `trueCount()` reliably.
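
To make the equivalence concrete, here is a minimal sketch in plain Java. It does not use the Vector API or the actual benchmark code: the mask is modeled as the low bits of a `long`, and `LANES`, `LENGTH`, `trueCount`, `not`, `alternating` and `split` are hypothetical names chosen for illustration. It assumes the original loop flips the mask once per iteration and that `LENGTH` is even.

```java
public class MaskFlipEquivalence {
    // Hypothetical values, not taken from the benchmark.
    static final int LANES = 16;
    static final long LANE_MASK = (1L << LANES) - 1;
    static final int LENGTH = 1024; // must be even for the split form to match

    // Stand-in for VectorMask.trueCount(): number of set lanes.
    static int trueCount(long m) {
        return Long.bitCount(m & LANE_MASK);
    }

    // Stand-in for VectorMask.not(): flip every lane.
    static long not(long m) {
        return ~m & LANE_MASK;
    }

    // Assumed shape of the original loop: count, then flip, every iteration.
    static int alternating(long m) {
        int res = 0;
        for (int i = 0; i < LENGTH; i++) {
            res += trueCount(m);
            m = not(m);
        }
        return res;
    }

    // The split form from the review comment: two loops over an invariant mask.
    static int split(long m) {
        int res = 0;
        for (int i = 0; i < LENGTH / 2; i++) {
            res += trueCount(m);
        }
        m = not(m);
        for (int i = 0; i < LENGTH / 2; i++) {
            res += trueCount(m);
        }
        return res;
    }

    public static void main(String[] args) {
        long m = 0b1010_0110_0000_1111L;
        // Both forms visit m and not(m) exactly LENGTH / 2 times each.
        System.out.println(alternating(m) == split(m));
        // Each (m, not(m)) pair contributes LANES, whatever the initial mask is.
        System.out.println(split(m) == (LENGTH / 2) * LANES);
    }
}
```

In the split form the mask is invariant inside each loop, so a sufficiently aggressive compiler could hoist `trueCount()` out of both loops or fold the whole sum. That is why the benchmark should not rely on the flip alone to keep the call alive.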

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13974#discussion_r1197626911


More information about the core-libs-dev mailing list