RFR: 8307795: AArch64: Optimize VectorMask.truecount() on Neon [v3]
Chang Peng
duke at openjdk.org
Thu May 18 09:54:54 UTC 2023
On Mon, 15 May 2023 10:59:11 GMT, Andrew Haley <aph at openjdk.org> wrote:
> > > This looks like it might be removed by loop opts. I think you might need a blackhole somewhere.
> >
> >
> > `m` is updated in every iteration of this loop, so `m` is not actually loop-invariant. I can see the assembly code of this loop by using JMH's perfasm profiler.
>
> Isn't it? Looks to me like all it does is flip `m` each time. Whether or not this code is optimized today isn't relevant.
>
> So it's the same as
>
> ```
> for (int i = 0; i < LENGTH/2; i++) {
>     res += m.trueCount();
> }
> m = m.not();
> for (int i = 0; i < LENGTH/2; i++) {
>     res += m.trueCount();
> }
> ```
>
> ... which is trivially optimizable, no?
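The equivalence in the quoted comment can be checked with a small plain-Java model. This is a sketch under stated assumptions: a mask is modelled as a `boolean[]`, and `trueCount`/`not` are hypothetical helpers standing in for `VectorMask.trueCount()` and `VectorMask.not()`. The original benchmark loop is assumed to have the shape `res += m.trueCount(); m = m.not();` on every iteration, which the thread describes but does not show. The model also shows why the loop is trivially optimizable: `trueCount(m) + trueCount(not(m))` always equals the lane count, so for an even `LENGTH` the whole sum collapses to `(LENGTH / 2) * lanes`.

```java
// Plain-Java model of the benchmark loop discussed in the thread. A mask is modelled
// as a boolean[] and the helpers below are hypothetical; this is NOT the Vector API.
final class MaskLoopEquivalence {

    // Hypothetical stand-in for VectorMask.trueCount(): counts the set lanes.
    static int trueCount(boolean[] mask) {
        int n = 0;
        for (boolean b : mask) {
            if (b) n++;
        }
        return n;
    }

    // Hypothetical stand-in for VectorMask.not(): returns a new, lane-wise negated mask.
    static boolean[] not(boolean[] mask) {
        boolean[] r = new boolean[mask.length];
        for (int i = 0; i < mask.length; i++) {
            r[i] = !mask[i];
        }
        return r;
    }

    // Assumed original shape: m is flipped on every iteration.
    static long alternating(boolean[] m, int length) {
        long res = 0;
        for (int i = 0; i < length; i++) {
            res += trueCount(m);
            m = not(m);
        }
        return res;
    }

    // Rewritten shape from the review comment (assumes length is even).
    static long split(boolean[] m, int length) {
        long res = 0;
        for (int i = 0; i < length / 2; i++) {
            res += trueCount(m);
        }
        m = not(m);
        for (int i = 0; i < length / 2; i++) {
            res += trueCount(m);
        }
        return res;
    }

    // Closed form: trueCount(m) + trueCount(not(m)) equals the lane count,
    // so for even length the loop reduces to (length / 2) * lanes.
    static long closedForm(boolean[] m, int length) {
        return (long) (length / 2) * m.length;
    }

    public static void main(String[] args) {
        boolean[] m = {true, false, true, true}; // 4 lanes, 3 of them set
        int length = 1024;
        System.out.println(alternating(m, length)); // 2048
        System.out.println(split(m, length));       // 2048
        System.out.println(closedForm(m, length));  // 2048
    }
}
```

With 4 lanes, 3 of them set, and `length = 1024`, all three methods print 2048 (512 * 3 + 512 * 1 = 512 * 4), which is why a compiler that recognised the pattern could replace the loop with a constant.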
Sorry for the delay.
Yes, they are equivalent, although the current C2 compiler does not perform this optimization yet. In any case, I have updated the benchmark to prevent such potential optimizations, so that it measures the performance of `trueCount()` reliably.
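One common way to keep such a loop from being collapsed is to make each iteration read a different mask from runtime data and to consume the result. This is only a hypothetical sketch: the thread does not say how the benchmark was actually changed. In a real JMH benchmark the masks would typically be built with `VectorMask.fromArray(species, bits, offset)` from a `boolean[]` field initialised in an `@Setup` method, and the sum would be returned or passed to `Blackhole.consume`. The code below models that pattern in plain Java without JMH or the Vector API; `LANES`, `LENGTH`, `makeMaskBits`, `trueCount` and `sumTrueCounts` are illustrative names, and `LANES = 4` assumes 128-bit Neon vectors with `int` lanes.

```java
import java.util.Random;

// Hypothetical plain-Java sketch (no JMH, no Vector API): each iteration reads a
// different, randomly initialised mask, so there is no m / m.not() pattern that
// would let a compiler fold the sum into a constant.
final class DataDependentMasks {

    static final int LANES = 4;     // assumed lane count (128-bit vector, int lanes)
    static final int LENGTH = 1024; // assumed number of masks per invocation

    // Lane bits for all masks, laid out back to back like the boolean[] that
    // VectorMask.fromArray(species, bits, offset) would read from.
    static boolean[] makeMaskBits(long seed) {
        Random rnd = new Random(seed);
        boolean[] bits = new boolean[LENGTH * LANES];
        for (int i = 0; i < bits.length; i++) {
            bits[i] = rnd.nextBoolean();
        }
        return bits;
    }

    // Counts the set lanes of the mask starting at offset (stand-in for trueCount()).
    static int trueCount(boolean[] bits, int offset) {
        int n = 0;
        for (int lane = 0; lane < LANES; lane++) {
            if (bits[offset + lane]) n++;
        }
        return n;
    }

    // Benchmark body: the result depends on runtime data and is returned to the
    // caller, so it can be neither constant-folded nor discarded as dead code.
    static int sumTrueCounts(boolean[] bits) {
        int res = 0;
        for (int i = 0; i < LENGTH; i++) {
            res += trueCount(bits, i * LANES);
        }
        return res;
    }

    public static void main(String[] args) {
        boolean[] bits = makeMaskBits(42);
        System.out.println(sumTrueCounts(bits));
    }
}
```

Because the mask bits are only known at run time, the sum has no closed form the compiler could substitute, unlike the `m` / `m.not()` loop discussed in the thread.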
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/13974#discussion_r1197626911
More information about the core-libs-dev mailing list