Trying out Vector API with ShiftOr algorithm
Dmitriy Olshanskiy
d.olshanskiy at tinkoff.ru
Fri Mar 17 08:55:45 UTC 2023
On Tue, 14/03/2023 at 22:37 +0800, Quân Anh Mai wrote:
> >
> > Hi,
> >
> > I believe your constructor of VectorShiftOr is not right. To be
> > specific, in these lines:
> >
> > filter[low * 4] = filter[low] & mask;
> > filter[upper * 4] = filter[upper] & mask;
> >
> > The indices of the right-hand sides should be low * 4 and upper *
> > 4, respectively.
That's what I get for being in a hurry, thanks for spotting this.
> > Additionally, a byte-to-int conversion is defaulted to be signed,
> > so your load from the filter array should be
> > filter[Byte.toUnsignedInt(ch)].
Technically true, but we stick to ASCII anyway.
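For anyone reading along who does feed non-ASCII bytes into such a table, here is a minimal sketch of the difference. The class name, the lookup helper and the 256-entry filter array are assumptions made for illustration, not the code from the benchmark.

```java
public class UnsignedIndexDemo {
    // Hypothetical helper: index a 256-entry table with a byte,
    // treating the byte as an unsigned value in 0..255.
    static int lookup(int[] filter, byte ch) {
        return filter[Byte.toUnsignedInt(ch)];
    }

    public static void main(String[] args) {
        int[] filter = new int[256];
        filter[0xE9] = 42;           // entry for a byte above the ASCII range

        byte ch = (byte) 0xE9;
        // Plain widening sign-extends, so filter[ch] would throw
        // ArrayIndexOutOfBoundsException for this byte.
        System.out.println((int) ch);                   // -23
        System.out.println(Byte.toUnsignedInt(ch));     // 233
        System.out.println(lookup(filter, ch));         // 42
    }
}
```

With pure ASCII input every byte is in 0..127, so both forms index the same entry.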
> > With that, I ran your benchmark on the head of openjdk/jdk and got
> > these numbers:
> >
> > Benchmark             Mode  Cnt         Score         Error  Units
> > ShiftOrBench.scalar  thrpt    5  29343807.339 ± 5317309.461  ops/s
> > ShiftOrBench.vector  thrpt    5  19713483.999 ± 1492701.034  ops/s
> >
> > The slowdown relative to the scalar implementation is expected
> > since the CPU can execute those instructions really efficiently,
> > and vector operations only shine because they can perform on
> > multiple inputs concurrently, as well as powerful cross-lane
> > operations, both of which are absent from the benchmark.
I did some experiments with plain C, and the results seem to be about
the same. I guess this makes it useful only with 3+ strings, which is
far from the common cases we have. Anyway, thanks for taking the time
to look into it.
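For reference, the scalar side of the comparison boils down to a shift, an OR and a test per input byte. Below is a minimal textbook sketch of single-pattern Shift-Or; it is not the benchmark code, and the class name, method name and the 64-byte pattern limit are assumptions of this sketch.

```java
import java.util.Arrays;

public class ShiftOrSketch {
    // Returns the index of the first occurrence of pattern in text, or -1.
    // Bit j of the state is 0 when pattern[0..j] matches the text ending
    // at the current position.
    static int indexOf(byte[] text, byte[] pattern) {
        int m = pattern.length;
        if (m == 0) return 0;
        if (m > 64) throw new IllegalArgumentException("pattern longer than 64 bytes");

        // masks[c] has bit j cleared iff pattern[j] == c.
        long[] masks = new long[256];
        Arrays.fill(masks, ~0L);
        for (int j = 0; j < m; j++) {
            masks[Byte.toUnsignedInt(pattern[j])] &= ~(1L << j);
        }

        long state = ~0L;
        long matchBit = 1L << (m - 1);
        for (int i = 0; i < text.length; i++) {
            // One shift and one OR per input byte.
            state = (state << 1) | masks[Byte.toUnsignedInt(text[i])];
            if ((state & matchBit) == 0) {
                return i - m + 1;
            }
        }
        return -1;
    }
}
```

Since the inner loop is already this cheap for one pattern, a vector version has little work to spread across its lanes unless several patterns are matched at once.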
> >
> > With regards to the excessive materialisation of the flags, it has
> > been fixed in mainline, and the code emitted would be
> >
> > vtestps %ymm4, %ymm4
> > jne 0x00007fe668b696ac
> >
Great!
> > Thanks,
> > Quan Anh