Trying out Vector API with ShiftOr algorithm

Fri Mar 17 08:55:45 UTC 2023

On Tue, 14/03/2023 at 22:37 +0800, Quân Anh Mai wrote:
> > 
> > Hi,
> > 
> > I believe your constructor of VectorShiftOr is not right. To be
> > specific, in these lines:
> > 
> >     filter[low * 4] = filter[low] & mask;
> >     filter[upper * 4] = filter[upper] & mask;
> > 
> > The indices on the right-hand sides should be low * 4 and upper * 4,
> > respectively.

That's what I get for being in a hurry; thanks for spotting this.
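For reference, a minimal scalar sketch of the intended table construction, assuming a single case-insensitive ASCII pattern of 1..64 characters and one `long` mask per byte value. The class name `CaseInsensitiveShiftOr` is hypothetical, and the `* 4` stride of the VectorShiftOr layout is not reproduced here; the point is only that each assignment reads and writes the same slot.

```java
import java.util.Arrays;

// Hypothetical scalar sketch: case-insensitive Shift-Or for one ASCII
// pattern of 1..64 characters, with one long mask per byte value.
final class CaseInsensitiveShiftOr {
    // A 0 bit at position i means "this byte may appear at pattern[i]".
    private final long[] filter = new long[256];
    private final int length;

    CaseInsensitiveShiftOr(String pattern) {
        length = pattern.length();
        if (length == 0 || length > 64) {
            throw new IllegalArgumentException("pattern length must be 1..64");
        }
        Arrays.fill(filter, -1L); // all ones: no byte matches anywhere yet
        for (int i = 0; i < length; i++) {
            char c = pattern.charAt(i);
            if (c > 127) {
                throw new IllegalArgumentException("ASCII patterns only");
            }
            int low = Character.toLowerCase(c);
            int upper = Character.toUpperCase(c);
            long mask = ~(1L << i);
            // Read and write the same slot on both sides of each assignment.
            filter[low] = filter[low] & mask;
            filter[upper] = filter[upper] & mask;
        }
    }

    /** Returns the start index of the first match in text, or -1. */
    int indexOf(byte[] text) {
        long matchBit = 1L << (length - 1);
        long state = -1L; // all ones: no prefix matched yet
        for (int j = 0; j < text.length; j++) {
            // Shift in a 0 (the empty prefix always matches), then set the
            // bits of positions whose pattern character differs from this byte.
            state = (state << 1) | filter[Byte.toUnsignedInt(text[j])];
            if ((state & matchBit) == 0) {
                return j - length + 1;
            }
        }
        return -1;
    }
}
```

For example, `new CaseInsensitiveShiftOr("World").indexOf(...)` on the ASCII bytes of `"Hello, WORLD"` finds the match starting at index 7.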

> > Additionally, byte-to-int conversion is signed by default, so your
> > load from the filter array should be
> > filter[Byte.toUnsignedInt(ch)].

Technically true, but we stick to ASCII anyway.
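Even with ASCII-only patterns, the scanned text may contain bytes above 0x7F, and a raw `byte` used as an index sign-extends to a negative `int`. A tiny illustration; the class and method names are hypothetical:

```java
// Hypothetical helper names; shows signed vs. unsigned widening of a byte.
final class ByteIndexDemo {
    // Implicit byte -> int widening sign-extends: (byte) 0xE9 becomes -23,
    // which would throw ArrayIndexOutOfBoundsException as a table index.
    static int signedIndex(byte b) {
        return b;
    }

    // Zero-extends instead: (byte) 0xE9 becomes 233, a valid index into a
    // 256-entry table. Equivalent to (b & 0xFF).
    static int unsignedIndex(byte b) {
        return Byte.toUnsignedInt(b);
    }

    // Safe lookup into a 256-entry table for any byte value.
    static long lookup(long[] table, byte b) {
        return table[Byte.toUnsignedInt(b)];
    }
}
```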

> > With that, I ran your benchmark on the head of openjdk/jdk and got
> > these numbers:
> > 
> > Benchmark             Mode  Cnt         Score         Error  Units
> > ShiftOrBench.scalar  thrpt    5  29343807.339 ± 5317309.461  ops/s
> > ShiftOrBench.vector  thrpt    5  19713483.999 ± 1492701.034  ops/s
> > 
> > The slowdown relative to the scalar implementation is expected,
> > since the CPU can execute those scalar instructions very efficiently;
> > vector operations only shine when they can process multiple inputs
> > concurrently or make use of powerful cross-lane operations, both of
> > which are absent from this benchmark.

I did some experiments in plain C, and the results seem to be about the
same. I guess this makes it useful only when matching 3+ strings at once,
which is far from the common cases we have. Anyway, thanks for taking
the time to look into it.
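As a rough scalar illustration of the "multiple inputs" point, several short patterns can share one 64-bit Shift-Or state word, one 16-bit field per pattern. This is a SWAR technique, not the Vector API, and the sketch assumes up to four case-sensitive ASCII patterns of 1..16 characters; `PackedShiftOr` and `firstMatchEnds` are hypothetical names.

```java
import java.util.Arrays;

// Hypothetical SWAR sketch: up to four case-sensitive ASCII patterns of
// 1..16 characters, each in its own 16-bit field of one long state word.
final class PackedShiftOr {
    private static final int FIELD = 16;
    // Bit 0 of every 16-bit field.
    private static final long FIELD_STARTS = 0x0001_0001_0001_0001L;

    private final long[] filter = new long[128]; // ASCII only
    private final long[] matchBits;

    PackedShiftOr(String... patterns) {
        if (patterns.length == 0 || patterns.length > 4) {
            throw new IllegalArgumentException("need 1..4 patterns");
        }
        Arrays.fill(filter, -1L);
        matchBits = new long[patterns.length];
        for (int k = 0; k < patterns.length; k++) {
            String p = patterns[k];
            if (p.isEmpty() || p.length() > FIELD) {
                throw new IllegalArgumentException("pattern length must be 1..16");
            }
            for (int i = 0; i < p.length(); i++) {
                char c = p.charAt(i);
                if (c > 127) {
                    throw new IllegalArgumentException("ASCII patterns only");
                }
                // Clear bit i of field k for this character.
                filter[c] &= ~(1L << (k * FIELD + i));
            }
            // Last pattern position of field k: 0 there means "pattern k matched".
            matchBits[k] = 1L << (k * FIELD + p.length() - 1);
        }
    }

    /** For each pattern, the index where its first match ends, or -1. */
    int[] firstMatchEnds(byte[] text) {
        int[] ends = new int[matchBits.length];
        Arrays.fill(ends, -1);
        long state = -1L;
        for (int j = 0; j < text.length; j++) {
            int ch = Byte.toUnsignedInt(text[j]);
            long f = ch < 128 ? filter[ch] : -1L; // non-ASCII never matches
            // Clearing each field's bit 0 after the shift stops the top bit of
            // field k from leaking into the first position of field k + 1.
            state = ((state << 1) & ~FIELD_STARTS) | f;
            for (int k = 0; k < matchBits.length; k++) {
                if (ends[k] < 0 && (state & matchBits[k]) == 0) {
                    ends[k] = j;
                }
            }
        }
        return ends;
    }
}
```

The only change from single-pattern Shift-Or is the `& ~FIELD_STARTS` after the shift; everything else is the same shift, OR and bit test, now covering four patterns per step.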

> > 
> > With regards to the excessive materialisation of the flags, it has
> > been fixed in mainline, and the code emitted would be
> > 
> >     vtestps %ymm4, %ymm4
> >     jne     0x00007fe668b696ac
> > 

Great!

> > Thanks,
> > Quan Anh