Trying out Vector API with ShiftOr algorithm

Quân Anh Mai anhmdq at gmail.com
Tue Mar 14 14:37:55 UTC 2023


Hi,

I believe your constructor of VectorShiftOr is not right. To be specific,
in these lines:

    filter[low * 4] = filter[low] & mask;
    filter[upper * 4] = filter[upper] & mask;

The indices on the right-hand sides should be low * 4 and upper * 4,
respectively. Additionally, a byte-to-int conversion sign-extends by
default, so bytes of 0x80 and above become negative indices; your load
from the filter array should be filter[Byte.toUnsignedInt(ch)]. With
those changes, I ran your benchmark on the head of openjdk/jdk and got
these numbers:

Benchmark             Mode  Cnt         Score         Error  Units
ShiftOrBench.scalar  thrpt    5  29343807.339 ± 5317309.461  ops/s
ShiftOrBench.vector  thrpt    5  19713483.999 ± 1492701.034  ops/s
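
To illustrate the sign-extension point above, here is a minimal standalone
sketch. The class and method names are made up for the example and are not
taken from your code; the 256-entry filter array is also just an assumption.

```java
public final class UnsignedIndexDemo {
    // Implicit byte-to-int widening sign-extends: bytes >= 0x80 go negative.
    public static int signedIndex(byte ch) {
        return ch;
    }

    // Byte.toUnsignedInt zero-extends, giving an index in 0..255.
    public static int unsignedIndex(byte ch) {
        return Byte.toUnsignedInt(ch);
    }

    public static void main(String[] args) {
        int[] filter = new int[256];     // hypothetical lookup table
        byte ch = (byte) 0xE9;           // any byte with the top bit set

        System.out.println(signedIndex(ch));    // prints -23
        System.out.println(unsignedIndex(ch));  // prints 233

        // filter[signedIndex(ch)] would throw ArrayIndexOutOfBoundsException;
        // the unsigned form stays within bounds.
        System.out.println(filter[unsignedIndex(ch)]);
    }
}
```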

The slowdown relative to the scalar implementation is expected: the CPU
already executes the scalar instructions very efficiently, and vector
operations only pay off when they process multiple inputs at once or use
powerful cross-lane operations, neither of which appears in the
benchmark.
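
For reference, a textbook scalar Shift-Or loop looks roughly like the sketch
below. This is not your benchmark code: the class name ScalarShiftOr, the
64-byte pattern limit, and the long-based state are assumptions made for the
example. Each input byte costs one table load, one shift, one OR and one
test, which is the kind of work a scalar core already handles well.

```java
import java.util.Arrays;

public final class ScalarShiftOr {
    private final long[] masks = new long[256]; // one bit mask per byte value
    private final long matchBit;                // bit that signals a full match
    private final int length;

    public ScalarShiftOr(byte[] pattern) {
        if (pattern.length == 0 || pattern.length > 64) {
            throw new IllegalArgumentException("pattern length must be 1..64");
        }
        // A 0 bit at position i means "pattern[i] equals this byte value".
        Arrays.fill(masks, ~0L);
        for (int i = 0; i < pattern.length; i++) {
            masks[Byte.toUnsignedInt(pattern[i])] &= ~(1L << i);
        }
        length = pattern.length;
        matchBit = 1L << (length - 1);
    }

    // Returns the index of the first occurrence in text, or -1 if absent.
    public int indexOf(byte[] text) {
        long state = ~0L;
        for (int i = 0; i < text.length; i++) {
            // Shift in a 0 (a new candidate match), then OR in the mismatches.
            state = (state << 1) | masks[Byte.toUnsignedInt(text[i])];
            if ((state & matchBit) == 0) {
                return i - length + 1;
            }
        }
        return -1;
    }
}
```

The loop carries a dependency on state from one byte to the next, so a
vector version gains nothing unless it searches several inputs or several
patterns per iteration.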

With regard to the excessive materialisation of the flags, this has been
fixed in mainline, and the emitted code is now

    vtestps %ymm4, %ymm4
    jne     0x00007fe668b696ac

Thanks,
Quan Anh