Trying out Vector API with ShiftOr algorithm
Quân Anh Mai
anhmdq at gmail.com
Tue Mar 14 14:37:55 UTC 2023
Hi,
I believe your constructor of VectorShiftOr is not right. To be specific,
in these lines:
filter[low * 4] = filter[low] & mask;
filter[upper * 4] = filter[upper] & mask;
The indices on the right-hand sides should be low * 4 and upper * 4,
respectively. Additionally, a byte-to-int conversion is signed by default,
so any byte of 0x80 or above becomes a negative index; your load from the
filter array should therefore be filter[Byte.toUnsignedInt(ch)]. With that,
I ran your benchmark on the head of openjdk/jdk and got these numbers:
Benchmark             Mode  Cnt         Score         Error  Units
ShiftOrBench.scalar  thrpt    5  29343807.339 ± 5317309.461  ops/s
ShiftOrBench.vector  thrpt    5  19713483.999 ± 1492701.034  ops/s
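As a side note on the indexing fix mentioned before the numbers, here is a
minimal sketch of why the unsigned conversion matters. The class name, the
int[] element type and the 256-entry table are assumptions made for
illustration only; they are not taken from your code.

```java
class UnsignedIndexDemo {
    // Hypothetical lookup: one table entry per possible byte value.
    static int lookup(int[] table, byte ch) {
        // Byte.toUnsignedInt maps 0x80..0xFF to 128..255 instead of -128..-1.
        return table[Byte.toUnsignedInt(ch)];
    }

    public static void main(String[] args) {
        int[] table = new int[256];
        table[0xE9] = 7;
        byte ch = (byte) 0xE9;

        // The implicit widening conversion sign-extends the byte.
        int signed = ch;
        System.out.println("signed index:   " + signed);
        System.out.println("unsigned index: " + Byte.toUnsignedInt(ch));
        System.out.println("lookup result:  " + lookup(table, ch));

        try {
            int unused = table[ch]; // negative index
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("table[ch] throws for bytes >= 0x80");
        }
    }
}
```

With plain ASCII input both forms behave the same, so the bug would only
show up once the text contains bytes with the high bit set.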
The slowdown relative to the scalar implementation is expected: the CPU
already executes the scalar instructions very efficiently, and vector
operations only pay off when they process multiple inputs at once or use
powerful cross-lane operations, neither of which the benchmark exercises.
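To illustrate how little work the scalar loop does per input byte, here is
a sketch of the textbook Shift-Or search. It is not the code from your
benchmark: the class name, the 64-byte pattern limit and the long-based
state are assumptions chosen to keep the example short.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ScalarShiftOr {
    private final long[] masks = new long[256]; // one mask per byte value
    private final long matchBit;                // bit that signals a full match
    private final int length;

    ScalarShiftOr(byte[] pattern) {
        if (pattern.length == 0 || pattern.length > 64) {
            throw new IllegalArgumentException("pattern must be 1..64 bytes");
        }
        length = pattern.length;
        Arrays.fill(masks, ~0L);
        for (int i = 0; i < pattern.length; i++) {
            // Clear bit i for the byte expected at pattern position i.
            masks[Byte.toUnsignedInt(pattern[i])] &= ~(1L << i);
        }
        matchBit = 1L << (pattern.length - 1);
    }

    // Returns the index of the first match in text, or -1 if there is none.
    int indexOf(byte[] text) {
        long state = ~0L;
        for (int i = 0; i < text.length; i++) {
            // One load, one shift, one or and one test per input byte;
            // each iteration depends on the state of the previous one.
            state = (state << 1) | masks[Byte.toUnsignedInt(text[i])];
            if ((state & matchBit) == 0) {
                return i - length + 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        ScalarShiftOr matcher =
                new ScalarShiftOr("abd".getBytes(StandardCharsets.ISO_8859_1));
        byte[] text = "abcabd".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(matcher.indexOf(text));
    }
}
```

The loop body is a handful of cheap instructions with a serial dependency
on state, so a vector version that still handles one byte per step has
little to gain.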
With regard to the excessive materialisation of the flags, it has been
fixed in mainline, and the emitted code would now be
vtestps %ymm4, %ymm4
jne 0x00007fe668b696ac
Thanks,
Quan Anh
More information about the panama-dev
mailing list