RFR: 8307523: [vectorapi] Optimize MaskFromLongBenchmark.java
Nick Gasson
ngasson at openjdk.org
Tue May 23 08:23:57 UTC 2023
On Sat, 6 May 2023 02:01:20 GMT, Chang Peng <duke at openjdk.org> wrote:
> To avoid dead code elimination, a use-point laneIsSet() is added in each benchmark method in MaskFromLongBenchmark.java.
>
> However, laneIsSet() [1] is currently implemented via toLong(). So it may generate a fromLong-toLong pair [2], which makes this benchmark ineffective once laneIsSet() is inlined into the outer method. The assembly of the maskFromLong_byte128 benchmark on SVE2 is shown in [3]. It does not contain the bdep instruction used by fromLong on AArch64 [4]. So, in this case, we cannot measure fromLong()'s performance with this benchmark.
>
> This patch uses trueCount() [5] instead of toLong() so that the benchmark measures fromLong()'s performance effectively. After this patch, the bdep instruction appears in the hot loop [6] of the maskFromLong_byte128 benchmark.
>
> Since using Blackhole to consume VectorMask will generate a heavy vector box, we don't use Blackhole to fix this benchmark.
>
> [1]: https://github.com/openjdk/jdk/blob/96fa2751e8bbc05d6d064d80c07720cc9db05c54/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/AbstractMask.java#L70
> [2]: https://github.com/openjdk/jdk/blob/ff368d504e9101e11c7182185f56255f429d31e3/src/hotspot/share/opto/vectornode.cpp#L1736
> [3]: https://gist.github.com/changpeng1997/467f6056f78d99c055030fa5888b6baa
> [4]: https://github.com/openjdk/jdk/blob/787832a58677205c9a11ae100dd8a2fbddb30a4a/src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp#L1099
> [5]: https://docs.oracle.com/en/java/javase/16/docs/api/jdk.incubator.vector/jdk/incubator/vector/VectorMask.html#trueCount()
> [6]: https://gist.github.com/changpeng1997/79bea0a9f80530bec89978950897000d
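A minimal scalar sketch of the issue described above, assuming a mask can be modelled as the low `lanes` bits of a long. This is not the Vector API itself: `MaskModel` and its methods are hypothetical stand-ins named after the real `VectorMask` operations, and the real benchmark operates on `VectorMask` objects. The sketch only shows why a `laneIsSet()` built on `toLong()` reads straight back from the input of `fromLong()`, whereas `trueCount()` is a different operation on the mask.

```java
// Hypothetical scalar model of the mask operations discussed in the PR.
// It does NOT use jdk.incubator.vector; a mask is just the low `lanes` bits of a long.
final class MaskModel {
    // Stand-in for VectorMask.fromLong: keep only the bits that map to lanes.
    public static long fromLong(long bits, int lanes) {
        long laneMask = (lanes >= 64) ? -1L : (1L << lanes) - 1;
        return bits & laneMask;
    }

    // Stand-in for VectorMask.toLong: in this model the mask already is the long.
    public static long toLong(long mask) {
        return mask;
    }

    // laneIsSet() expressed through toLong(), as in the implementation cited in [1].
    // Chained after fromLong(), this forms the fromLong-toLong pair.
    public static boolean laneIsSet(long mask, int lane) {
        return ((toLong(mask) >>> lane) & 1L) != 0;
    }

    // Stand-in for VectorMask.trueCount: number of set lanes.
    public static int trueCount(long mask) {
        return Long.bitCount(mask);
    }

    public static void main(String[] args) {
        long mask = fromLong(0b1011L, 16);        // lanes 0, 1 and 3 are set
        System.out.println(laneIsSet(mask, 1));   // use-point before the patch
        System.out.println(trueCount(mask));      // use-point after the patch
    }
}
```

In this model, `toLong(fromLong(x, lanes))` is just `x` with the unused high bits cleared, which is the kind of round trip the description says leaves nothing of `fromLong()` to measure.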
Marked as reviewed by ngasson (Reviewer).
-------------
PR Review: https://git.openjdk.org/jdk/pull/13851#pullrequestreview-1438956873