RFR: 8261553: Efficient mask generation using BMI2 BZHI instruction [v2]
Claes Redestad
redestad at openjdk.java.net
Thu Feb 11 14:30:37 UTC 2021
On Thu, 11 Feb 2021 13:52:09 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:
> Hi Claes, this could be run-to-run variation. In general, we now have fewer instructions (one shift operation saved per mask computation) compared to the previous mask generation sequence, and thus it will always offer better execution latencies.
Run-to-run variation would be easy to rule out by running more forks and more iterations to obtain statistically significant results. While the instruction manuals suggest latency should be better for this instruction on all CPUs that support it, it would be good to have clear evidence, such as a significant benchmark win, to motivate the added complexity.
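For readers less familiar with BZHI, the Python model below sketches why it can replace a shift-based mask sequence. It is an illustration of bit-level semantics only, not of instruction counts or latency, and not HotSpot's actual code. Assumptions: the "classic" sequence is modeled hypothetically as `(1 << n) - 1`; the names `mask_via_shift`, `bzhi64` and `mask_via_bzhi` are made up for this sketch; the mask is assumed to select the low `n` bits (e.g. the first `n` lanes of a vector tail). In C, the real instruction is exposed as the `_bzhi_u64` intrinsic from `<immintrin.h>`.

```python
MASK64 = (1 << 64) - 1  # value of an all-ones 64-bit register


def mask_via_shift(n: int) -> int:
    """Hypothetical 'classic' sequence: shift, then subtract.

    In C, shifting a 64-bit value by 64 is undefined behavior, so
    real code has to special-case n == 64 (modeled here).
    """
    if n >= 64:
        return MASK64
    return (1 << n) - 1


def bzhi64(src: int, index: int) -> int:
    """Model of BZHI r64: clear the bits of src at positions >= index.

    Only the low 8 bits of the index operand are used; if that value
    is 64 or more, src is returned unchanged.
    """
    n = index & 0xFF
    if n >= 64:
        return src & MASK64
    return src & ((1 << n) - 1)


def mask_via_bzhi(n: int) -> int:
    # A single BZHI applied to an all-ones source yields the low-n-bit mask.
    return bzhi64(MASK64, n)


# Both formulations agree for every length a 64-bit mask can describe.
for n in range(65):
    assert mask_via_shift(n) == mask_via_bzhi(n)

print(f"{mask_via_bzhi(5):#x}")  # prints 0x1f
```

Whether the shorter sequence is actually faster is exactly what the benchmark question above is about; the model only shows that the two sequences compute the same mask.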
-------------
PR: https://git.openjdk.java.net/jdk/pull/2522
More information about the hotspot-compiler-dev mailing list