RFR: 8256820: AArch64: Optimize vector rotate (immediate) with shift and insert instructions [v2]
Dong Bo
dongbo at openjdk.java.net
Mon Dec 14 11:48:10 UTC 2020
On Mon, 14 Dec 2020 09:59:42 GMT, Andrew Haley <aph at openjdk.org> wrote:
>> Dong Bo has updated the pull request incrementally with one additional commit since the last revision:
>>
>> delete all moves
>
> This patch is very hard to review because much of it is just moving things around. Please do this as two PRs, one which does all the moves and one with the substantive changes. Thanks.
Thanks for the quick reply. Deleted all the moves in the updated version.
We have now tested this by hand on the two servers we have. Unfortunately, we did not see any performance improvement on `Cortex-A72`.
Because the code already detects the microarchitecture, I thought we could make full use of the flexibility that provides.
We found that the shift and insert instructions are used in the Linux kernel for several crypto algorithms on all CPUs [1, 2, 3].
But as of now, we only have the micro benchmarks for JDK as shown before.
I'll investigate whether any real workload can benefit from this.
Maybe someone from the community can help test this on other CPUs, such as `ThunderX` or `Cortex-A73`? Thanks. :)
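For readers who have not looked at the patch, here is a rough scalar sketch of the idea in the subject line. It is not the code from the PR. It emulates a single 32-bit lane in plain Java, and the class and method names (`RotateSketch`, `sli`, `rotlShiftOr`, `rotlShiftInsert`) are made up for illustration. The usual rotate is two shifts plus an OR. The shift-and-insert form does one shift, then a shift-left-and-insert (as `SLI` does on AArch64) that keeps the low bits already in the destination, so the separate OR is not needed.

```java
final class RotateSketch {
    // Emulates shift-left-and-insert on one 32-bit lane (hypothetical helper):
    // the result is src shifted left by n, with the low n bits taken from dst.
    // Assumes 1 <= n <= 31.
    static int sli(int dst, int src, int n) {
        int lowMask = (1 << n) - 1;          // selects the low n bits
        return (src << n) | (dst & lowMask);
    }

    // Baseline: rotate left as shift-left, shift-right, then OR.
    static int rotlShiftOr(int x, int n) {
        return (x << n) | (x >>> (32 - n));
    }

    // Shift-and-insert form: one logical shift right, then one insert.
    static int rotlShiftInsert(int x, int n) {
        int t = x >>> (32 - n);              // high n bits of x moved to the bottom
        return sli(t, x, n);                 // x << n placed above them
    }

    public static void main(String[] args) {
        int x = 0x12345678;
        for (int n = 1; n < 32; n++) {
            if (rotlShiftInsert(x, n) != Integer.rotateLeft(x, n)) {
                throw new AssertionError("mismatch at n=" + n);
            }
        }
        System.out.println("both forms agree for n = 1..31");
    }
}
```

Whether the two-instruction form is actually faster depends on the microarchitecture, which is why results on more CPUs would help.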
[1] https://github.com/torvalds/linux/blob/master/arch/arm64/crypto/chacha-neon-core.S
[2] https://github.com/torvalds/linux/blob/master/arch/arm64/crypto/crct10dif-ce-core.S
[3] https://github.com/torvalds/linux/blob/master/arch/arm64/crypto/sha512-armv8.pl
-------------
PR: https://git.openjdk.java.net/jdk/pull/1761