RFR: 8287567: AArch64: Implement post-call NOPs
Andrew Haley
aph at openjdk.java.net
Wed Jun 1 07:50:33 UTC 2022
On Wed, 1 Jun 2022 06:05:37 GMT, Rickard Bäckman <rbackman at openjdk.org> wrote:
> Is there any advantage to having the first instruction be "nop" instead of a branch that skips over the movks?
That's an interesting question. I've been looking at the performance data for some high-end AArch64 implementations, and I see that they all can execute multiple movks per cycle. Apple M1 can do 6/cycle, Arm Neoverse N2 can do 4/cycle. On both of these cores, the nops cost nothing except code cache space. On M1, nops don't even issue: they are consumed by the front end. On Neoverse N2, a nop is fused with any following instruction, so the total cost is at most 2 movks.
> I was going to suggest:
>
> ```
> b done
> (raw data)
> done:
> ```
>
> but then it might be hard for NativePostCallNop::check() to prevent false positives.
Impossible, I would have thought. The raw data might match any instruction: you need some redundancy somewhere.
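To illustrate the redundancy point, here is a rough sketch of a check over a three-instruction sequence. It assumes the layout is `nop; movk xzr, #lo; movk xzr, #hi, lsl #16`, which may not match the PR exactly. The function names are hypothetical, not HotSpot's. The fixed opcode and register bits are what let a checker reject arbitrary words, which a bare branch over raw data cannot offer.

```cpp
#include <cstdint>

// Standard A64 encodings.
constexpr uint32_t kNop = 0xD503201F;           // NOP
constexpr uint32_t kMovkFixedMask = 0xFFE0001F; // opcode, hw and Rd bits; imm16 ignored

// Encode MOVK XZR, #imm16, LSL #(16 * hw). Rd = 31 (xzr), so the
// instruction has no architectural effect and only carries data.
constexpr uint32_t movk_xzr(uint32_t imm16, uint32_t hw) {
  return 0xF280001Fu | ((hw & 3u) << 21) | ((imm16 & 0xFFFFu) << 5);
}

// Hypothetical check: true only if all fixed bits of the assumed
// three-instruction pattern match.
inline bool looks_like_post_call_nop(const uint32_t* insn) {
  return insn[0] == kNop &&
         (insn[1] & kMovkFixedMask) == movk_xzr(0, 0) &&
         (insn[2] & kMovkFixedMask) == movk_xzr(0, 1);
}

// Recover the 32 bits of payload from the two imm16 fields.
inline uint32_t post_call_nop_data(const uint32_t* insn) {
  uint32_t lo = (insn[1] >> 5) & 0xFFFFu;
  uint32_t hi = (insn[2] >> 5) & 0xFFFFu;
  return (hi << 16) | lo;
}
```

With a `b done` followed by raw data, only the branch word has fixed bits, and the data words can take any value, so nothing like `kMovkFixedMask` is available to filter them.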
There is a way to compress the raw data down to 20 bits, by using 6 bits as an index and 14 bits as an offset. But that's a micro-optimization for another day, I think.
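As a sketch of what that 20-bit packing might look like: the field order (index in the high bits) and the function names are my assumptions for illustration, not anything in the PR.

```cpp
#include <cstdint>

// Pack a 6-bit index and a 14-bit offset into the low 20 bits of a word.
// Assumed layout: bits 19..14 = index, bits 13..0 = offset.
constexpr uint32_t pack20(uint32_t index, uint32_t offset) {
  return ((index & 0x3Fu) << 14) | (offset & 0x3FFFu);
}

constexpr uint32_t unpack_index(uint32_t packed) {
  return (packed >> 14) & 0x3Fu;
}

constexpr uint32_t unpack_offset(uint32_t packed) {
  return packed & 0x3FFFu;
}
```

This limits the index to 0..63 and the offset to 0..16383, so anything outside those ranges would need some fallback.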
-------------
PR: https://git.openjdk.java.net/jdk/pull/8955