RFR: 8287567: AArch64: Implement post-call NOPs

Andrew Haley aph at openjdk.java.net
Wed Jun 1 07:50:33 UTC 2022


On Wed, 1 Jun 2022 06:05:37 GMT, Rickard Bäckman <rbackman at openjdk.org> wrote:

> Is there any advantage to having the first instruction be "nop" instead of a branch that skips over the movks?

That's an interesting question. I've been looking at the performance data for some high-end AArch64 implementations, and I see that they can all execute multiple movks per cycle. Apple M1 can do 6/cycle, Arm Neoverse N2 can do 4/cycle. On both of these cores, the nops cost nothing except code cache space: on M1, nops don't even issue, they are consumed by the front end. On Neoverse N2, a nop is fused with any following instruction, so the total cost is at most 2 movks.
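For illustration only, here is a rough sketch of how a nop-plus-movks sequence could carry data and still be recognizable. The NOP and MOVK encodings are the architectural A64 ones; everything else is an assumption made for the example and is not taken from the patch: the movks targeting the zero register, the two-movk layout holding a 32-bit payload, and the helper names `movk_zr`, `make_post_call_nop` and `decode_post_call_nop`.

```cpp
#include <cassert>
#include <cstdint>

// Architectural A64 encodings:
//   NOP                            = 0xD503201F
//   MOVK Xd, #imm16, LSL #(16*hw)  = 0xF2800000 | hw << 21 | imm16 << 5 | Rd
constexpr uint32_t kNop      = 0xD503201Fu;
constexpr uint32_t kMovkBase = 0xF2800000u;
constexpr uint32_t kZr       = 31;  // assumption: write to xzr so the movk has no visible effect

// Hypothetical helper: a movk to xzr carrying 16 bits of payload.
inline uint32_t movk_zr(uint16_t imm16, unsigned hw) {
  assert(hw < 4);
  return kMovkBase | (hw << 21) | (uint32_t(imm16) << 5) | kZr;
}

// Hypothetical layout: nop, then two movks holding the low and high
// halves of a 32-bit payload.
struct PostCallNop { uint32_t insn[3]; };

inline PostCallNop make_post_call_nop(uint32_t data) {
  return { { kNop, movk_zr(uint16_t(data), 0), movk_zr(uint16_t(data >> 16), 1) } };
}

// True if insn is "movk xzr, #imm16, lsl #(16*hw)" for any imm16.
// Every bit outside the imm16 field (bits 5..20) has to match.
inline bool is_movk_zr(uint32_t insn, unsigned hw) {
  return (insn & ~(0xFFFFu << 5)) == movk_zr(0, hw);
}

// Recognize the sequence at pc and, on success, extract the payload.
// The fixed opcode bits are what make the match reliable.
inline bool decode_post_call_nop(const uint32_t* pc, uint32_t* data) {
  if (pc[0] != kNop || !is_movk_zr(pc[1], 0) || !is_movk_zr(pc[2], 1)) {
    return false;
  }
  *data = ((pc[1] >> 5) & 0xFFFFu) | (((pc[2] >> 5) & 0xFFFFu) << 16);
  return true;
}
```

In this sketch only 32 of the 96 bits are payload; the remaining 64 are fixed, which is the price paid for being able to tell the sequence apart from arbitrary code.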

> I was going to suggest:
> 
> ```
>  b done
>  (raw data)
>  done:
> ```
> 
> but then it might be hard for NativePostCallNop::check() to prevent false positives.

Impossible, I would have thought. The raw data might match any instruction: you need some redundancy somewhere.
There is a way to compress the raw data down to 20 bits, by using 6 bits as an index and 14 bits as an offset. But that's a micro-optimization for another day, I think.
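
As a minimal sketch of the 6 + 14 bit split (not code from the patch), the packing could look like the following. The function names are hypothetical, and putting the index in the high bits is an arbitrary choice made for the example; what the index and offset refer to is left opaque here.

```cpp
#include <cassert>
#include <cstdint>

constexpr unsigned kIndexBits  = 6;   // 64 possible index values
constexpr unsigned kOffsetBits = 14;  // offsets 0..16383

// Pack a 6-bit index and a 14-bit offset into the low 20 bits of a word.
// Assumption: index occupies the high 6 bits, offset the low 14.
inline uint32_t pack20(uint32_t index, uint32_t offset) {
  assert(index  < (1u << kIndexBits));
  assert(offset < (1u << kOffsetBits));
  return (index << kOffsetBits) | offset;
}

inline uint32_t unpack_index(uint32_t packed) {
  return (packed >> kOffsetBits) & ((1u << kIndexBits) - 1);
}

inline uint32_t unpack_offset(uint32_t packed) {
  return packed & ((1u << kOffsetBits) - 1);
}
```

Any value that does not fit in its field would need a fallback path, which this sketch does not attempt to show.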

-------------

PR: https://git.openjdk.java.net/jdk/pull/8955


More information about the hotspot-dev mailing list