RFR: 8373696: AArch64: Refine post-call NOPs
Andrew Haley
aph at openjdk.org
Thu Dec 18 13:55:41 UTC 2025
On Tue, 16 Dec 2025 22:45:08 GMT, Ruben <duke at openjdk.org> wrote:
> Extend the MOVK-based scheme to MOVK/MOVZ, allowing 19 bits of metadata to be stored.
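Here is a minimal sketch of how 19 bits can fit in one MOVZ/MOVK. It assumes the destination is XZR, so the instruction has no architectural effect. Under that assumption, 16 bits go in imm16, 2 bits go in the hw shift field, and 1 bit selects MOVZ or MOVK. This split is an assumption that happens to add up to 19. The helper names and the layout are hypothetical and are not taken from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers illustrating a 19-bit payload in MOVZ/MOVK XZR.
// 64-bit move-wide encoding: sf(31) opc(30:29) 100101(28:23) hw(22:21) imm16(20:5) Rd(4:0)
constexpr uint32_t kMovzXzr = 0xD2800000u | 31u;     // MOVZ XZR, #0 (opc = 0b10)
constexpr uint32_t kMovkXzr = 0xF2800000u | 31u;     // MOVK XZR, #0 (opc = 0b11)
constexpr uint32_t kMoveWideXzrMask  = 0xDF80001Fu;  // fixed bits; ignores opc<0>, hw, imm16
constexpr uint32_t kMoveWideXzrMatch = 0xD280001Fu;

// Assumed payload layout: bit 18 -> MOVZ(0)/MOVK(1), bits 17:16 -> hw, bits 15:0 -> imm16.
uint32_t encode_movzk_payload(uint32_t payload) {
  assert(payload < (1u << 19));
  const uint32_t base = ((payload >> 18) & 1u) ? kMovkXzr : kMovzXzr;
  const uint32_t hw = (payload >> 16) & 0x3u;
  const uint32_t imm16 = payload & 0xFFFFu;
  return base | (hw << 21) | (imm16 << 5);
}

// Returns false if insn is not a 64-bit MOVZ/MOVK with XZR as destination.
bool decode_movzk_payload(uint32_t insn, uint32_t& payload) {
  if ((insn & kMoveWideXzrMask) != kMoveWideXzrMatch) return false;
  const uint32_t is_movk = (insn >> 29) & 1u;
  const uint32_t hw = (insn >> 21) & 0x3u;
  const uint32_t imm16 = (insn >> 5) & 0xFFFFu;
  payload = (is_movk << 18) | (hw << 16) | imm16;
  return true;
}
```

The decoder rejects MOVN, 32-bit forms, and ordinary NOPs, because they differ in the fixed bits covered by the mask.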
>
> Choose the number of metadata slots in the post-call NOP sequence (1 or 2) depending on the offset from the CodeBlob header.
>
> Additionally, implement ADR/ADRP-based metadata storage, which provides 22 bits instead of 16 bits for metadata. This can be enabled with the UsePostCallSequenceWithADRP option.
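A similar sketch shows where 22 bits could come from in ADR/ADRP. It again assumes XZR as the destination, so the computed address is discarded. The payload is then the 21-bit immediate (immhi:immlo) plus 1 bit for choosing ADR or ADRP. The layout and helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers illustrating a 22-bit payload in ADR/ADRP XZR.
// PC-relative addressing encoding: op(31) immlo(30:29) 10000(28:24) immhi(23:5) Rd(4:0)
constexpr uint32_t kAdrXzr      = 0x10000000u | 31u;  // ADR XZR, . (op = 0)
constexpr uint32_t kAdrXzrMask  = 0x1F00001Fu;        // fixed bits 28:24 and Rd
constexpr uint32_t kAdrXzrMatch = 0x1000001Fu;

// Assumed payload layout: bit 21 -> ADR(0)/ADRP(1), bits 20:0 -> immhi:immlo.
uint32_t encode_adr_payload(uint32_t payload) {
  assert(payload < (1u << 22));
  const uint32_t op = (payload >> 21) & 1u;
  const uint32_t imm21 = payload & 0x1FFFFFu;
  const uint32_t immlo = imm21 & 0x3u;   // low 2 bits
  const uint32_t immhi = imm21 >> 2;     // high 19 bits
  return kAdrXzr | (op << 31) | (immlo << 29) | (immhi << 5);
}

// Returns false if insn is not ADR/ADRP with XZR as destination.
bool decode_adr_payload(uint32_t insn, uint32_t& payload) {
  if ((insn & kAdrXzrMask) != kAdrXzrMatch) return false;
  const uint32_t op = insn >> 31;
  const uint32_t immlo = (insn >> 29) & 0x3u;
  const uint32_t immhi = (insn >> 5) & 0x7FFFFu;
  payload = (op << 21) | (immhi << 2) | immlo;
  return true;
}
```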
>
>
> Renaissance 0.15.0 benchmark results (MOVK-based scheme)
> Neoverse V1.
> The runs were limited to 16 cores.
>
> Number of runs:
> 6 for the baseline and 6 with the changes, run as interleaved pairs.
>
> Command line:
> java -jar renaissance-jmh-0.15.0.jar \
> -bm avgt -gc true -v extra \
> -jvmArgsAppend '-Xbatch -XX:-UseDynamicNumberOfCompilerThreads \
> -XX:-CICompilerCountPerCPU -XX:ActiveProcessorCount=16 \
> -XX:CICompilerCount=2 -Xms8g -Xmx8g -XX:+AlwaysPreTouch \
> -XX:+UseG1GC'
>
> The change is the geometric mean of the ratios across the 6 pairs of runs.
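For reference, the geometric mean of per-pair ratios is the exponential of the mean log-ratio. The sketch below pairs it with a Student-t interval on the log-ratios. That interval method is an assumption, because the post does not say how its intervals were computed. The table's intervals are also not symmetric around the estimate even on a log scale, so this sketch will not reproduce them. The function and struct names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct ChangeEstimate {
  double change_pct;  // geometric-mean change, in percent
  double lo_pct;      // lower interval bound, in percent
  double hi_pct;      // upper interval bound, in percent
};

// ratios[i] = score with the change / baseline score, for interleaved pair i.
// t_crit is the two-sided Student-t critical value for ratios.size() - 1
// degrees of freedom (about 2.015 for a 90% interval with 6 pairs).
ChangeEstimate geomean_change(const std::vector<double>& ratios, double t_crit) {
  assert(ratios.size() >= 2);
  const double n = static_cast<double>(ratios.size());
  double sum = 0.0;
  for (double r : ratios) sum += std::log(r);
  const double mean = sum / n;  // mean log-ratio
  double ss = 0.0;
  for (double r : ratios) ss += (std::log(r) - mean) * (std::log(r) - mean);
  const double se = std::sqrt(ss / (n - 1.0) / n);  // standard error of the mean
  auto to_pct = [](double x) { return (std::exp(x) - 1.0) * 100.0; };
  return {to_pct(mean), to_pct(mean - t_crit * se), to_pct(mean + t_crit * se)};
}
```

For six pairs, `geomean_change(ratios, 2.015)` gives the estimate and a 90% interval under these assumptions.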
>
> | Benchmark | Change | 90% CI for the change |
> | ----------------------------------------------------- | -------- | --------------------- |
> | org.renaissance.actors.JmhAkkaUct.run | -0.215% | -2.652% to 1.357% |
> | org.renaissance.actors.JmhReactors.run | -0.166% | -1.974% to 1.775% |
> | org.renaissance.jdk.concurrent.JmhFjKmeans.run | 0.222% | -0.492% to 0.933% |
> | org.renaissance.jdk.concurrent.JmhFutureGenetic.run | -1.880% | -2.438% to -1.343% |
> | org.renaissance.jdk.streams.JmhMnemonics.run | -0.500% | -1.032% to 0.089% |
> | org.renaissance.jdk.streams.JmhParMnemonics.run | -0.740% | -2.092% to 0.639% |
> | org.renaissance.jdk.streams.JmhScrabble.run | -0.031% | -0.353% to 0.310% |
> | org.renaissance.neo4j.JmhNeo4jAnalytics.run | -0.873% | -2.323% to 0.427% |
> | org.renaissance.rx.JmhRxScrabble.run | -0.512% | -1.121% to 0.049% |
> | org.renaissance.scala.dotty.JmhDotty.run | -0.219% | -1.108% to 0.708% |
> | org.renaissance.scala.sat.JmhScalaDoku.run | -2.750% | -6.426% to -0.827% |
> | org.renaissance.scala.stdlib.JmhScalaKmeans.run | 0.046% | -0.383% to 0.408% |
> | org.renaissance.scala.stm.JmhPhilosophers.run | 1.497% | -0.955% to 3.923% |
> | org.renaissance.scala.stm.JmhScalaStmBench7.run | -0.096% | -0.773% to 0.586% |
> | org.renaissance.twitter.finagle.J...
On 18/12/2025 13:28, Ruben wrote:
> The |B.nv| might not be suitable in this case - I believe the branch
> will be "always taken".
Ah, so it is. That's a shame.
> However, I had considered using |CBNZ XZR, <#imm>|. So far, I have
> avoided implementing it because it is unclear what performance effects
> it might have.
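On AArch64 the NV condition code (0b1111) executes the same as AL, which is why B.nv is always taken. CBNZ XZR, by contrast, can never be taken, because XZR always reads as zero. Its 19-bit immediate is therefore free to carry metadata in a single instruction. Here is a sketch of that encoding, with hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers: CBNZ XZR, <label> as a never-taken "NOP with payload".
// 64-bit compare-and-branch encoding: sf(31) 011010(30:25) op(24) imm19(23:5) Rt(4:0)
constexpr uint32_t kCbnzXzr      = 0xB5000000u | 31u;  // CBNZ XZR, . (op = 1)
constexpr uint32_t kCbnzXzrMask  = 0xFF00001Fu;        // every field except imm19
constexpr uint32_t kCbnzXzrMatch = kCbnzXzr;

// XZR always reads as zero, so the branch is never taken; imm19 holds the payload.
uint32_t encode_cbnz_payload(uint32_t payload) {
  assert(payload < (1u << 19));
  return kCbnzXzr | (payload << 5);
}

// Returns false if insn is not CBNZ XZR.
bool decode_cbnz_payload(uint32_t insn, uint32_t& payload) {
  if ((insn & kCbnzXzrMask) != kCbnzXzrMatch) return false;
  payload = (insn >> 5) & 0x7FFFFu;
  return true;
}
```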
I've tried it, and it's definitely a lot slower than a NOP or a MOVZ.
It would be nice if we could persuade the architecture people to give us
a NOP with payload, perhaps because "Intel has it."
But if we choose the split between CB offset and oopmap index wisely, we
can avoid using a second instruction most of the time.
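Here is a sketch of that trade-off under assumed parameters. One hypothetical 19-bit slot is divided between an oopmap index (index_bits wide) and the CodeBlob offset, counted in 4-byte instructions. A second slot is used only when either value does not fit. The layout, the names, and the "zero slots means no metadata" fallback are illustrative assumptions, not the patch's actual format.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical packing of (CodeBlob offset, oopmap index) into 19-bit slots.
constexpr unsigned kSlotBits = 19;

struct PostCallMeta {
  unsigned slots;    // 0 (does not fit), 1 or 2
  uint64_t payload;  // low kSlotBits * slots bits are meaningful
};

PostCallMeta pack_meta(uint32_t cb_offset_bytes, uint32_t oopmap_index, unsigned index_bits) {
  assert(index_bits > 0 && index_bits < kSlotBits);
  assert(cb_offset_bytes % 4 == 0);  // AArch64 instructions are 4-byte aligned
  const uint64_t words = cb_offset_bytes / 4;
  const uint64_t index = oopmap_index;
  // One slot: [index : index_bits][words : kSlotBits - index_bits]
  const unsigned off_bits = kSlotBits - index_bits;
  if (index < (uint64_t{1} << index_bits) && words < (uint64_t{1} << off_bits)) {
    return {1, (index << off_bits) | words};
  }
  // Two slots: [index : kSlotBits][words : kSlotBits]
  if (index < (uint64_t{1} << kSlotBits) && words < (uint64_t{1} << kSlotBits)) {
    return {2, (index << kSlotBits) | words};
  }
  return {0, 0};  // assumed fallback: no metadata, caller uses a slower lookup
}

void unpack_meta(const PostCallMeta& m, unsigned index_bits,
                 uint32_t& cb_offset_bytes, uint32_t& oopmap_index) {
  assert(m.slots == 1 || m.slots == 2);
  const unsigned off_bits = (m.slots == 1) ? kSlotBits - index_bits : kSlotBits;
  const uint64_t off_mask = (uint64_t{1} << off_bits) - 1;
  cb_offset_bytes = static_cast<uint32_t>((m.payload & off_mask) * 4);
  oopmap_index = static_cast<uint32_t>(m.payload >> off_bits);
}
```

With index_bits = 5 in this sketch, a call site within 64 KiB of the CodeBlob start (2^14 instructions) fits in one slot if its oopmap index is below 32. Choosing that split is the tuning knob described above.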
-------------
PR Comment: https://git.openjdk.org/jdk/pull/28855#issuecomment-3670392141
More information about the hotspot-dev mailing list