RFR: 8361892: AArch64: Incorrect matching rule leading to improper oop instruction encoding [v2]

Tue Jul 15 12:32:45 UTC 2025

On Tue, 15 Jul 2025 12:08:41 GMT, Andrew Haley <aph at openjdk.org> wrote:

> IMHO that doesn't help very much. You're still looking at ~4 cycles to load from L1 dcache, and you kill a dcache line for it, and you consume a full xword of code space for it. If we put the byte-map base on a 32-bit boundary we only need a single MOVZ, and that can be relocated easily enough when we load the code from the archive. Surely that's better. Or is even that small work too much?

Well, it helps a bit even though it suffers from the faults you identify.

The downside of using even a single MOVZ is that every one requires a reloc during AOT code reloading. So the number of relocs in any code blob that we handle during loading is no longer 1; instead, it equals the number of post-barriers in the blob.

The bigger hit will come when we try to optimize code loading by mmapping AOT code blobs directly into the code cache. At present we copy-relocate them from an mmapped region. Instead of just one constant to patch at the start of the blob, we will have many places to patch scattered throughout the code blob. That means many more copy-on-write pages rather than vanilla mapped pages, which drags back in the copy overheads that the mmap is intended to avoid and also means fewer opportunities for co-hosted JVMs to share pages.
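
To make the page-sharing point concrete, here is a rough back-of-the-envelope sketch. It is not HotSpot code. The blob size, the 4 KiB page size, the barrier spacing and the helper name `dirty_pages` are all assumptions chosen purely for illustration. It counts how many pages of a mapped blob would be written to, and so turned into private copy-on-write pages, under each patching scheme.

```python
PAGE_SIZE = 4096  # assumed page size; AArch64 systems may also use 16 KiB or 64 KiB


def dirty_pages(patch_offsets, page_size=PAGE_SIZE):
    """Count the distinct pages containing at least one patched offset."""
    return len({offset // page_size for offset in patch_offsets})


# Hypothetical 64 KiB code blob (16 pages at 4 KiB).
blob_size = 64 * 1024

# Scheme A: a single byte-map base constant at the start of the blob.
single_constant = [0]

# Scheme B: one patched MOVZ per post-barrier. The sites are spaced
# 1000 bytes apart here only to model "scattered throughout the blob".
per_barrier = list(range(512, blob_size, 1000))

print("relocs, scheme A:", len(single_constant))
print("relocs, scheme B:", len(per_barrier))
print("dirty pages, scheme A:", dirty_pages(single_constant))
print("dirty pages, scheme B:", dirty_pages(per_barrier))
```

With these made-up numbers, scheme A writes to one page, while scheme B needs 66 patches and touches all 16 pages of the blob, so none of them can stay shared between co-hosted JVMs. Real blobs will differ, but the count of private pages grows with how widely the post-barriers are spread rather than staying fixed at one.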

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26249#issuecomment-3073410591