RFR: 8339573: Update CodeCacheSegmentSize and CodeEntryAlignment for ARM

Mon Sep 16 18:16:08 UTC 2024

On Sat, 14 Sep 2024 16:22:05 GMT, Andrew Haley <aph at openjdk.org> wrote:

>>> Do you reproduce the regression on a public benchmark that I can also try?
>> 
>> It was our internal benchmark.
>
>> > @vnkozlov Many thanks! Do you reproduce the regression on a public benchmark that I can also try? Now I restrict CodeEntryAlignment=16 for V1 and V2 only. And I restart my performance tests.
>> 
>> This may have as much to do with the smallish icache
> 
> Sorry, I meant last level cache

@theRealAph 
> It makes little sense to set the default CodeEntryAlignment to less than the icache line size, except in severely constrained environments.

Why do we need CodeEntryAlignment? The instruction prefetcher has more time to load the next cache line if execution starts at the beginning of the current cache line. But this consideration makes more sense for OptoLoopAlignment: ideally, an entire loop body fits into a small number of instruction cache lines, which is unlikely to be true of an entire nmethod body.
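
To make the cache-line argument concrete, here is a minimal sketch (not HotSpot code) that counts how many instruction cache lines a code region touches for a given start address. The 64-byte line size and the helper names `align_up` and `lines_touched` are assumptions made for illustration only.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: round addr up to the next multiple of alignment.
// alignment must be a power of two.
constexpr std::uintptr_t align_up(std::uintptr_t addr, std::uintptr_t alignment) {
    return (addr + alignment - 1) & ~(alignment - 1);
}

// Hypothetical helper: number of cache lines touched by the byte range
// [start, start + size). size must be greater than zero.
// The default line size of 64 bytes is an assumption.
constexpr std::size_t lines_touched(std::uintptr_t start, std::size_t size,
                                    std::size_t line = 64) {
    return (start + size - 1) / line - start / line + 1;
}

// A 64-byte loop body that starts 16 bytes into a line straddles two lines.
static_assert(lines_touched(0x1010, 64) == 2, "unaligned body spans two lines");
// The same body aligned up to the next 64-byte boundary fits in one line.
static_assert(lines_touched(align_up(0x1010, 64), 64) == 1, "aligned body fits in one line");
// A 4 KB code body covers many lines even when its entry is aligned.
static_assert(lines_touched(0x1000, 4096) == 64, "large body spans many lines");
```

Under these assumptions, alignment changes the line count noticeably for a short loop body, but for a multi-kilobyte body it only affects where the first line starts.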

I have experimented with code entry alignment in a native application (repeatedly calling a large number of aligned/unaligned short methods) and found that 64-byte alignment is preferable on the Neoverse N2 CPU, while no difference was observed on Neoverse V2. I am not sure whether this is a feature of the specific processor implementation or of the Neoverse architecture. The Neoverse N2 and V2 technical reference manuals say pretty much the same about the L1 instruction memory system features.
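
An experiment of this kind could be sketched roughly as follows. This is not the program that was actually used: the function names, the count of four functions per group, and the timing helper are all hypothetical, and the attributes are GCC/Clang-specific. The "default" group simply gets whatever alignment the compiler chooses, so it is not guaranteed to be misaligned.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>

// GCC/Clang-specific attributes (assumption: one of these compilers is used).
#define ALIGNED64 __attribute__((aligned(64), noinline))
#define DEFAULT_ALIGN __attribute__((noinline))

// Short functions whose entry points are forced onto 64-byte boundaries.
ALIGNED64 int a0(int x) { return x + 1; }
ALIGNED64 int a1(int x) { return x + 1; }
ALIGNED64 int a2(int x) { return x + 1; }
ALIGNED64 int a3(int x) { return x + 1; }

// The same functions with the compiler's default entry alignment.
DEFAULT_ALIGN int d0(int x) { return x + 1; }
DEFAULT_ALIGN int d1(int x) { return x + 1; }
DEFAULT_ALIGN int d2(int x) { return x + 1; }
DEFAULT_ALIGN int d3(int x) { return x + 1; }

using Fn = int (*)(int);

static Fn const aligned_fns[] = {a0, a1, a2, a3};
static Fn const default_fns[] = {d0, d1, d2, d3};

// Returns true if the function's entry address is a multiple of alignment.
bool is_aligned(Fn f, std::uintptr_t alignment) {
    return reinterpret_cast<std::uintptr_t>(f) % alignment == 0;
}

// Calls every function in the table `rounds` times, threading the result
// through `sink` so the calls cannot be optimized away.
// Returns the elapsed wall-clock time in nanoseconds.
template <std::size_t N>
long long time_calls(Fn const (&fns)[N], int rounds, int& sink) {
    assert(rounds > 0);
    auto t0 = std::chrono::steady_clock::now();
    for (int r = 0; r < rounds; ++r) {
        for (Fn f : fns) {
            sink = f(sink);
        }
    }
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count();
}
```

Comparing `time_calls(aligned_fns, rounds, sink)` with `time_calls(default_fns, rounds, sink)` gives only a rough signal. A meaningful measurement would need far more functions than fit in the L1 instruction cache, along with pinned cores and repeated runs.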

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20864#issuecomment-2353588638