RFR: 8339573: Update CodeCacheSegmentSize and CodeEntryAlignment for ARM [v3]

Wed Oct 9 15:02:06 UTC 2024

On Tue, 8 Oct 2024 21:58:17 GMT, AGSaidi <duke at openjdk.org> wrote:

>>> Could you share more data?
>> 
>> This is a simple native benchmark: https://cr.openjdk.org/~bulasevich/AlignBench.cpp
>> 
>> The program tests the performance of code starting from different alignments 
>> using a generated block of code that primarily consists of NOP (no operation)
>> and RET (return from subroutine) instructions.
>> 
>> In this experiment (though it may not be meaningful for real-world scenarios) 
>> on G2 execution time increases with misalignment from the 64-byte boundary, with the 
>> best performance at exact 64-byte alignment and slower times as the offset grows.
>> 
>> align_64+0: 391.457ms 391.157ms 391.481ms
>> align_64+4: 392.245ms 392.057ms 391.792ms
>> align_64+8: 392.154ms 392.962ms 392.65ms
>> align_64+12: 393.485ms 393.307ms 393.658ms
>> align_64+16: 394.434ms 394.679ms 394.284ms
>> align_64+20: 395.681ms 395.709ms 394.843ms
>> align_64+24: 395.799ms 396.396ms 395.977ms
>> align_64+28: 397.379ms 397.278ms 397.359ms
>> align_64+32: 397.677ms 397.82ms 397.95ms
>> align_64+36: 399.08ms 398.88ms 399.075ms
>> align_64+40: 399.829ms 400.118ms 399.981ms
>> align_64+44: 400.916ms 400.747ms 401.241ms
>> align_64+48: 401.736ms 401.831ms 402.54ms
>> align_64+52: 402.705ms 402.569ms 402.446ms
>> align_64+56: 403.718ms 403.822ms 403.535ms
>> align_64+60: 404.722ms 404.824ms 404.726ms
>> align_64+64: 390.852ms 390.669ms 391.051ms
>
> This benchmark certainly has some limitations as nops are special and can be discarded and having the 64B alignment means the core can quickly fetch and throw away as many nops as possible. I'm not sure that it's particularly representative of a real output and the benefits that code density can bring here.

The behavior remains unchanged when the NOP is replaced with an ADD x1,x1,x1 instruction. That said, I fully agree that the benchmark is peculiar, and its result does not necessarily show whether the platform is sensitive to code entry alignment. I'd also like to point out that on the same N1 platform (G2), EEMBC's CoreMark benchmark runs 0.07% faster (a small difference, I know) when built with -falign-functions=64 than with -falign-functions=16, with the result for -falign-functions=32 falling in between. This makes me doubt that CodeEntryAlignment=16/32 is reasonable for N1.
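To put the quoted AlignBench numbers in proportion, here is a minimal sketch that averages the three runs per offset and expresses each as a slowdown relative to the exactly aligned case. It is not part of AlignBench.cpp; the names Sample, kSamples, mean_ms, slowdown_percent and print_report are hypothetical and chosen only for this illustration. The timings are copied from the table quoted above.

```cpp
#include <array>
#include <cstdio>

// One row of the quoted table: byte offset from the 64-byte boundary
// and the three measured run times in milliseconds.
struct Sample {
    int offset;
    std::array<double, 3> ms;
};

// Timings copied verbatim from the quoted AlignBench output (G2).
const std::array<Sample, 17> kSamples = {{
    {0,  {391.457, 391.157, 391.481}},
    {4,  {392.245, 392.057, 391.792}},
    {8,  {392.154, 392.962, 392.65}},
    {12, {393.485, 393.307, 393.658}},
    {16, {394.434, 394.679, 394.284}},
    {20, {395.681, 395.709, 394.843}},
    {24, {395.799, 396.396, 395.977}},
    {28, {397.379, 397.278, 397.359}},
    {32, {397.677, 397.82, 397.95}},
    {36, {399.08, 398.88, 399.075}},
    {40, {399.829, 400.118, 399.981}},
    {44, {400.916, 400.747, 401.241}},
    {48, {401.736, 401.831, 402.54}},
    {52, {402.705, 402.569, 402.446}},
    {56, {403.718, 403.822, 403.535}},
    {60, {404.722, 404.824, 404.726}},
    {64, {390.852, 390.669, 391.051}},
}};

// Arithmetic mean of the three runs for one offset.
double mean_ms(const Sample& s) {
    return (s.ms[0] + s.ms[1] + s.ms[2]) / 3.0;
}

// Slowdown of `s` relative to `base`, in percent of the base mean.
double slowdown_percent(const Sample& s, const Sample& base) {
    const double b = mean_ms(base);
    return (mean_ms(s) - b) / b * 100.0;
}

// Prints one line per offset: mean time and slowdown vs. align_64+0.
void print_report() {
    const Sample& base = kSamples.front();
    for (const Sample& s : kSamples) {
        std::printf("align_64+%-2d mean %.3f ms  %+.2f%%\n",
                    s.offset, mean_ms(s), slowdown_percent(s, base));
    }
}
```

Calling print_report() from main lists the slowdown for every offset. By this calculation the worst case in the table, align_64+60, is roughly 3.4% slower than align_64+0, which is much larger than the 0.07% CoreMark difference and fits the view that the NOP/RET microbenchmark overstates the effect.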

Let me ask Vladimir if it is possible to check the performance with the CodeEntryAlignment=32 setting.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20864#discussion_r1793685554