RFR: 8339573: Update CodeCacheSegmentSize and CodeEntryAlignment for ARM
Boris Ulasevich
bulasevich at openjdk.org
Mon Sep 9 16:47:37 UTC 2024
With this change, I have adjusted the default settings for CodeCacheSegmentSize and CodeEntryAlignment for AArch64 and ARM32. The main goal is to improve code density by reducing the number of wasted bytes (approximately **4%** of code cache space is currently wasted). Improved code density may also boost performance in large applications.
Each nmethod occupies a whole number of code cache segments (the minimum allocation blocks). Since nmethod sizes are not multiples of 128 bytes, the last occupied segment is on average half empty. Reducing the code cache segment size reduces this waste proportionally. However, we should be careful not to reduce CodeCacheSegmentSize too far, as smaller segments increase the overhead of the CodeHeap::_segmap bitmap. A CodeCacheSegmentSize of 64 appears to be the optimal balance.
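As a back-of-the-envelope illustration (a model, not HotSpot code), assume nmethod sizes are uniform modulo the segment size and that the segment map costs one byte per segment of committed code cache:

```python
# Rough model of the CodeCacheSegmentSize trade-off described above.
# Assumptions (not HotSpot internals): nmethod sizes are uniform modulo the
# segment size, so the last occupied segment is on average half full, and
# CodeHeap::_segmap costs one byte per segment.

def tail_waste_per_nmethod(segment_size):
    # Average unused bytes in the last segment of each nmethod.
    return segment_size / 2

def segmap_bytes(committed_code, segment_size):
    # One segment-map byte per segment of committed code cache.
    return committed_code // segment_size

MB = 1024 * 1024
for seg in (128, 64, 32):
    print(f"segment={seg:3}: ~{tail_waste_per_nmethod(seg):.0f} B wasted per nmethod, "
          f"{segmap_bytes(128 * MB, seg) // 1024} KB segmap per 128 MB committed")
```

The model shows the tension: halving the segment size halves the per-nmethod tail waste but doubles the segment map, which also has to be scanned when searching for free blocks. The measurements below point to 64 as the sweet spot.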
The current large default value for CodeCacheSegmentSize (64+64) was historically introduced with the comment "Tiered compilation has large code-entry alignment", which doesn't make much sense to me. The history of this comment and value is as follows:
- The PPC port was introduced with CodeEntryAlignment=128 (recently reduced to 64: https://github.com/openjdk/jdk/commit/09a78b5d) and CodeCacheSegmentSize was adjusted accordingly for that platform.
- Soon after, the 128-byte alignment was applied to all platforms to hide a debug-mode warning (https://github.com/openjdk/jdk/commit/e8bc971d). Despite that change (and the Segmented Code Cache introduced later), the warning can still be reproduced today with the -XX:+VerifyCodeCache option in fastdebug builds of large applications (~10K nmethods, and thus ~10K free blocks in between them).
I believe it is time to remove the comment and update the default value.
I also suggest updating the default CodeEntryAlignment value for AArch64. The current setting is much larger than on x86 and was likely based on the typical 64-byte cache line size. The Cortex-A57 and Cortex-A72 software optimisation guides recommend 32-byte alignment for subroutine entry points; the Neoverse software optimisation guides do not mention a recommended entry-point alignment.
For reference, the default [function_align setting in GCC](https://github.com/gcc-mirror/gcc/blob/master/gcc/config/aarch64/tuning_models/cortexa72.h#L44) is typically 16 or 32 bytes, depending on the target architecture.
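For a sense of scale, under the simplifying assumption that entry points land uniformly modulo the alignment, the expected padding inserted before each aligned entry is about half the alignment. A small sketch (the 10K-method population is a hypothetical matching the scale mentioned above):

```python
def avg_entry_padding(alignment):
    # Expected pad when the emission offset is uniform modulo `alignment`:
    # the mean of 0 .. alignment-1 bytes.
    return (alignment - 1) / 2

# Hypothetical population of 10,000 nmethods for illustration.
for a in (64, 32, 16):
    saved = (avg_entry_padding(64) - avg_entry_padding(a)) * 10_000
    print(f"CodeEntryAlignment={a}: ~{avg_entry_padding(a):.1f} B padding per entry, "
          f"~{saved / 1024:.0f} KB saved vs 64 across 10K methods")
```

An nmethod may have more than one aligned entry point, so the actual savings from a smaller alignment may be somewhat larger than this per-entry estimate.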
Hotspot performance tests with -XX:CodeCacheSegmentSize=64 and -XX:CodeEntryAlignment=16 options showed the following results:
- No performance impact on HotSpot microbenchmarks or other microbenchmark suites.
- On the Renaissance Dotty benchmark:
  - 0.2-0.4% performance improvement on Neoverse N1/V1/V2 architectures
  - 0.7% performance improvement on Raspberry Pi Model 3 (ARM32, ARM1176JZF-S)
  - slight performance degradation on Cortex-A72, reproducible only with the CodeEntryAlignment update
I suggest changing the default CodeCacheSegmentSize for AArch64 and ARM32, and updating the default CodeEntryAlignment for AArch64 Neoverse platforms.
-------------
Commit messages:
- 8339573: Update CodeCacheSegmentSize and CodeEntryAlignment for ARM
Changes: https://git.openjdk.org/jdk/pull/20864/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20864&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8339573
Stats: 6 lines in 3 files changed: 4 ins; 0 del; 2 mod
Patch: https://git.openjdk.org/jdk/pull/20864.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/20864/head:pull/20864
PR: https://git.openjdk.org/jdk/pull/20864
More information about the hotspot-dev mailing list