RFR: 8339573: Update CodeCacheSegmentSize and CodeEntryAlignment for ARM

Mon Sep 9 19:15:03 UTC 2024

On Thu, 5 Sep 2024 00:58:10 GMT, Boris Ulasevich <bulasevich at openjdk.org> wrote:

> With this change, I have adjusted the default settings for CodeCacheSegmentSize and CodeEntryAlignment for AARCH and ARM32. The main goal is to improve code density by reducing the number of wasted bytes (approximately **4%** waste). Improving code density may also have the side effect of boosting performance in large applications
> 
> Each nmethod occupies a number of code cache segments (minimum allocation blocks). Since the size of an nmethod is not aligned to 128 bytes, the last occupied segment is half empty. Reducing the size of the code cache segments correspondingly minimizes waste. However, we should be careful about reducing the CodeCacheSegmentSize too much, as smaller segment sizes will increase the overhead of the CodeHeap::_segmap bitmap. A CodeCacheSegmentSize of 64 seems to be an optimal balance.
> 
> The current large default value for CodeCacheSegmentSize (64+64) was historically introduced with the comment "Tiered compilation has large code-entry alignment" which doesn't make much sense to me. The history of this comment and value is as follows:
> - The PPC port was introduced with CodeEntryAlignment=128 (recently reduced to 64: https://github.com/openjdk/jdk/commit/09a78b5d) and CodeCacheSegmentSize was adjusted accordingly for that platform.
> - Soon after, the 128-byte alignment was applied to all platforms to hide a debug mode warning (https://github.com/openjdk/jdk/commit/e8bc971d). Despite the change (and Segmented Code Cache introduced later), the warning can still be reproduced today using the -XX:+VerifyCodeCache fastdebug option in large applications (10K nmethods ~ 10K free blocks in between them).
> 
> I believe it is time to remove the comment and update the default value.
> 
> I also suggest updating the default CodeEntryAlignment value for AARCH. The current setting is much larger than for x86 and was likely based on the typical cache line size of 64 bytes. Cortex-A57, A72 architecture software optimisation guides recommend a 32-byte alignment for subroutine entry points. Neoverse architecture software optimisation guides do not mention recommended entry point alignment.
> 
> For reference, the default [function_align setting in GCC](https://github.com/gcc-mirror/gcc/blob/master/gcc/config/aarch64/tuning_models/cortexa72.h#L44) is typically 16 or 32 bytes, depending on the target architecture.
> 
> Hotspot performance tests with -XX:CodeCacheSegmentSize=64 and -XX:CodeEntryAlignment=16 options showed the following results:
> - No performance impact on ...
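
To make the segment-size argument in the quoted description concrete, here is a minimal sketch of the arithmetic. It is not HotSpot code: the function names are hypothetical, the nmethod size is made up, block headers are ignored, and the segment map is assumed to hold one entry per segment.

```python
def segments_needed(nmethod_size: int, segment_size: int) -> int:
    """Number of whole segments needed to hold nmethod_size bytes."""
    return -(-nmethod_size // segment_size)  # ceiling division


def wasted_bytes(nmethod_size: int, segment_size: int) -> int:
    """Unused bytes in the last segment occupied by the nmethod."""
    return segments_needed(nmethod_size, segment_size) * segment_size - nmethod_size


def segmap_entries(code_cache_bytes: int, segment_size: int) -> int:
    """Assumption: the segment map holds one entry per segment."""
    return code_cache_bytes // segment_size


if __name__ == "__main__":
    size = 1030  # hypothetical nmethod size in bytes
    for seg in (128, 64):
        print(seg, segments_needed(size, seg), wasted_bytes(size, seg))
    # Halving the segment size doubles the number of map entries.
    print(segmap_entries(1 << 20, 128), segmap_entries(1 << 20, 64))
```

Smaller segments shrink the tail waste per nmethod, while the number of segment map entries grows in inverse proportion to the segment size, which is the trade-off described above.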

As @bulasevich pointed out, the change was made by Albert to fix a VM warning:
https://bugs.openjdk.org/browse/JDK-8029799

Quote: 
"The freelist of the code cache exceeds 10'000 items, which results in a VM warning.
The problem behind the warning is that the freelist is populated by a large number
of small free blocks. For example, in failing test case (see header), the freelist grows
up to more than 3500 items where the largest item on the list is 9 segments (one segment
is 64 bytes). That experiment was done on my laptop. Such a large freelist can indeed be
a performance problem, since we use a linear search to traverse the freelist."

The warning is about a huge freelist, which is scanned linearly to find suitable free space in the CodeCache for the next allocation. The freelist grows large with tiered compilation because we generate a lot of C1-compiled code that is later replaced with C2-compiled code.

The fix for 8029799 optimized the freelist search during allocation by selecting the first block that has enough space. This reduces search time but, on the other hand, may increase fragmentation of the CodeCache space.
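
The search-time versus fragmentation trade-off can be illustrated with a small sketch. This is not the CodeHeap implementation: the free list is modeled as a plain Python list of block lengths in segments, and all function names are hypothetical. First-fit stops at the first block that is large enough, while best-fit scans the whole list for the tightest match.

```python
from typing import List, Optional


def first_fit(free_blocks: List[int], needed: int) -> Optional[int]:
    """Index of the first free block with at least `needed` segments."""
    for i, length in enumerate(free_blocks):
        if length >= needed:
            return i  # stop early: shorter search
    return None


def best_fit(free_blocks: List[int], needed: int) -> Optional[int]:
    """Index of the smallest free block that still fits; scans the whole list."""
    best = None
    for i, length in enumerate(free_blocks):
        if length >= needed and (best is None or length < free_blocks[best]):
            best = i
    return best


def allocate(free_blocks: List[int], index: int, needed: int) -> List[int]:
    """Return a new free list after carving `needed` segments out of block `index`."""
    remainder = free_blocks[index] - needed
    result = free_blocks[:index] + free_blocks[index + 1:]
    if remainder > 0:
        result.insert(index, remainder)  # leftover stays on the list as a smaller block
    return result


if __name__ == "__main__":
    free = [2, 9, 3, 5]  # hypothetical free block lengths, in segments
    print(allocate(free, first_fit(free, 3), 3))
    print(allocate(free, best_fit(free, 3), 3))
```

In this toy example, first-fit splits the 9-segment block and leaves a remainder on the list, whereas best-fit consumes the exact 3-segment block and shortens the list, at the cost of a full scan.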

Several optimizations were made to this code by @RealLucy in [JDK-8223444](https://bugs.openjdk.org/browse/JDK-8223444) and [JDK-8231460](https://bugs.openjdk.org/browse/JDK-8231460). But it still uses a `linked list` for free segments. Should we consider something more complex? Or is it not an issue?

> Hotspot performance tests with -XX:CodeCacheSegmentSize=64 and -XX:CodeEntryAlignment=16 options showed the following results:

Which of these two flag settings improved performance the most?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20864#issuecomment-2338885078
PR Comment: https://git.openjdk.org/jdk/pull/20864#issuecomment-2338886472