RFR: 8280872: Reorder code cache segments to improve code density [v6]
Volker Simonis
simonis at openjdk.java.net
Fri Mar 18 11:05:41 UTC 2022
On Wed, 16 Mar 2022 09:37:29 GMT, Boris Ulasevich <bulasevich at openjdk.org> wrote:
>> Currently the codecache segment order is [non-nmethod, non-profiled, profiled]. With this change we move the non-nmethod segment between two code segments. It changes nothing for any platform besides AARCH.
>>
>> On AArch64 the offset limit for a single branch instruction is 128MB. Bigger jumps are encoded with three instructions. Most far branches are jumps into the non-nmethod blobs. With the non-nmethod segment between the two code segments, the jump distance from a method to its stub becomes shorter. The result is a 4% reduction in generated code size for CodeCache sizes in the 128MB to 240MB range.
>>
>> As a side effect, the performance of some tests is slightly improved:
>> ``ArraysFill.testCharFill 10 thrpt 15 170235.720 -> 178477.212 ops/ms``
>>
>> Testing: jdk/hotspot jtreg and microbenchmarks on AMD and AARCH
>
> Boris Ulasevich has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision:
>
> rename, adding test
Hi Boris,
Thanks for doing this change. In general it looks good! I only have minor comments and questions which I've added inline.
Also, can you please update the Summary of the JBS issue to match that of the PR? I think the PR's summary is more accurate because the change also contains some shared-code changes.
And please update the description of the PR, replacing "*It changes nothing for any platform besides AARCH*" with something like "*Currently only the aarch64 backend is adapted to make use of these changes*", because the segments are actually re-ordered on all platforms.
Thank you and best regards,
Volker
src/hotspot/cpu/aarch64/aarch64.ad line 1282:
> 1280:
> 1281: static uint size_exception_handler() {
> 1282: return MacroAssembler::far_codestub_branch_size();
Can you please also use this for `size_deopt_handler()` below?
I.e. `NativeInstruction::instruction_size() + MacroAssembler::far_codestub_branch_size()`.
Also, once you've done this I think you can strengthen the assertions in `HandlerImpl::emit_exception_handler()`/`HandlerImpl::emit_deopt_handler()` from "`<=`" to "`==`".
src/hotspot/cpu/aarch64/icBuffer_aarch64.cpp line 56:
> 54: __ ldr(rscratch2, l);
> 55: int jump_code_size = __ far_jump(ExternalAddress(entry_point));
> 56: // IC stub code size is not expected to vary depending on target address.
Does the new code still align `cached_value` on a `wordSize` boundary, as was ensured before by `align(wordSize)`? I think that's only true if `code_begin` is guaranteed to start at a `wordSize` boundary, because `far_jump` is either one or three instructions (plus one `ldr` instruction). If yes, please add a comment explaining that. Otherwise, explain why the alignment isn't necessary anymore.
src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 399:
> 397: }
> 398: // codecache size: 128M..240M
> 399: return !CodeCache::is_non_nmethod(addr);
Is it possible to refine this further to also catch calls from C1 to C1 and from C2 to C2, which obviously wouldn't need a far call either?
src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 428:
> 426: uint64_t offset;
> 427: // We can use ADRP here because we know that the total size of
> 428: // the code cache cannot exceed 2Gb.
Not directly related to your change, but which is correct here:
- the comment, which says "code cache cannot exceed 2Gb"
- the assertion above it, which asserts `ReservedCodeCacheSize < 4*G`
Maybe you can fix this while you're at it?
src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp line 1071:
> 1069: address trampoline_call(Address entry, CodeBuffer* cbuf = NULL);
> 1070:
> 1071: // Jumps that can reach anywhere in the code cache.
Not sure why you've moved this comment from the `far_call()`/`far_jump()` functions to here; please move it back. If you want to add a comment to this function, it could be something like `Check if generic branches in the code cache require a far jump`.
src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp line 1075:
> 1073: }
> 1074:
> 1075: // Jumps that can reach anywhere in the code cache.
Same as before. Please move the original comment back down to the `far_call()`/`far_jump()` functions.
src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp line 1076:
> 1074: }
> 1075:
> 1076: // Jumps that can reach a nonmethod stub
This should read something like `Check if branches to the non-nmethod section require a far jump`.
src/hotspot/share/code/codeCache.cpp line 306:
> 304: // Profiled nmethods
> 305: // Non-nmethods
> 306: // Non-profiled nmethods
In the JBS issue and in the description of this pull request you say that you *move the non-nmethod segment between the two other code segments* (i.e. low address -> profiled -> non-nmethod -> non-profiled) but here you also change the order of "profiled" and "non-profiled" (i.e. in your code "non-profiled" comes first).
Is there any reason for doing this, instead of just moving the "non-nmethod" segment between the "profiled" and "non-profiled" segments as described in the JBS issue and in the PR description?
-------------
Changes requested by simonis (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/7517