RFR: 8325467: Support methods with many arguments in C2 [v11]

Roberto Castañeda Lozano rcastanedalo at openjdk.org
Mon Mar 31 13:47:28 UTC 2025


On Mon, 24 Mar 2025 15:33:34 GMT, Daniel Lundén <dlunden at openjdk.org> wrote:

>> If a method has a large number of parameters, we currently bail out from C2 compilation.
>> 
>> ### Changeset
>> 
>> Allowing C2 compilation of methods with a large number of parameters requires fundamental changes to the register mask data structure, used in many places in C2. In particular, register masks currently have a statically determined size and cannot represent arbitrary numbers of stack slots. This is needed if we want to compile methods with arbitrary numbers of parameters. Register mask operations are present in performance-sensitive parts of C2, which further complicates changes.
>> 
>> Changes:
>> - Add functionality to dynamically grow/extend register masks. I experimented with a number of design choices to achieve this. To keep the common case (normal number of method parameters) quick and also to avoid more intrusive changes to the current `RegMask` interface, I decided to leave the "base" statically allocated memory for masks unchanged and only use dynamically allocated memory in the rare cases where it is needed.
>> - Generalize the "chunk"-logic from `PhaseChaitin::Select()` to allow arbitrary-sized chunks, and also move most of the logic into register mask methods to separate concerns and to make the `PhaseChaitin::Select()` code more readable.
>> - Remove all `can_represent` checks and bailouts.
>> - Performance tuning. A particularly important change is the early-exit optimization in `RegMask::overlap`, used in the performance-sensitive method `PhaseChaitin::interfere_with_live`.
>> - Add a new test case `TestManyMethodArguments.java` and extend an old test `TestNestedSynchronize.java`.
>> 
>> ### Testing
>> 
>> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/10178060450)
>> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64.
>> - Standard performance benchmarking. No observed conclusive overall performance degradation/improvement.
>> - Specific benchmarking of C2 compilation time. The changes increase C2 compilation time by, approximately and on average, 1% for methods that could also be compiled before this changeset (see the figure below). The reason for the degradation is further checks required in performance-sensitive code (in particular `PhaseChaitin::remove_bound_register_from_interfering_live_ranges`). I have tried optimizing in various ways, but changes I found that lead to improvement also lead to less readable code (and are, in my opinion, no...
>
> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Extend example with offset register mask

I have reviewed the main bulk of this changeset (all HotSpot changes except those in `chaitin.*` and `ifg.cpp`) and have not found any functional issues. I will have a look at the remaining changes over the next days. Good job, Daniel!

src/hotspot/share/opto/compile.cpp line 677:

> 675:       _java_calls(0),
> 676:       _inner_loops(0),
> 677:       _FIRST_STACK_mask(&_comp_arena),

Suggestion:

      _FIRST_STACK_mask(comp_arena()),

src/hotspot/share/opto/compile.cpp line 678:

> 676:       _inner_loops(0),
> 677:       _FIRST_STACK_mask(&_comp_arena),
> 678:       _regmask_arena(mtCompiler),

Consider tagging this arena with a new `Arena::Tag`, for better compilation memory stat accuracy. Here is my suggested change: https://github.com/robcasloz/jdk/commit/efd3821877dea2f763cbb73364b19eb81a20a110.

src/hotspot/share/opto/compile.cpp line 944:

> 942:       _java_calls(0),
> 943:       _inner_loops(0),
> 944:       _FIRST_STACK_mask(&_comp_arena),

Suggestion:

      _FIRST_STACK_mask(comp_arena()),

src/hotspot/share/opto/matcher.cpp line 148:

> 146:       C->record_method_not_compilable("unsupported incoming calling sequence");
> 147:       return OptoReg::Bad;
> 148:     }

Please consider removing the failure polls after calling `warp_incoming_stk_arg`, I believe the removal of this bailout makes them unnecessary.

src/hotspot/share/opto/matcher.cpp line 195:

> 193:   if (C->failing()) {
> 194:     return;
> 195:   }

Is this failure poll required after your changes?

src/hotspot/share/opto/matcher.cpp line 196:

> 194:     return;
> 195:   }
> 196:   _return_addr_mask.Insert(return_addr());

Could you assert that `_return_addr_mask` is empty before the insertion, to make it easier to see that we are preserving the old behavior?

src/hotspot/share/opto/matcher.cpp line 982:

> 980:   STACK_ONLY_mask.Set_All_From(OptoReg::stack2reg(0));
> 981: 
> 982:   OptoReg::Name i;

Consider moving the declaration of `i` into the `for` statement below.

src/hotspot/share/opto/optoreg.hpp line 237:

> 235:   }
> 236:   OptoRegPair(OptoReg::Name f) : OptoRegPair(OptoReg::Bad, f) {}
> 237:   OptoRegPair() : OptoRegPair(OptoReg::Bad, OptoReg::Bad) {}

This is preexisting, but since the changeset touches the code: these two "partial" constructors seem unused, please consider removing them (but double-check in that case that they are unused for all platforms).

src/hotspot/share/opto/postaloc.cpp line 686:

> 684:               assert(!(!value[ureg_lo] && lrgs(useidx).mask().is_offset() &&
> 685:                        !lrgs(useidx).mask().Member(ureg_lo)),
> 686:                      "invalid assumption");

Could you use more descriptive names and assertion messages in this new assertion and the one below? Ideally, without having to refer to old versions. What is the invariant that we want to check? How does it relate to the surrounding code?

src/hotspot/share/opto/regmask.hpp line 545:

> 543: 
> 544:   // Overlap test. Non-zero if any registers in common, including all-stack.
> 545:   bool overlap(const RegMask &rm) const {

Please review the frequency of the different tests in this function. I ran an instrumented version and found the test in Case 4 to succeed (return true) more often that Case 2 and Case 3.

src/hotspot/share/utilities/globalDefinitions.hpp line 1363:

> 1361: // synchronized statements in Java.
> 1362: const int BoxLockNode_slot_limit = 200;
> 1363: 

This definition seems too C2-specific to be put in this shared file, could it be moved e.g. to `optoreg.hpp`?

-------------

Changes requested by rcastanedalo (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20404#pullrequestreview-2729213050
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2021007017
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2021020456
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2021007752
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2021067375
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2021058074
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2021050321
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2021054876
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2021034249
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2021042384
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2021029514
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2021024332


More information about the hotspot-compiler-dev mailing list