RFR: 8325467: Support methods with many arguments in C2 [v15]

Emanuel Peter epeter at openjdk.org
Mon Apr 7 07:16:56 UTC 2025


On Fri, 4 Apr 2025 14:11:56 GMT, Daniel Lundén <dlunden at openjdk.org> wrote:

>> If a method has a large number of parameters, we currently bail out from C2 compilation.
>> 
>> ### Changeset
>> 
>> Allowing C2 compilation of methods with a large number of parameters requires fundamental changes to the register mask data structure, used in many places in C2. In particular, register masks currently have a statically determined size and cannot represent arbitrary numbers of stack slots. This is needed if we want to compile methods with arbitrary numbers of parameters. Register mask operations are present in performance-sensitive parts of C2, which further complicates changes.
>> 
>> Changes:
>> - Add functionality to dynamically grow/extend register masks. I experimented with a number of design choices to achieve this. To keep the common case (normal number of method parameters) quick and also to avoid more intrusive changes to the current `RegMask` interface, I decided to leave the "base" statically allocated memory for masks unchanged and only use dynamically allocated memory in the rare cases where it is needed.
>> - Generalize the "chunk"-logic from `PhaseChaitin::Select()` to allow arbitrary-sized chunks, and also move most of the logic into register mask methods to separate concerns and to make the `PhaseChaitin::Select()` code more readable.
>> - Remove all `can_represent` checks and bailouts.
>> - Performance tuning. A particularly important change is the early-exit optimization in `RegMask::overlap`, used in the performance-sensitive method `PhaseChaitin::interfere_with_live`.
>> - Add a new test case `TestManyMethodArguments.java` and extend an old test `TestNestedSynchronize.java`.
>> 
>> ### Testing
>> 
>> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/10178060450)
>> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64.
>> - Standard performance benchmarking. No observed conclusive overall performance degradation/improvement.
>> - Specific benchmarking of C2 compilation time. The changes increase C2 compilation time by, approximately and on average, 1% for methods that could also be compiled before this changeset (see the figure below). The reason for the degradation is further checks required in performance-sensitive code (in particular `PhaseChaitin::remove_bound_register_from_interfering_live_ranges`). I have tried optimizing in various ways, but changes I found that lead to improvement also lead to less readable code (and are, in my opinion, no...
>
> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Revise overlap comments for frequency of cases

A few quick first comments :)

src/hotspot/share/adlc/formsopt.cpp line 180:

> 178:   //   in the register mask regardless of how much slack is created by rounding.
> 179:   //   This was found necessary after adding 16 new registers for APX.
> 180:   return (words_for_regs + 3 + 1 + 1) & ~1;

Is the comment above still accurate? Specifically this:

  // on the stack (stack registers) up to some interesting limit.  Methods
  // that need more parameters will NOT be compiled.  On Intel, the limit
  // is something like 90+ parameters.

And if not, might there be other comments like this elsewhere?
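
For readers unfamiliar with the idiom on line 180, here is a minimal sketch of what the arithmetic does. The helper name `padded_mask_words` is hypothetical; only the expression itself comes from the quoted line. It adds five words of slack and then clears the lowest bit, so the result is always even.

```cpp
#include <cassert>

// Hypothetical helper: the name is an assumption for illustration; only the
// expression in the return statement is taken from the quoted line 180.
constexpr int padded_mask_words(int words_for_regs) {
  // 3 + 1 + 1 adds five words of slack. `& ~1` then clears the lowest bit,
  // which rounds an odd sum down to the nearest even number.
  return (words_for_regs + 3 + 1 + 1) & ~1;
}

static_assert(padded_mask_words(1) == 6, "1 + 5 = 6, already even");
static_assert(padded_mask_words(2) == 6, "2 + 5 = 7, rounded down to 6");
static_assert(padded_mask_words(3) == 8, "3 + 5 = 8, already even");
```

Since the rounding goes down, an odd sum loses one word of the added slack, so the result is at least `words_for_regs + 4`.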

src/hotspot/share/opto/regmask.hpp line 40:

> 38: // stack slots used by BoxLockNodes. We reach this limit by, e.g., deeply
> 39: // nesting synchronized statements in Java.
> 40: const int BoxLockNode_slot_limit = 200;

Where does this number come from? I've added arbitrary constants like this, and sometimes it is hard to give a good justification. But at least writing down what your thinking was might help someone else if they come across it later. Do you have a sense of how large it should be, at least or at most?

src/hotspot/share/opto/regmask.hpp line 86:

> 84:       (((RM_SIZE_MIN << 5) +                // Slots for machine registers
> 85:         (max_method_parameter_length * 2) + // Slots for incoming arguments
> 86:         (max_method_parameter_length * 2) + // Slots for outgoing arguments

Why `*2`? Is that for 64-bit arguments that are split into two 32-bit words?
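
To make the question concrete, here is a rough sketch of the three visible terms. It assumes each slot is 32 bits wide and that a 64-bit argument needs two slots, which is my guess and not something the patch confirms. The function name and the sample values are hypothetical, and the quoted expression continues beyond line 86, so this is not the full computation.

```cpp
#include <cassert>

// Hypothetical sketch of the three terms visible in the quoted lines 84-86.
// The rest of the original expression is not shown in the excerpt and is
// deliberately not reproduced here.
constexpr int visible_slot_terms(int rm_size_min, int max_method_parameter_length) {
  return (rm_size_min << 5)                   // `<< 5` multiplies by 32
       + (max_method_parameter_length * 2)    // incoming: two slots per argument
       + (max_method_parameter_length * 2);   // outgoing: two slots per argument
}

// Sample values are assumptions chosen only to show the arithmetic:
// 6 * 32 = 192, plus 255 * 2 = 510 twice, gives 1212.
static_assert(visible_slot_terms(6, 255) == 1212, "192 + 510 + 510");
```

If the guess is right, the `* 2` is a worst-case bound: it reserves enough slots even when every parameter is a 64-bit value.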

src/hotspot/share/opto/regmask.hpp line 104:

> 102:     // the machine registers and usually all parameters that need to be passed
> 103:     // on the stack (stack registers) up to some interesting limit. On Intel,
> 104:     // the limit is something like 90+ parameters.

Here you fixed the comment, so probably the other one needs to be fixed too ;)

-------------

PR Review: https://git.openjdk.org/jdk/pull/20404#pullrequestreview-2745689438
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2030561927
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2030577639
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2030584435
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2030585708

