RFR: 8325467: Support methods with many arguments in C2 [v24]
Emanuel Peter
epeter at openjdk.org
Mon Sep 1 08:37:07 UTC 2025
On Fri, 29 Aug 2025 09:38:58 GMT, Daniel Lundén <dlunden at openjdk.org> wrote:
>> If a method has a large number of parameters, we currently bail out from C2 compilation.
>>
>> ### Changeset
>>
>> Allowing C2 compilation of methods with a large number of parameters requires fundamental changes to the register mask data structure, used in many places in C2. In particular, register masks currently have a statically determined size and cannot represent arbitrary numbers of stack slots. This is needed if we want to compile methods with arbitrary numbers of parameters. Register mask operations are present in performance-sensitive parts of C2, which further complicates changes.
>>
>> Changes:
>> - Add functionality to dynamically grow/extend register masks. I experimented with a number of design choices to achieve this. To keep the common case (normal number of method parameters) quick and also to avoid more intrusive changes to the current `RegMask` interface, I decided to leave the "base" statically allocated memory for masks unchanged and only use dynamically allocated memory in the rare cases where it is needed.
>> - Generalize the "chunk"-logic from `PhaseChaitin::Select()` to allow arbitrary-sized chunks, and also move most of the logic into register mask methods to separate concerns and to make the `PhaseChaitin::Select()` code more readable.
>> - Remove all `can_represent` checks and bailouts.
>> - Performance tuning. A particularly important change is the early-exit optimization in `RegMask::overlap`, used in the performance-sensitive method `PhaseChaitin::interfere_with_live`.
>> - Add a new test case `TestManyMethodArguments.java` and extend an old test `TestNestedSynchronize.java`.
>>
>> ### Testing
>>
>> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/10178060450)
>> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64.
>> - Standard performance benchmarking. No observed conclusive overall performance degradation/improvement.
>> - Specific benchmarking of C2 compilation time. The changes increase C2 compilation time by, approximately and on average, 1% for methods that could also be compiled before this changeset (see the figure below). The reason for the degradation is further checks required in performance-sensitive code (in particular `PhaseChaitin::remove_bound_register_from_interfering_live_ranges`). I have tried optimizing in various ways, but changes I found that lead to improvement also lead to less readable code (and are, in my opinion, no...
>
> Daniel Lundén has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 35 commits:
>
> - Restore modified java/lang/invoke tests
> - Sort includes (new requirement)
> - Merge remote-tracking branch 'upstream/master' into many-arguments-8325467+pr-updates
> - Add clarifying comments at definitions of register mask sizes
> - Fix implicit zero and nullptr checks
> - Add deep copy comment
> - Merge remote-tracking branch 'upstream/master' into many-arguments-8325467+pr-updates
> - Fix typo
> - Updates after Emanuel's comments
> - Refactor and improve TestNestedSynchronize.java
> - ... and 25 more: https://git.openjdk.org/jdk/compare/b39c7369...80c6cf47
I thought I'd dive straight back into `regmask.hpp`. I'm remembering some of what we discussed, but I'll need you help to fill in the picture ;)
I wonder if we could do some renamings in a prior PR, just to make this a little easier to review.
src/hotspot/share/opto/regmask.hpp line 44:
> 42: // statements in Java.
> 43: const int BoxLockNode_SLOT_LIMIT = 200;
> 44:
Even before this constant, it would be nice to have an introductory comment, that lays out what the regmask is for, and what its basic design is.
src/hotspot/share/opto/regmask.hpp line 63:
> 61: // RM_SIZE is the base size of a register mask in 32-bit words.
> 62: // RM_SIZE_MIN is the theoretical minimum size of a register mask in 32-bit
> 63: // words.
It seems this is a bad pattern that was already here before you. But it really makes me a little scared here.
Having two variable names differ in just an underscore `_` but with different semantics is a bit confusing to me. It is hard for the reader to keep track of what is what going forward. It would be really easy for someone to confuse the two in the future and have bugs creap in that way (just because of an underscore). It may be more useful to use the units in at least one of the two names.
I would love to see names like `RM_SIZE` and `RM_SIZE_IN_LONGS`, rather than `RM_SIZE` and `_RM_SIZE`.
Even better would be `RM_SIZE_IN_INTS` and `RM_SIZE_IN_LONGS`. That way, you rould save a lot of comments. Maybe you could come up with even better names. "slots" and "words"?
You could consider doing a renaming PR first before the patch here. Maybe you can even automate the renaming with a command/script, and then apply the same renaming to the changes here?
src/hotspot/share/opto/regmask.hpp line 96:
> 94: (((RM_SIZE_MIN << 5) + // Slots for machine registers
> 95: (max_method_parameter_length * 2) + // Slots for incoming arguments
> 96: (max_method_parameter_length * 2) + // Slots for outgoing arguments
What's the meaning of incoming vs outgoing arguments? Like this?
Incoming = from caller (outer nesting)
Outgoing = to nested call (inner nesting)
src/hotspot/share/opto/regmask.hpp line 122:
> 120:
> 121: // Viewed as an array of machine words
> 122: uintptr_t _RM_UP[_RM_SIZE];
Do you know what `UP` stands for? Could we rename it maybe?
Would be nice if we could have the same "units" for these arrays than for the sizes above.
src/hotspot/share/opto/regmask.hpp line 128:
> 126: // extend the register mask with dynamically allocated memory. We keep the
> 127: // base statically allocated _RM_UP, and arena allocate the extended mask
> 128: // (RM_UP_EXT) separately. Another, perhaps more elegant, option would be to
Suggestion:
// (_RM_UP_EXT) separately. Another, perhaps more elegant, option would be to
Underscore for consistency? Or does it reference something else?
src/hotspot/share/opto/regmask.hpp line 161:
> 159: // cases, we can allow read-only sharing.
> 160: bool _read_only = false;
> 161: #endif
Can you explain why this happens? Is this something we could clean up? It smells a bit like tech-dept. But maybe it is a really necessary performance optimization. Would be nice if there was an explanation which one it is ;)
src/hotspot/share/opto/regmask.hpp line 170:
> 168: // variable indicates how many words we offset with. We consider all
> 169: // registers before the offset to not be included in the register mask.
> 170: unsigned int _offset;
Does that mean we make different slices of the mask?
src/hotspot/share/opto/regmask.hpp line 175:
> 173: // mask can currently represent to be included. If _all_stack = false, we
> 174: // consider the registers not included.
> 175: bool _all_stack = false;
I'd prefer to have some kind of `_is_...` name here. Because when I read `all_stack` and see it is a bool, I wonder what it means - it does not tell me quickly. Does it mean that all registers are on the stack?
Is everything that is beyond the register mask purely on the stack? Is everything from the stack always beyond the register mask? I'm confused :face_with_peeking_eye:
src/hotspot/share/opto/regmask.hpp line 179:
> 177: // The low and high watermarks represent the lowest and highest word that
> 178: // might contain set register mask bits, respectively. We guarantee that
> 179: // there are no bits in words outside this range, but any word at and between
In the example below, you have 1 bits above the `_hwm`. Is that intentional? Are those bits to be ignored? Can you please add some extra info to the example about that?
src/hotspot/share/opto/regmask.hpp line 217:
> 215: // necessarily representing stack locations) to 1. Here is how the above
> 216: // register mask looks like after clearing, setting _all_stack to true, and
> 217: // successfully rolling over:
I'm still struggling to follow here. Maybe `_offset` is not clear to me yet. What is the value here for it? How is it changed with the `rollover`?
src/hotspot/share/opto/regmask.hpp line 230:
> 228: // \_______________________________________________________________________________/
> 229: // |
> 230: // _rm_size
Ah, I remember this now. Really helpful. Maybe we could link to this layout explanation from the comment at the very top of the file?
-------------
PR Review: https://git.openjdk.org/jdk/pull/20404#pullrequestreview-3172500942
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2313199061
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2313162130
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2313223912
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2313184547
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2313195111
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2313207232
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2313263478
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2313219662
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2313253475
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2313264670
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2313256455
More information about the hotspot-compiler-dev
mailing list