RFR: 8325467: Support methods with many arguments in C2 [v17]
Emanuel Peter
epeter at openjdk.org
Wed Apr 23 08:30:00 UTC 2025
On Thu, 10 Apr 2025 09:44:14 GMT, Daniel Lundén <dlunden at openjdk.org> wrote:
>> If a method has a large number of parameters, we currently bail out from C2 compilation.
>>
>> ### Changeset
>>
>> Allowing C2 compilation of methods with a large number of parameters requires fundamental changes to the register mask data structure, used in many places in C2. In particular, register masks currently have a statically determined size and cannot represent arbitrary numbers of stack slots. This is needed if we want to compile methods with arbitrary numbers of parameters. Register mask operations are present in performance-sensitive parts of C2, which further complicates changes.
>>
>> Changes:
>> - Add functionality to dynamically grow/extend register masks. I experimented with a number of design choices to achieve this. To keep the common case (normal number of method parameters) quick and also to avoid more intrusive changes to the current `RegMask` interface, I decided to leave the "base" statically allocated memory for masks unchanged and only use dynamically allocated memory in the rare cases where it is needed.
>> - Generalize the "chunk"-logic from `PhaseChaitin::Select()` to allow arbitrary-sized chunks, and also move most of the logic into register mask methods to separate concerns and to make the `PhaseChaitin::Select()` code more readable.
>> - Remove all `can_represent` checks and bailouts.
>> - Performance tuning. A particularly important change is the early-exit optimization in `RegMask::overlap`, used in the performance-sensitive method `PhaseChaitin::interfere_with_live`.
>> - Add a new test case `TestManyMethodArguments.java` and extend an old test `TestNestedSynchronize.java`.
>>
>> ### Testing
>>
>> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/10178060450)
>> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64.
>> - Standard performance benchmarking. No observed conclusive overall performance degradation/improvement.
>> - Specific benchmarking of C2 compilation time. The changes increase C2 compilation time by, approximately and on average, 1% for methods that could also be compiled before this changeset (see the figure below). The reason for the degradation is further checks required in performance-sensitive code (in particular `PhaseChaitin::remove_bound_register_from_interfering_live_ranges`). I have tried optimizing in various ways, but changes I found that lead to improvement also lead to less readable code (and are, in my opinion, no...
>
> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision:
>
> Refactor and improve TestNestedSynchronize.java
Now I only have the tests remaining. Here some more VM code comments.
src/hotspot/share/opto/regmask.hpp line 429:
> 427: }
> 428:
> 429: RegMask(const RegMask& rm) : RegMask(rm, nullptr) {}
This is the copy constructor, right? Can you add a comment what kind of implementation you chose here, and why?
src/hotspot/share/opto/regmask.hpp line 448:
> 446: }
> 447:
> 448: // Empty mask check. Ignores registers included through the all-stack flag.
Suggestion:
// Empty mask check. Ignores registers included through the all_stack flag.
For "greppability"
src/hotspot/share/opto/regmask.hpp line 452:
> 450: assert(valid_watermarks(), "sanity");
> 451: for (unsigned i = _lwm; i <= _hwm; i++) {
> 452: if (_rm_up(i)) {
Is this an implicit `nullptr` check? If so, you need to make it explicit, the style guide forbids implicit null checks.
src/hotspot/share/opto/regmask.hpp line 464:
> 462: for (unsigned i = _lwm; i <= _hwm; i++) {
> 463: uintptr_t bits = _rm_up(i);
> 464: if (bits) {
This also looks like an implicit null check.
src/hotspot/share/opto/regmask.hpp line 473:
> 471:
> 472: // Get highest-numbered register from mask, or BAD if mask is empty. Ignores
> 473: // registers included through the all-stack flag.
Suggestion:
// registers included through the all_stack flag.
Greppability
src/hotspot/share/opto/regmask.hpp line 480:
> 478: while (i > _lwm) {
> 479: uintptr_t bits = _rm_up(--i);
> 480: if (bits) {
Implicit null check?
src/hotspot/share/opto/regmask.hpp line 512:
> 510: tmp |= _rm_up(i);
> 511: }
> 512: return !tmp && is_AllStack();
Implicit null checks?
Could you not check `is_AllStack` early and return already?
src/hotspot/share/opto/regmask.hpp line 551:
> 549: static int num_registers(uint ireg, LRG &lrg);
> 550:
> 551: // Overlap test. Non-zero if any registers in common, including all-stack.
Suggestion:
// Overlap test. Non-zero if any registers in common, including all_stack.
greppability
src/hotspot/share/opto/regmask.hpp line 561:
> 559: unsigned lwm = MAX2(_lwm, rm._lwm);
> 560: for (unsigned i = lwm; i <= hwm; i++) {
> 561: if (_rm_up(i) & rm._rm_up(i)) {
Implicit null check?
src/hotspot/share/opto/regmask.hpp line 589:
> 587: }
> 588: }
> 589: }
There is a bit of code duplication here. A helper method could help, no pun intended.
src/hotspot/share/opto/regmask.hpp line 619:
> 617: _hwm = _rm_max();
> 618: _set_range(0, 0xFF, _rm_size);
> 619: set_AllStack(true);
Suggestion:
set_AllStack();
You already have a default `true` value, right? Check for other occurances.
src/hotspot/share/opto/regmask.hpp line 711:
> 709: _hwm = rm._rm_max();
> 710: }
> 711: // Narrow the watermarks if &rm spans a narrower range. Update after to
Suggestion:
// Narrow the watermarks if rm spans a narrower range. Update after to
src/hotspot/share/opto/regmask.hpp line 747:
> 745:
> 746: // Subtract 'rm' from 'this', but ignore everything in 'rm' that does not
> 747: // overlap with us and to not modify our all-stack flag. Supports masks of
A little confused about the grammar in `and to not modify our all-stack flag`...
src/hotspot/share/opto/regmask.hpp line 799:
> 797:
> 798: public:
> 799: unsigned int static basic_rm_size() { return _RM_SIZE; }
What makes it `basic`? Elsewhere, you call `RM_SIZE` the "base size". I don't understand this part, so I'm just asking if you think it is consistent?
-------------
PR Review: https://git.openjdk.org/jdk/pull/20404#pullrequestreview-2786315683
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2055470536
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2055474336
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2055479300
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2055480721
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2055481600
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2055482640
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2055485184
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2055485667
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2055486853
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2055493829
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2055500757
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2055504351
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2055507927
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2055517750
More information about the hotspot-compiler-dev
mailing list