RFR: 8325467: Support methods with many arguments in C2 [v2]
Roberto Castañeda Lozano
rcastanedalo at openjdk.org
Tue Aug 20 07:34:51 UTC 2024
On Thu, 8 Aug 2024 09:29:17 GMT, Daniel Lundén <dlunden at openjdk.org> wrote:
>> If a method has a large number of parameters, we currently bail out from C2 compilation.
>>
>> ### Changeset
>>
>> Allowing C2 compilation of methods with a large number of parameters requires fundamental changes to the register mask data structure, used in many places in C2. In particular, register masks currently have a statically determined size and cannot represent arbitrary numbers of stack slots. This is needed if we want to compile methods with arbitrary numbers of parameters. Register mask operations are present in performance-sensitive parts of C2, which further complicates changes.
>>
>> Changes:
>> - Add functionality to dynamically grow/extend register masks. I experimented with a number of design choices to achieve this. To keep the common case (normal number of method parameters) quick and also to avoid more intrusive changes to the current `RegMask` interface, I decided to leave the "base" statically allocated memory for masks unchanged and only use dynamically allocated memory in the rare cases where it is needed.
>> - Generalize the "chunk"-logic from `PhaseChaitin::Select()` to allow arbitrary-sized chunks, and also move most of the logic into register mask methods to separate concerns and to make the `PhaseChaitin::Select()` code more readable.
>> - Remove all `can_represent` checks and bailouts.
>> - Performance tuning. A particularly important change is the early-exit optimization in `RegMask::overlap`, used in the performance-sensitive method `PhaseChaitin::interfere_with_live`.
>> - Add a new test case `TestManyMethodArguments.java` and extend an old test `TestNestedSynchronize.java`.
>>
>> ### Testing
>>
>> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/10178060450)
>> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64.
>> - Standard performance benchmarking. No observed conclusive overall performance degradation/improvement.
>> - Specific benchmarking of C2 compilation time. The changes increase C2 compilation time by, approximately and on average, 1% for methods that could also be compiled before this changeset (see the figure below). The reason for the degradation is further checks required in performance-sensitive code (in particular `PhaseChaitin::remove_bound_register_from_interfering_live_ranges`). I have tried optimizing in various ways, but changes I found that lead to improvement also lead to less readable code (and are, in my opinion, no...
>
> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision:
>
> Remove leftover CHUNK_SIZE reference
src/hotspot/share/opto/regmask.hpp line 129:
> 127: } else {
> 128: assert(_RM_UP_EXT != nullptr, "sanity");
> 129: assert( i >= _RM_SIZE, "sanity");
This assertion is trivial (in my opinion) and can be removed.
src/hotspot/share/opto/regmask.hpp line 144:
> 142:
> 143: // Return a suitable arena for (extended) register mask allocation.
> 144: static Arena* _get_arena();
Maybe inline this function into its only user?
src/hotspot/share/opto/regmask.hpp line 148:
> 146: // Grow the register mask to ensure it can fit at least min_size words.
> 147: void _grow(unsigned int min_size, bool init = true) {
> 148: if(min_size > _rm_size) {
Suggestion:
if (min_size > _rm_size) {
src/hotspot/share/opto/regmask.hpp line 165:
> 163: if (init) {
> 164: int fill = 0;
> 165: if(is_AllStack()) {
Suggestion:
if (is_AllStack()) {
src/hotspot/share/opto/regmask.hpp line 208:
> 206:
> 207: // Set a range of words in the register mask to a given value.
> 208: void _set_range(unsigned int start, int value, unsigned int range) {
Suggestion: `length`, `words`, or similar would be a more idiomatic name for the third parameter (`range`), in my opinion.
src/hotspot/share/opto/regmask.hpp line 278:
> 276: // itself as the _all_stack flag. We need to record this fact using the now
> 277: // separate _all_stack flag.
> 278: set_AllStack(_RM_UP[_RM_MAX] & (uintptr_t(1) << _WordBitMask));
As we discussed offline, I suggest simplifying this by updating the ADLC code as well to set `_all_stack` explicitly. Here is a patch that accomplishes that: https://github.com/robcasloz/jdk/commit/835628893ed5ea838c16afa49adbd57e4152a5dd, feel free to merge if you agree. The patch passes tier1-3 for all Oracle-supported platforms.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r1721792317
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r1722812982
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r1722813539
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r1722814189
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r1722827247
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r1722822957
More information about the hotspot-compiler-dev
mailing list