RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v18]

Vladimir Ivanov vlivanov at openjdk.org
Wed Jun 14 20:54:16 UTC 2023


On Wed, 14 Jun 2023 19:29:45 GMT, Cesar Soares Lucas <cslucas at openjdk.org> wrote:

>> Can I please get reviews for this PR? 
>> 
>> The most common and frequent use of NonEscaping Phis merging object allocations is for debugging information. The two graphs below show numbers for Renaissance and DaCapo benchmarks - similar results are obtained for all other applications that I tested.
>> 
>> With what frequency does each IR node type occurs as an allocation merge user? I.e., if the same node type uses a Phi N times the counter is incremented by N:
>> 
>> ![image](https://user-images.githubusercontent.com/2249648/222280517-4dcf5871-2564-4207-b49e-22aee47fa49d.png)
>> 
>> What are the most common users of allocation merges? I.e., if the same node type uses a Phi N times the counter is incremented by 1:
>> 
>> ![image](https://user-images.githubusercontent.com/2249648/222280608-ca742a4e-1622-4e69-a778-e4db6805ea02.png)
>> 
>> This PR adds support scalar replacing allocations participating in merges used as debug information OR as a base for field loads. I plan to create subsequent PRs to enable scalar replacement of merges used by other node types (CmpP is next on the list) subsequently.
>> 
>> The approach I used for _rematerialization_ is pretty straightforward. It consists basically of the following. 1) New IR node (suggested by V. Kozlov), named SafePointScalarMergeNode, to represent a set of SafePointScalarObjectNode; 2) Each scalar replaceable input participating in a merge will get a SafePointScalarObjectNode like if it weren't part of a merge. 3) Add a new Class to support the rematerialization of SR objects that are part of a merge; 4) Patch HotSpot to be able to serialize and deserialize debug information related to allocation merges; 5) Patch C2 to generate unique types for SR objects participating in some allocation merges.
>> 
>> The approach I used for _enabling the scalar replacement of some of the inputs of the allocation merge_ is also pretty straightforward: call `MemNode::split_through_phi` to, well, split AddP->Load* through the merge which will render the Phi useless.
>> 
>> I tested this with JTREG tests tier 1-4 (Windows, Linux, and Mac) and didn't see regression. I also experimented with several applications and didn't see any failure. I also ran tests with "-ea -esa -Xbatch -Xcomp -XX:+UnlockExperimentalVMOptions -XX:-TieredCompilation -server -XX:+IgnoreUnrecognizedVMOptions -XX:+UnlockDiagnosticVMOptions -XX:+StressLCM -XX:+StressGCM -XX:+StressCCP" and didn't observe any related failures.
>
> Cesar Soares Lucas has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 19 commits:
> 
>  - Merge branch 'openjdk:master' into rematerialization-of-merges
>  - Rome minor refactorings.
>  - Merge remote-tracking branch 'origin/master' into rematerialization-of-merges
>    Catching up with master.
>  - Address PR review 6: debug format output & some refactoring.
>  - Catching up with master branch.
>    
>    Merge remote-tracking branch 'origin/master' into rematerialization-of-merges
>  - Address PR review 6: refactoring around rematerialization & improve test cases.
>  - Address PR review 5: refactor on rematerialization & add tests.
>  - Merge remote-tracking branch 'origin/master' into rematerialization-of-merges
>  - Address part of PR review 4 & fix a bug setting only_candidate
>  - Catching up with master
>    
>    Merge remote-tracking branch 'origin/master' into rematerialization-of-merges
>  - ... and 9 more: https://git.openjdk.org/jdk/compare/57b82512...939dcffe

Overall, the testing went well. (It discovered some minor issues I commented about.)
I'm rerunning some benchmarks which reported suspicious results. Will keep you posted.

src/hotspot/share/opto/c2_globals.hpp line 473:

> 471:           " register allocation.")                                          \
> 472:                                                                             \
> 473:   product(bool, ReduceAllocationMerges, true,                               \

I suggest to turn the flag into diagnostic one. There are much stricter requirements for product flags, so better to avoid introducing new ones.

src/hotspot/share/opto/c2_globals.hpp line 476:

> 474:           "Try to simplify allocation merges before Scalar Replacement")    \
> 475:                                                                             \
> 476:   develop(bool, TraceReduceAllocationMerges, false,                         \

The flag is debug-only while you use it in non-product code (`NOT_PRODUCT` macro). It doesn't break the build, but make the code under `NOT_PRODUCT` macro useless in optimized builds. You could either get rid of `NOT_PRODUCT` usages or turn the flag into `notproduct` one.

src/hotspot/share/opto/c2compiler.cpp line 150:

> 148:       if (C.failure_reason_is(retry_no_reduce_allocation_merges())) {
> 149:         assert(do_reduce_allocation_merges, "must make progress");
> 150:         do_reduce_allocation_merges = false;

I consider the check here as a safety net which is intended to provide graceful degradation in performance if RAM optimization misbehaves for some reason. But bailing out an optimization is better than bailing out the whole compilation.  I suggest to introduce new diagnostic flag (e.g., `VerifyReduceAllocationMerges`) and add a guarantee call here which signals whenever we encounter a problematic case. I'm fine with handling that as a separate enhancement (it makes sense to dump additional diagnostic info at the place where such bail outs are triggered ).

test/hotspot/jtreg/compiler/c2/irTests/scalarReplacement/AllocationMergesTests.java line 43:

> 41:     public static void main(String[] args) {
> 42:         TestFramework.runWithFlags("-XX:+ReduceAllocationMerges",
> 43:                                    "-XX:+TraceReduceAllocationMerges",

`TraceReduceAllocationMerges` and `DeoptimizeALot` are not available in product binaries, so the test fails there.
You need to either limit the test to debug builds only or add `-XX:+IgnoreUnrecognizedVMOptions`. 

If `ReduceAllocationMerges` is turned into diagnostic flag, you need to specify `-XX:+UnlockDiagnosticVMOptions`.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/12897#issuecomment-1591966055
PR Review Comment: https://git.openjdk.org/jdk/pull/12897#discussion_r1230121828
PR Review Comment: https://git.openjdk.org/jdk/pull/12897#discussion_r1230145481
PR Review Comment: https://git.openjdk.org/jdk/pull/12897#discussion_r1230152798
PR Review Comment: https://git.openjdk.org/jdk/pull/12897#discussion_r1230119032



More information about the security-dev mailing list