RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v12]

Vladimir Ivanov vlivanov at openjdk.org
Mon May 8 18:24:24 UTC 2023

On Mon, 1 May 2023 20:20:51 GMT, Cesar Soares Lucas <cslucas at openjdk.org> wrote:

>> Can I please get reviews for this PR? 
>> The most common and frequent use of NonEscaping Phis merging object allocations is for debugging information. The two graphs below show numbers for Renaissance and DaCapo benchmarks - similar results are obtained for all other applications that I tested.
>> With what frequency does each IR node type occurs as an allocation merge user? I.e., if the same node type uses a Phi N times the counter is incremented by N:
>> ![image](https://user-images.githubusercontent.com/2249648/222280517-4dcf5871-2564-4207-b49e-22aee47fa49d.png)
>> What are the most common users of allocation merges? I.e., if the same node type uses a Phi N times the counter is incremented by 1:
>> ![image](https://user-images.githubusercontent.com/2249648/222280608-ca742a4e-1622-4e69-a778-e4db6805ea02.png)
>> This PR adds support scalar replacing allocations participating in merges used as debug information OR as a base for field loads. I plan to create subsequent PRs to enable scalar replacement of merges used by other node types (CmpP is next on the list) subsequently.
>> The approach I used for _rematerialization_ is pretty straightforward. It consists basically of the following. 1) New IR node (suggested by V. Kozlov), named SafePointScalarMergeNode, to represent a set of SafePointScalarObjectNode; 2) Each scalar replaceable input participating in a merge will get a SafePointScalarObjectNode like if it weren't part of a merge. 3) Add a new Class to support the rematerialization of SR objects that are part of a merge; 4) Patch HotSpot to be able to serialize and deserialize debug information related to allocation merges; 5) Patch C2 to generate unique types for SR objects participating in some allocation merges.
>> The approach I used for _enabling the scalar replacement of some of the inputs of the allocation merge_ is also pretty straightforward: call `MemNode::split_through_phi` to, well, split AddP->Load* through the merge which will render the Phi useless.
>> I tested this with JTREG tests tier 1-4 (Windows, Linux, and Mac) and didn't see regression. I also experimented with several applications and didn't see any failure. I also ran tests with "-ea -esa -Xbatch -Xcomp -XX:+UnlockExperimentalVMOptions -XX:-TieredCompilation -server -XX:+IgnoreUnrecognizedVMOptions -XX:+UnlockDiagnosticVMOptions -XX:+StressLCM -XX:+StressGCM -XX:+StressCCP" and didn't observe any related failures.
> Cesar Soares Lucas has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits:
>  - Merge remote-tracking branch 'origin/master' into rematerialization-of-merges
>  - Address part of PR review 4 & fix a bug setting only_candidate
>  - Catching up with master
>    Merge remote-tracking branch 'origin/master' into rematerialization-of-merges
>  - Fix tests. Remember previous reducible Phis.
>  - Address PR review 3. Some comments and be able to abort compilation.
>  - Merge with Master
>  - Addressing PR review 2: refactor & reuse MacroExpand::scalar_replacement method.
>  - Address PR feeedback 1: make ObjectMergeValue subclass of ObjectValue & create new IR class to represent scalarized merges.
>  - Add support for SR'ing some inputs of merges used for field loads
>  - Fix some typos and do some small refactorings.
>  - ... and 2 more: https://git.openjdk.org/jdk/compare/561ec9c5...542c5ef1

Speaking of debug info design, it seems there's a need for an additional transformation step now.

Originally, all the operations were performed right on the deserialized debug info representation. It was well-justified at first, but slowly accrued with special cases (nulls, autobox, vectors) and merges push it over the limit IMO.

I propose to introduce an additional pass which takes original debug info and, based on current JVM state (`frame` + `RegisterMap`), transforms it into a list of objects to be materialized and a graph of `ScopeValue`s which depend on them. It would isolate preprocessing logic you have scattered across multiple places, simplify rematerialization, make it easier to find out what happens during deoptimizaiton in each particular case. Moreover, it'll enable support for more complex scenarios (e.g., nested merges) which I expect to eventually emerge in followup enhancements.


PR Comment: https://git.openjdk.org/jdk/pull/12897#issuecomment-1538835019

More information about the security-dev mailing list