RFR: 8324969: C2: prevent elimination of unbalanced coarsened locking regions [v7]
Vladimir Kozlov
kvn at openjdk.org
Sun Feb 25 05:05:57 UTC 2024
On Sun, 25 Feb 2024 04:46:10 GMT, Vladimir Kozlov <kvn at openjdk.org> wrote:
>> The issue is that when we do nested locks elimination we don't check that synchronized region has both `Lock` and `Unlock` nodes.
>> Which can happen if locks coarsening optimization eliminated pair of Unlock/Lock nodes from adjacent locking regions before we check for nested locks.
>>
>> Consider this code (all locks/unlocks use the same object):
>>
>> lock1(obj) { // outer synchronized region
>> lock2(obj) {
>> // nested synchronized region 1
>> } unlock2(obj);
>> lock3(obj) {
>> // nested synchronized region 2
>> } unlock3(obj);
>> } unlock1(obj);
>>
>> If `lock3` directly follows `unlock2` (no branches or safepoints) locks coarsening optimization will remove them:
>>
>>
>> lock1(obj) { // outer synchronized region
>> lock2(obj) {
>> // nested synchronized region 1
>> };
>> {
>> // nested synchronized region 2
>> } unlock3(obj);
>> } unlock1(obj);
>>
>>
>> Nested locks elimination code checks only `Lock` node in one region to find if it is nested (inside other lock region which use the same object) and then eliminate it. So we end up with not eliminated `Unlock` node in second nested region.
>>
>> Why we don't hit this issue before? Normally nested locks elimination executed first and only then we do locks coarsening elimination. In the example above we eliminate all nested `Lock` and `Unlock` nodes, leaving only outer `Lock` and `Unlock`.
>>
>> The additional factors which leads to the failure is fully unrolled loop around nested sync regions and some allocation to trigger Escape Analysis:
>>
>>
>> lock1(obj) { // outer synchronized region
>> Test var = new Test(); // Triggers EA
>> for (I = 0; I < 3; i++) { // small iteration number to fully unroll
>> lock2(obj) {
>> // nested synchronized region 1
>> } unlock2(obj);
>> lock3(obj) {
>> // nested synchronized region 2
>> } unlock3(obj);
>> }
>> } unlock1(obj);
>>
>> Before executing Escape Analysis we do loops optimization to simplify graph: [compile.cpp#L2332](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/compile.cpp#L2332)
>> We also allow to fully unroll short loops (LoopOptsMaxUnroll) to remove merges from graph. It helps EA eliminate allocations.
>> Such unrolling creates several `Lock` and `Unlock` nodes per synchronized region. But nested locks eliminat...
>
> Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision:
>
> Added asserts for BoxLock states change
New changes:
- moved test to `compiler/locks` directory and renamed it because there was test there with the same name
- introduced states for BoxLock node (and corresponding locking region)
- added assert to verify that state change is allowed
- added `BoxLockNode::Identity()` to correctly set state when node is "value numbered" by IGVN when `EliminateNestedLocks` flag is off. When the flag is off C2 assumes that we have one `BoxLock` node per monitor's stack slot - we can't have merges (Phis). So I can't modify `BoxLockNode::hash()` and `cmp()`. I decide to add `Identity()` to handle that. Note, when the flag is off we have only `Regular, Coarsened and Unbalanced` states (`Local` can be set during Macro nodes elimination phase just before it is marked as `Eliminated` so it will be not visible outside small scope of code).
- I also tested OSR case (set its BoxLock to `Unbalanced` state). I will use it for [#17331](https://github.com/openjdk/jdk/pull/17331) after I push these changes.
- I ran tier1-4,xcomp,stress (and now running tier5-7) - no new failures. I also ran performance testing which shows no significant changes.
-------------
PR Comment: https://git.openjdk.org/jdk/pull/17697#issuecomment-1962813965
More information about the hotspot-compiler-dev
mailing list