RFR: 8319901: Recursive lightweight locking: ppc64le implementation
Axel Boldt-Christmas
aboldtch at openjdk.org
Tue Feb 20 18:19:18 UTC 2024
On Fri, 10 Nov 2023 21:37:49 GMT, Martin Doerr <mdoerr at openjdk.org> wrote:
> guess you have created new functions because the old ones (for legacy locking) are supposed to get removed at some point of time?
Yes. Also the reduce the clutter. It became overly restrictive to have one place handle all locking modes. (x86 was the worst which has {32bit, 64bt} x {LM_MONITOR, LM_LEGACY, LM_LIGHTWEIGHT} x {RTM_OFF, RMT_ON, RTM_ON + RMT_stack}).
> need to think about removing the monitor locking from the native wrapper. That may cause a performance regression if the locks are contended. Is there a specific reason to change that or just to make it more similar to some other platforms? Note that s390 does it like PPC64.
I thought about this as I was motivating what I did in the draft. And I believe that it would be correct to also use the C2 implementation for the native wrapper. The difference between the C2 and the shared code is wether it can handle unstructured (monitorenter A, monitorenter B, monitorexit A, monitorexit B) type of scenarios. As the native wrapper always emits a structured monitorenter, monitorexit pair and any synchronisation in the native must either be inflated (JNI synchronisation) or balanced (java upcalls), there should always be the case that fast_unlock does not see the object on the top of its lock stack.
Have to think about this some more, but it might be that all the other platforms should adopt this approach as well.
Also not that this is about supporting inflated monitors without going into the runtime. Not for handle contended locking, for contended locking the current ppc implementation always go into the runtime.
AFAICT only x86_64 does anything when it comes to contended locking. Where it tries to handoff the lock without going into the runtime.
> Your update doesn't solve all the problems and I'd prefer to keep the `box` free in case we need it again. I suggest using my patch instead. If you want to minimize register reusage, you can change `ldx(t, t, R16_thread);` to `ldx(t, R16_thread, t);` which is equivalent and can be used with R0. What I do like is not to pass R0 explicitly any more.
Right, I missed the `MacroAssembler::lightweight_lock` part of the diff.
I'll revert and go with yours for now. It does start to mix styles. Also given that R0 behaves differently than other registers, maybe it should not be aliased. (Or made more clear that it alias R0)
If you want to go over and find a unified style here which fits better with how the PPC port usually does it, it would be appreciated.
> You forgot to turn the ldx operands in `MacroAssembler::lightweight_lock`. I suggest making them all consistent (regardless if they use R0 or not). I believe that is the preferred form by the ISA, anyway.
The lock changes were already pushed in an earlier commit (60fc40f8ca53be5f12700f33816419142e7faaa9). Unified the unlock `ldx` as well.
> Are you planning to do more for Lilliput on PPC64?
I am working on creating a actionable sub-task list of work items for Locking and ObjectMonitors in Lilliput. Extending supports to other ports will be on this list. It will not be the main priority I will be working on, however for ports which already have a recursive lightweight locking implementation, only the ObjectMonitor cache needs to be implemented, which is a fairly simple adaption. I will get to it if time permits and no one gets to it first. So short answer maybe.
> src/hotspot/cpu/ppc/c2_MacroAssembler_ppc.cpp line 79:
>
>> 77: bgt(flag, slow_path);
>> 78:
>> 79: // Check if recursive.
>
> Would be good to mention the `_bad_oop_sentinel` in the comment.
Done.
> src/hotspot/cpu/ppc/c2_MacroAssembler_ppc.cpp line 140:
>
>> 138: inc_held_monitor_count(t);
>> 139:
>> 140: #ifdef ASSERT
>
> Whitespaces.
Done.
> src/hotspot/cpu/ppc/c2_MacroAssembler_ppc.cpp line 202:
>
>> 200: bne(CCR0, inflated);
>> 201:
>> 202: #ifdef ASSERT
>
> Whitespaces.
Done.
> src/hotspot/cpu/ppc/ppc.ad line 12189:
>
>> 12187: ins_encode %{
>> 12188: __ fast_lock_lightweight($crx$$CondRegister, $oop$$Register, $box$$Register,
>> 12189: $tmp1$$Register, $tmp2$$Register, /*tmp3*/ R0);
>
> You could avoid duplicating the nodes and distinguish the locking modes here if the register usage is the same.
Wanted to keep it separate all the way up. It makes changing the regiester allocations easier without having the different implementations requirements affect each other. If there is a noticeable runtime downside to having multiple nodes matching on different predicates we should revert this.
Unsure what C2's assumptions about the state of the box is. But the idea is to eventually remove it or just kill it and use it as a temp register. (Thus allocating one less register for the node)
> src/hotspot/cpu/ppc/sharedRuntime_ppc.cpp line 2478:
>
>> 2476: __ compiler_fast_lock_object(CCR0, r_oop, r_box, r_temp_1, r_temp_2, r_temp_3);
>> 2477: __ beq(CCR0, locked);
>> 2478: }
>
> I appreciate your effort to make it more similar among all platforms. However, I think that using the C2 version is better because the native wrapper is the highest tier. Can we keep it as before with this PR and use a separate issue to clean it up for all platforms?
> That will allow us to check performance impact individually. Mixing 2 performance related changes is not ideal.
I changed it so that the native wrapper also uses the same fast lock/unlock as the C2 code.
Added the assert code which checks that lock stack does not contain the object when handling the inflated case. Would be nice to run some stress tests and validate the assumptions made by the fast lock/unlock.
-------------
PR Comment: https://git.openjdk.org/jdk/pull/16611#issuecomment-1807632775
PR Comment: https://git.openjdk.org/jdk/pull/16611#issuecomment-1946058087
PR Comment: https://git.openjdk.org/jdk/pull/16611#issuecomment-1947834459
PR Review Comment: https://git.openjdk.org/jdk/pull/16611#discussion_r1394144690
PR Review Comment: https://git.openjdk.org/jdk/pull/16611#discussion_r1394144784
PR Review Comment: https://git.openjdk.org/jdk/pull/16611#discussion_r1394144851
PR Review Comment: https://git.openjdk.org/jdk/pull/16611#discussion_r1394152850
PR Review Comment: https://git.openjdk.org/jdk/pull/16611#discussion_r1394147579
More information about the hotspot-dev
mailing list