RFR: 8319901: Recursive lightweight locking: ppc64le implementation

Martin Doerr mdoerr at openjdk.org
Tue Feb 20 18:19:18 UTC 2024


On Thu, 15 Feb 2024 13:03:37 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote:

> If you want to go over and find a unified style here which fits better with how the PPC port usually does it, it would be appreciated.

Sure, I'll go over it when I find more time. I'd like to get it correct first, so I can run more tests before thinking about improvements.
I'm still a bit surprised that you invest into PPC64 :-) I appreciate it!
Are you planning to do more for Lilliput on PPC64? I'm also planning to look into the remaining parts when I find time.

>> src/hotspot/cpu/ppc/ppc.ad line 12189:
>> 
>>> 12187:   ins_encode %{
>>> 12188:     __ fast_lock_lightweight($crx$$CondRegister, $oop$$Register, $box$$Register,
>>> 12189:                              $tmp1$$Register, $tmp2$$Register, /*tmp3*/ R0);
>> 
>> You could avoid duplicating the nodes and distinguish the locking modes here if the register usage is the same.
>
> Wanted to keep it separate all the way up. It makes changing the regiester allocations easier without having the different implementations requirements affect each other. If there is a noticeable runtime downside to having multiple nodes matching on different predicates we should revert this. 
> 
> Unsure what C2's assumptions about the state of the box is. But the idea is to eventually remove it or just kill it and use it as a temp register. (Thus allocating one less register for the node)

That's ok for now. We can clean everything up when removing the legacy locking. Unfortunately, `BoxLockNode` is not easy to remove completely because they are used for analyzing properties of the graph. That would need a redesign. But at least, we could implement them as empty nodes such that they don't emit any code.

>> src/hotspot/cpu/ppc/sharedRuntime_ppc.cpp line 2478:
>> 
>>> 2476:       __ compiler_fast_lock_object(CCR0, r_oop, r_box, r_temp_1, r_temp_2, r_temp_3);
>>> 2477:       __ beq(CCR0, locked);
>>> 2478:     }
>> 
>> I appreciate your effort to make it more similar among all platforms. However, I think that using the C2 version is better because the native wrapper is the highest tier. Can we keep it as before with this PR and use a separate issue to clean it up for all platforms?
>> That will allow us to check performance impact individually. Mixing 2 performance related changes is not ideal.
>
> I changed it so that the native wrapper also uses the same fast lock/unlock as the C2 code. 
> Added the assert code which checks that lock stack does not contain the object when handling the inflated case. Would be nice to run some stress tests and validate the assumptions made by the fast lock/unlock.

Thanks! I'll take a look and run tests. The assertion makes sense anyway.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16611#issuecomment-1946511790
PR Review Comment: https://git.openjdk.org/jdk/pull/16611#discussion_r1394182643
PR Review Comment: https://git.openjdk.org/jdk/pull/16611#discussion_r1394183901


More information about the hotspot-dev mailing list