RFR: 8319901: Recursive lightweight locking: ppc64le implementation

Axel Boldt-Christmas aboldtch at openjdk.org
Tue Feb 20 18:19:18 UTC 2024


On Fri, 10 Nov 2023 13:21:58 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote:

> Draft implementation of recursive lightweight JDK-8319796 for ppc64le.
> 
> Porters feel free to discard this and implement something from scratch, take this initial draft and run with it, or send patches to me and if you want me to drive the PR.
> 
> Some notes:
> ppc64le deviates from the other ports, it shares it C2 locking code with the native call wrapper. This draft unifies the behavior across the ports by using the C1/interpreter locking for native call wrapper. If it is important to have a fast path for the inflated monitor case for native call wrapper, it could be done without having it share its implementation with C2. 
> It would also be possible to change this behavior on all ports (share the C2 code), as I believe the C2 assumptions always hold for the native call wrapper, the monitor exit and monitor enter will be balanced and structured.

> Thanks for rebasing it! We noticed that the register usage is invalid at some points. Using R0 doesn't work for memory base addresses or as source operand for addi instructions. It should work like this:
> 
> ```diff
> diff --git a/src/hotspot/cpu/ppc/macroAssembler_ppc.cpp b/src/hotspot/cpu/ppc/macroAssembler_ppc.cpp
> index 6a7428ebd04..6f131467595 100644
> --- a/src/hotspot/cpu/ppc/macroAssembler_ppc.cpp
> +++ b/src/hotspot/cpu/ppc/macroAssembler_ppc.cpp
> @@ -2471,7 +2471,7 @@ void MacroAssembler::compiler_fast_lock_lightweight_object(ConditionRegister fla
>    }
>  
>    const Register mark = tmp1;
> -  const Register t = tmp3;
> +  const Register t = tmp3; // Usage of R0 allowed!
>  
>    { // Lightweight locking
>  
> @@ -2489,8 +2489,8 @@ void MacroAssembler::compiler_fast_lock_lightweight_object(ConditionRegister fla
>      // when the lock stack is empty because of the _bad_oop_sentinel field.
>  
>      // Check if recursive.
> -    subi(t, top, oopSize);
> -    ldx(t, t, R16_thread);
> +    subi(tmp1, top, oopSize);
> +    ldx(t, tmp1, R16_thread);
>      cmpd(flag, obj, t);
>      beq(flag, push);
>  
> @@ -2541,13 +2541,13 @@ void MacroAssembler::compiler_fast_lock_lightweight_object(ConditionRegister fla
>      bne(flag, slow_path);
>  
>      // Recursive.
> -    ld(t, in_bytes(ObjectMonitor::recursions_offset() - ObjectMonitor::owner_offset()), owner_addr);
> -    addi(t, t, 1);
> -    std(t, in_bytes(ObjectMonitor::recursions_offset() - ObjectMonitor::owner_offset()), owner_addr);
> +    ld(tmp1, in_bytes(ObjectMonitor::recursions_offset() - ObjectMonitor::owner_offset()), owner_addr);
> +    addi(tmp1, tmp1, 1);
> +    std(tmp1, in_bytes(ObjectMonitor::recursions_offset() - ObjectMonitor::owner_offset()), owner_addr);
>    }
>  
>    bind(locked);
> -  inc_held_monitor_count(t);
> +  inc_held_monitor_count(tmp1);
>  
>  #ifdef ASSERT
>    // Check that locked label is reached with flags == EQ.
> @@ -4361,8 +4361,8 @@ void MacroAssembler::lightweight_lock(Register obj, Register t1, Register t2, La
>    // when the lock stack is empty because of the _bad_oop_sentinel field.
>  
>    // Check for recursion.
> -  subi(t, top, oopSize);
> -  ldx(t, t, R16_thread);
> +  subi(t2, top, oopSize);
> +  ldx(t, t2, R16_thread);
>    cmpd(CCR0, obj, t);
>    beq(CCR0, push);
>  
> ```
> 
> We may want to do further improvements to make register usage less confusing. Please take a look!

I was not aware of the specific semantics of R0. 

This register comes from the expansion of the FastLockNode in the .ad file. I just copied the register allocation from the current implementation. Maybe this should be hoisted, such that none of the temp registers are allowed to be R0. None is when called from the shared runtime, this only occurs from the .ad file.

The .ad file allocates a register for the box which is just dropped. We could use this instead as tmp3.

I pushed a change along those lines.

There are still some room for cleanups here with regards to the register allocation.

I will keep this in mind for Lilliput as well. There we still want to use the box, so will have to do some register juggling, or change the register allocation.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16611#issuecomment-1945546958


More information about the hotspot-dev mailing list