RFR: 8319947: Recursive lightweight locking: s390x implementation
Axel Boldt-Christmas
aboldtch at openjdk.org
Tue May 21 12:14:04 UTC 2024
On Sun, 21 Apr 2024 16:30:43 GMT, Amit Kumar <amitkumar at openjdk.org> wrote:
> s390x port for recursive locking.
>
> testing:
> - [x] build fastdebug-vm
> - [x] build slowdebug-vm
> - [x] build release-vm
> - [x] build optimized-vm
> - [x] ./test/jdk/java/util/concurrent (fastdebug-vm)
> - [x] with C1
> - [x] with C2
> - [x] with interpreter
> - [x] ./test/jdk/java/util/concurrent (release-vm)
> - [x] with C1
> - [x] with C2
> - [x] with interpreter
> - [x] ./test/jdk/java/util/concurrent (slowdebug-vm)
> - [x] with C1
> - [x] with C2
> - [x] with interpreter
> - [x] tier1 with fastdebug-vm
> - [x] tier1 with slowdebug-vm
> - [x] tier1 with release-vm
>
> *BenchMarks*:
>
> Results from Performance LPARs :
>
>
> Locking Mode = 1 (without Patch)
>
> Benchmark (innerCount) Mode Cnt Score Error Units
> LockUnlock.testContendedLock 100 avgt 12 5.144 ± 0.035 ns/op
> LockUnlock.testRecursiveLockUnlock 100 avgt 12 3824.742 ± 89.475 ns/op
> LockUnlock.testRecursiveSynchronization 100 avgt 12 25.348 ± 0.559 ns/op
> LockUnlock.testSerialLockUnlock 100 avgt 12 466.629 ± 3.036 ns/op
> LockUnlock.testSimpleLockUnlock 100 avgt 12 468.532 ± 1.793 ns/op
> Finished running test 'micro:vm.lang.LockUnlock'
>
> Locking Mode = 1 (with patch)
>
> Benchmark (innerCount) Mode Cnt Score Error Units
> LockUnlock.testContendedLock 100 avgt 12 5.146 ± 0.027 ns/op
> LockUnlock.testRecursiveLockUnlock 100 avgt 12 3833.175 ± 75.863 ns/op
> LockUnlock.testRecursiveSynchronization 100 avgt 12 25.206 ± 0.519 ns/op
> LockUnlock.testSerialLockUnlock 100 avgt 12 473.973 ± 2.103 ns/op
> LockUnlock.testSimpleLockUnlock 100 avgt 12 470.749 ± 2.229 ns/op
> Finished running test 'micro:vm.lang.LockUnlock'
>
>
>
>
> Locking Mode = 2 (without Patch)
>
> Benchmark (innerCount) Mode Cnt Score Error Units
> LockUnlock.testContendedLock 100 avgt 12 4.688 ± 0.051 ns/op
> LockUnlock.testRecursiveLockUnlock 100 avgt 12 12800.544 ± 92.265 ns/op
> LockUnlock.testRecursiveSynchronization 100 avgt 12 26.486 ± 2.229 ns/op
> LockUnlock.testSerialLockUnlock 100 avgt 12 424.499 ± 0.416 ns/op
> LockUnlock.testSimpleLockUnlock 100 avgt 12 424.241 ± 0.840 ns/op
> Finished running test 'micro:vm.lang.Lo...
This review only looks at the soundness of the algorithm and not that the actual implementation is correct with regards to the specific instructions, register widths etc.
Looks good. Had a handful comments / questions.
src/hotspot/cpu/s390/interp_masm_s390.cpp line 1013:
> 1011: assert((JVM_ACC_IS_VALUE_BASED_CLASS & 0xFFFF) == 0, "or change following instruction");
> 1012: z_nilh(tmp, JVM_ACC_IS_VALUE_BASED_CLASS >> 16);
> 1013: z_brne(slow_case);
Is this change unrelated to recursive lightweight? If so should it be a separate RFE?
src/hotspot/cpu/s390/macroAssembler_s390.cpp line 5757:
> 5755:
> 5756: z_csg(mark, top, oopDesc::mark_offset_in_bytes(), obj);
> 5757: branch_optimized(Assembler::bcondNotEqual, slow);
The previous comment and spacing makes this a little harder for me to read and understand.
```C++
// Try to lock. Transition lock-bits 0b01 => 0b00
z_oill(mark, markWord::unlocked_value);
z_lgr(top, mark);
z_xilf(top, markWord::unlocked_value);
z_csg(mark, top, oopDesc::mark_offset_in_bytes(), obj);
branch_optimized(Assembler::bcondNotEqual, slow);
If comments to clarify the `mark |= 1; top = mark; top ^= 1;` logic requires a comment, it can be added but without the newlines.
src/hotspot/cpu/s390/macroAssembler_s390.cpp line 5767:
> 5765:
> 5766: // as locking was successful, set CC to EQ
> 5767: z_cr(top, top); // z_ahi instruction above can change the cc, so we need this
Given that `lightweight_lock` is now only used from the interpreter and c1 the CC flag should not matter?
src/hotspot/cpu/s390/macroAssembler_s390.cpp line 5856:
> 5854: branch_optimized(bcondAlways, slow);
> 5855:
> 5856: bind(unlocked); // CC is already set to EQ, if we jumped here
Same as with `lightweight_lock`
Given that `lightweight_unlock` is now only used from the interpreter and c1 the CC flag should not matter?
src/hotspot/cpu/s390/macroAssembler_s390.cpp line 5906:
> 5904: // not inflated
> 5905:
> 5906: // Try to lock. Transition lock bits 0b00 => 0b01
Flipped the bits in the comment.
Suggestion:
// Try to lock. Transition lock bits 0b01 => 0b00
src/hotspot/cpu/s390/macroAssembler_s390.cpp line 6015:
> 6013: // we will encounter a loop while handling the inflated monitor case
> 6014: // so, we need to make sure, when we reach there only top one object is removed.
> 6015: // if we load top there then it could result into infinite loop, So preserving top is a Must here;
I assume this refers to the assert/debug code that checks that the lock stack does not contain the object when doing inflated unlocking. The debug code could just unconditionally reload top from the thread.
-------------
Changes requested by aboldtch (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/18878#pullrequestreview-2067976856
PR Review Comment: https://git.openjdk.org/jdk/pull/18878#discussion_r1608176525
PR Review Comment: https://git.openjdk.org/jdk/pull/18878#discussion_r1608194021
PR Review Comment: https://git.openjdk.org/jdk/pull/18878#discussion_r1607913366
PR Review Comment: https://git.openjdk.org/jdk/pull/18878#discussion_r1607924012
PR Review Comment: https://git.openjdk.org/jdk/pull/18878#discussion_r1608214358
PR Review Comment: https://git.openjdk.org/jdk/pull/18878#discussion_r1608167611
More information about the hotspot-dev
mailing list