RFR: 8319799: Recursive lightweight locking: x86 implementation [v2]
Roman Kennke
rkennke at openjdk.org
Mon Nov 13 15:53:00 UTC 2023
On Mon, 13 Nov 2023 10:45:10 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote:
>> Implements the x86 port of JDK-8319796.
>>
>> There are two major parts for the port implementation. The C2 part, and the part shared by the interpreter, C1 and the native call wrapper.
>>
>> The biggest change for both parts is that we check the lock stack first; if the operation is a recursive lightweight [un]lock, we simply push/pop and finish successfully.
>>
>> Only if the recursive lightweight [un]lock fails does it look at the mark word.
>>
>> For the shared part, if it is an unstructured exit, the monitor is inflated, or the mark word transition fails, it calls into the runtime.
>>
>> C2 operates under a few more assumptions, namely that the locking is structured and balanced. This means that some checks can be elided.
>>
>> First, this means that in the C2 unlock, if the obj is not on the top of the lock stack, it must be inflated. Conversely, if we reach the inflated C2 unlock, the obj is not on the lock stack. This second property makes it possible to avoid reading the owner (and checking if it is anonymous). Instead it can either just do an uncontended unlock by writing null to the owner, or, if contention happens, simply write the thread to the owner and jump to the runtime.
>>
>> The x86 C2 port also has some extra oddities.
>>
>> The mark word read is done early, as it showed better scaling in hyper-threaded scenarios on certain Intel hardware and no noticeable downside on other tested x86 hardware.
>>
>> The fast path is written to avoid going through conditional branches. Combined with the need to keep the ZF output correct, this means the code performs some actions eagerly: decrementing the held monitor count and popping from the lock stack. If a slow path is required, it jumps to a code stub which restores the thread-local state to a correct state before jumping to the runtime.
>>
>> The contended unlock was also moved to the code stub.
>
> Axel Boldt-Christmas has updated the pull request incrementally with three additional commits since the last revision:
>
> - Fix type
> - Move inflated check in fast_locked
> - Move top load
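For readers following along, the lock-stack-first check from the quoted description can be sketched as a small single-threaded toy model. This is not the HotSpot code: ToyObject, ToyLockStack, toy_lock, toy_unlock and the capacity of 8 are all hypothetical names and values chosen for illustration, and the plain bool fields stand in for what would be a mark word transition in the real implementation.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Toy model only; every name here is hypothetical, not a HotSpot identifier.
struct ToyObject {
  bool fast_locked = false;  // stands in for the "locked" state of the mark word
  bool inflated = false;     // stands in for the monitor bit of the mark word
};

struct ToyLockStack {
  static constexpr std::size_t kCapacity = 8;  // assumed capacity
  std::array<const ToyObject*, kCapacity> slots{};
  std::size_t top = 0;

  bool is_full() const { return top == kCapacity; }
  bool top_is(const ToyObject* o) const { return top > 0 && slots[top - 1] == o; }
  void push(const ToyObject* o) { assert(!is_full()); slots[top++] = o; }
  void pop() { assert(top > 0); --top; }
};

enum class Result { Fast, Slow };  // Slow = would call into the runtime

Result toy_lock(ToyLockStack& ls, ToyObject& obj) {
  if (ls.is_full()) return Result::Slow;
  // Check the lock stack first: a recursive lock is just a push.
  if (ls.top_is(&obj)) { ls.push(&obj); return Result::Fast; }
  // Only now look at the (toy) mark word.
  if (obj.inflated || obj.fast_locked) return Result::Slow;
  obj.fast_locked = true;  // the real code would use an atomic transition here
  ls.push(&obj);
  return Result::Fast;
}

Result toy_unlock(ToyLockStack& ls, ToyObject& obj) {
  // Not on top: unstructured exit or inflated monitor, so take the slow path.
  if (!ls.top_is(&obj)) return Result::Slow;
  ls.pop();
  // Still on top after the pop: this was a recursive unlock, nothing else to do.
  if (ls.top_is(&obj)) return Result::Fast;
  obj.fast_locked = false;  // last unlock releases the (toy) mark word
  return Result::Fast;
}
```

Locking the same ToyObject twice and unlocking it twice stays on the Fast path throughout, and the object is only released by the second unlock.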
I see benefits in interleaving the various loads in the locking fast-paths.
src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 974:
> 972: jcc(Assembler::notZero, inflated);
> 973:
> 974: // Load top.
I have found it to be beneficial to move up the load of the top-offset to between the load/prefetch of the mark-word and the test for monitor. This way we do the test while the top-offset arrives and reduce the latency of the lock-stack-full-check.
src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1072:
> 1070:
> 1071: // Check if obj is top of lock-stack.
> 1072: movl(top, Address(thread, JavaThread::lock_stack_top_offset()));
Like above, moving the load of the top-offset up above the mark-load should be harmless and potentially reduce the time that the following instructions have to wait for the top-offset to arrive.
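To make the suggested ordering concrete, here is a minimal sketch in plain C++ rather than MacroAssembler code. ToyThread, ToyObj, classify_lock_sketch, kMonitorBit and kLockStackEnd are hypothetical stand-ins with assumed values. A C++ compiler is free to reorder these loads, so the sketch only illustrates the order in which the assembler would emit them: both loads are issued before either value is tested.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins; the values are assumptions for this toy only.
constexpr std::uintptr_t kMonitorBit = 0x2;
constexpr std::uint32_t kLockStackEnd = 8;

struct ToyThread { std::uint32_t lock_stack_top = 0; };
struct ToyObj { std::uintptr_t mark = 0; };

enum class Path { Inflated, StackFull, Continue };

Path classify_lock_sketch(const ToyThread& thread, const ToyObj& obj) {
  const std::uintptr_t mark = obj.mark;             // 1. load the mark word
  const std::uint32_t top = thread.lock_stack_top;  // 2. load top before testing mark
  assert(top <= kLockStackEnd);                     // toy invariant
  if ((mark & kMonitorBit) != 0) {                  // 3. test for monitor while
    return Path::Inflated;                          //    top is still in flight
  }
  if (top >= kLockStackEnd) {                       // 4. lock-stack-full check
    return Path::StackFull;
  }
  return Path::Continue;                            // proceed with the fast path
}
```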
-------------
PR Review: https://git.openjdk.org/jdk/pull/16607#pullrequestreview-1727614494
PR Review Comment: https://git.openjdk.org/jdk/pull/16607#discussion_r1391298977
PR Review Comment: https://git.openjdk.org/jdk/pull/16607#discussion_r1391301749