RFR: 8319801: Recursive lightweight locking: aarch64 implementation [v6]

Roman Kennke rkennke at openjdk.org
Wed Jan 24 14:57:28 UTC 2024


On Wed, 24 Jan 2024 14:30:46 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote:

>> src/hotspot/cpu/aarch64/c1_MacroAssembler_aarch64.cpp line 81:
>> 
>>> 79:     tstw(hdr, JVM_ACC_IS_VALUE_BASED_CLASS);
>>> 80:     br(Assembler::NE, slow_case);
>>> 81:   } else if (LockingMode == LM_LIGHTWEIGHT) {
>> 
>> What is the advantage of moving the load of the header around? The way you did it, it is less obvious that for LW the header is loaded in the block before locking is actually done.
>
> The current implementation requires the first emitted instruction of `C1_MacroAssembler::lock_object` to trap if the obj is null.  We do it here with a load. 
> 
> I could instrument `MacroAssembler::lightweight_lock` with a `bool preload_mark` so that C1 does not do the redundant load, while still emitting its first trapping instruction, and not create redundant loads for the interpreter and native wrappers.
> 
> I think that we want to move all platforms to what PPC is doing, mainly that C2, C1 and native wrapper shares a common lock/unlock implementation (as they lock + unlock under the same balanced assumptions), while have `MacroAssembler::lightweight_lock` solely dedicated to the interpreter. 
> 
> I think I will update `MacroAssembler::lightweight_lock` to have this `preload_mark` bool. You are not the first to ask about this specific line, and it might make it more clear. With a comment about the C1 implicit null check.

Ok I understand, thank you! I actually like it more the way you did it before (without the bool preload_mark flag). An extra comment there may be useful.
Or maybe consider to load the mark in MacroAssembler as first instruction unconditionally? As far as I can tell mark-word is not used in the lock-stack-full case (very uncommon) and recursive (more common), but it's only really relevant for interpreter and shared-runtime (C2 has its own impl, C1 needs to have the load there anyway), where the extra cycle don't really matter anyway? Would keep the code cleaner. But it means there is an implicit behavioural dependency between the locking impl and the C1 impl, which might be problematic.
But let's not add bools and different code-paths, this is even more confusing.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16608#discussion_r1465036011


More information about the hotspot-dev mailing list