8267042: bug in monitor locking/unlocking on ARM32 C1 due to uninitialized BasicObjectLock::_displaced_header
Boris Ulasevich
boris.ulasevich at bell-sw.com
Sun May 23 20:37:30 UTC 2021
Hi Chris,
I checked that the fix works fine and does not bring any regression.
Let me create pull request and sponsor the change.
Here is the fix and reproducer I prepared for review:
https://github.com/openjdk/jdk/compare/master...bulasevich:jdk-8267042-arm32-c1-lock-01
(I think it is worth simplifying the MonitorBugTest)
regards,
Boris
On 20.05.2021 20:13, Chris Cole wrote:
> Hi Boris,
>
> Thanks for looking into this and your support. See comments below.
>
> On Wed, May 19, 2021 at 2:20 AM Boris Ulasevich
> <boris.ulasevich at bell-sw.com> wrote:
>> Hi Chris,
>>
>> I agree that the problem is in C1_MacroAssembler::lock_object.
> I believe that this issue can be fixed by initializing the displaced
> header in either
> C1_MacroAssembler::lock_object/SharedRuntime::generate_native_wrapper
> or in ObjectSynchronizer::quick_enter with changes provided in
> previous email.
>
> I would suggest that where the problem/fix is located depends on what
> the "interface contract" is for initializing the displaced header
> between JIT inlined code and the monitor enter runtime helper. This
> "interface contract" is not documented anywhere so it is defined by
> how things are coded. Before the introduction of
> ObjectSynchronizer::quick_enter(), the runtime helper would always
> initialize the displaced header as appropriate. When
> ObjectSynchronizer::quick_enter() was introduced it changed the
> "interface contract" of the monitor enter runtime helper because there
> is one special case where it no longer always initializes the
> displaced header as appropriate.
>
> I believe that the better design and "interface contract" is to have
> the runtime helper always set the displaced header as appropriate
> (apply change to ObjectSynchronizer::quick_enter and no other changes)
> for the following reasons.
> - Less coupling between JIT code and runtime helper, this coupling
> helped lead to a number of time consuming bugs including 8153107,
> 8077392, and this one.
> - Every place that slow case runtime helper is called, special care is
> required to initialize displaced header, rather then in just one place
> in ObjectSynchronizer::quick_enter.
> - This coupling is also exposed through the
> JVMCIRuntime::monitorenter(JavaThread, oopDesc, BasicLock) interface
> and without updates to ObjectSynchronizer::quick_enter(), anyone (like
> Graal, etc) calling this interface needs to make sure
> BasicLock->_displaced_header is initialized to a non-zero value. My
> guess is that without removing this requirement there may be future
> deadlock issues. Also current users use JVMCIRuntime::monitorenter()
> should be reviewed to ensure that they are initializing
> BasicLock->_displaced_header.
> - JIT inline code and JVMCIRuntime::monitorenter users can possibly be
> smaller and faster, if the JIT inline code determines that the runtime
> helper is needed it can simply call this without first having to
> initialize the displaced header. Also note that JIT code may not have
> the information to know exactly what to set displaced herder to, so I
> will set it to non-zero and the runtime helper may overwrite this
> value with something different based on the additional logic in the
> slow path.
> - Supports fixing -XX:-UseFastLocking develop flag without additional changes
>
> But given that all code except ARM32 C1, assumes an "interface
> contract" that has special case for when displaced header needs to be
> initialized I am not sure that there will be much support for making
> changes that impact all other architectures in the context of the
> ARM32 bug fix (for other architectures there could be a redundant
> setting displaced header to non-zero in
> ObjectSynchronizer::quick_enter()).
>
> Anyway, just my thoughts... I am fine with either location
> (C1_MacroAssembler::lock_object/SharedRuntime::generate_native_wrapper
> or ObjectSynchronizer::quick_enter) to fix this ARM32 bug.
>
>> What I do not like in your fix is
>> - an arbitrary non-zero value is put into disp_hdr address for 'ne' case.
> I somewhat agree that in general the aesthetics of using an arbitrary
> non-zero value isn't ideal. But in this case I believe setting to a
> specific non-zero value would require an additional instruction and I
> don't think this is worth it. Note that this JIT code is inlined to
> every compiled synchronized method and block, so that one extra
> instruction gets multiplied by a large number and increases memory
> usage, especially important for ARM32 with devices generally having
> limited memory.
Sure. And this is how C2 MacroAssembler lock_object implementation
works! I am Ok with the fix.
> The code change for C1_MacroAssembler::lock_object is taken from what
> is done for ARM32 C2 code in C2_MacroAssembler::fast_lock(),
> that is using an arbitrary non-zero value (see
> https://github.com/openjdk/jdk/blob/b7b6acd9b1cafb791827e151712836c4e7140db5/src/hotspot/cpu/arm/c2_MacroAssembler_arm.cpp#L122).
> I would suggest that all places should be consistent. Also when other
> architectures call the runtime helper, the displaced header is
> initialize with "unlocked object header" value due to failed attempt
> to do a stack lock and this is somewhat arbitrary (and somewhat
> misleading) (see
> https://github.com/openjdk/jdk/blob/dfe8833f5d9a9ac59857143a86d07f85769b8eae/src/hotspot/cpu/x86/c1_MacroAssembler_x86.cpp#L74).
>
> Note that in this context the displaced header is acting like a
> boolean, false = 0 and any other value is true.
>
> So to keep the code small and to be consistent with ARM32 C2, I would
> suggest just using non-zero value. Maybe a better comment in the code
> would be helpful, as follows.
>
> diff --git a/src/hotspot/cpu/arm/c1_MacroAssembler_arm.cpp
> b/src/hotspot/cpu/arm/c1_MacroAssembler_arm.cpp
> index 9d5c82dceb9..44ddc3a9da9 100644
> --- a/src/hotspot/cpu/arm/c1_MacroAssembler_arm.cpp
> +++ b/src/hotspot/cpu/arm/c1_MacroAssembler_arm.cpp
> @@ -235,7 +235,11 @@ int C1_MacroAssembler::lock_object(Register hdr,
> Register obj,
> sub(tmp2, hdr, SP, eq);
> movs(tmp2, AsmOperand(tmp2, lsr, exact_log2(os::vm_page_size())), eq);
> // If 'eq' then OK for recursive fast locking: store 0 into a lock record.
> - str(tmp2, Address(disp_hdr, mark_offset), eq);
> + // If 'ne' then initialize displaced header with this non-zero value
> + // to make sure monitor exit is not treated as non-inflated recursive unlock,
> + // the runtime helper used in slow case doesn't always do this for us
> + // (see discussion in JDK-8267042)
> + str(tmp2, Address(disp_hdr, mark_offset));
> b(fast_lock_done, eq);
> // else need slow case
> b(slow_case);
>
>
>> - there is a similar code pattern in
>> SharedRuntime::generate_native_wrapper - should not it be fixed too?
> Good catch, yes this needs to be fixed as well. I would suggest same
> change as for C1_MacroAssembler::lock_object, as follows.
>
> diff --git a/src/hotspot/cpu/arm/sharedRuntime_arm.cpp
> b/src/hotspot/cpu/arm/sharedRuntime_arm.cpp
> index 341cf63c4c9..7de4306aedb 100644
> --- a/src/hotspot/cpu/arm/sharedRuntime_arm.cpp
> +++ b/src/hotspot/cpu/arm/sharedRuntime_arm.cpp
> @@ -1184,7 +1184,11 @@ nmethod*
> SharedRuntime::generate_native_wrapper(MacroAssembler* masm,
> __ sub(Rtemp, mark, SP, eq);
> __ movs(Rtemp, AsmOperand(Rtemp, lsr, exact_log2(os::vm_page_size())), eq);
> // If still 'eq' then recursive locking OK: set displaced header to 0
> - __ str(Rtemp, Address(disp_hdr,
> BasicLock::displaced_header_offset_in_bytes()), eq);
> + // If 'ne' then initialize displaced header with this non-zero value
> + // to make sure monitor exit is not treated as non-inflated
> recursive unlock,
> + // the runtime helper used in slow case doesn't always do this for us
> + // (see discussion in JDK-8267042)
> + __ str(Rtemp, Address(disp_hdr,
> BasicLock::displaced_header_offset_in_bytes()));
> __ b(lock_done, eq);
> __ b(slow_lock);
>
>> - the second comment in hdr bits manipulation code is wrong: "// -2-
>> test (hdr - SP) if the low two bits are 0"
> What are you suggesting is wrong? If I read this as a comment about
> the two instructions that follow (sub and movs) then seems to make
> sense to me, but I might be missing something. Also I would think this
> is not really related to this issue and addressed in a separate place
> issue?
The two instructions compare upper bits (not lower two) to ensure the SP
and
hdr are on the same page. Though, I agree, it is not related to this issue.
// -2- test (hdr - SP) if the low two bits are 0
sub(tmp2, hdr, SP, eq);
movs(tmp2, AsmOperand(tmp2, lsr, exact_log2(os::vm_page_size())), eq);
> Thanks again,
> Chris
>
>> regards,
>> Boris
>>
More information about the hotspot-dev
mailing list