8267042: bug in monitor locking/unlocking on ARM32 C1 due to uninitialized BasicObjectLock::_displaced_header

Chris Cole chris at sageembedded.com
Thu May 20 17:13:53 UTC 2021


Hi Boris,

Thanks for looking into this and your support. See comments below.

On Wed, May 19, 2021 at 2:20 AM Boris Ulasevich
<boris.ulasevich at bell-sw.com> wrote:
>
> Hi Chris,
>
> I agree that the problem is in C1_MacroAssembler::lock_object.

I believe that this issue can be fixed by initializing the displaced
header in either
C1_MacroAssembler::lock_object/SharedRuntime::generate_native_wrapper
or in ObjectSynchronizer::quick_enter with changes provided in
previous email.

I would suggest that where the problem/fix is located depends on what
the "interface contract" is for initializing the displaced header
between JIT inlined code and the monitor enter runtime helper. This
"interface contract" is not documented anywhere so it is defined by
how things are coded. Before the introduction of
ObjectSynchronizer::quick_enter(), the runtime helper would always
initialize the displaced header as appropriate. When
ObjectSynchronizer::quick_enter() was introduced it changed the
"interface contract" of the monitor enter runtime helper because there
is one special case where it no longer always initializes the
displaced header as appropriate.

I believe that the better design and "interface contract" is to have
the runtime helper always set the displaced header as appropriate
(apply change to ObjectSynchronizer::quick_enter and no other changes)
for the following reasons.
- Less coupling between JIT code and runtime helper, this coupling
helped lead to a number of time consuming bugs including 8153107,
8077392, and this one.
- Every place that slow case runtime helper is called, special care is
required to initialize displaced header, rather then in just one place
in ObjectSynchronizer::quick_enter.
- This coupling is also exposed through the
JVMCIRuntime::monitorenter(JavaThread, oopDesc, BasicLock) interface
and without updates to ObjectSynchronizer::quick_enter(), anyone (like
Graal, etc) calling this interface needs to make sure
BasicLock->_displaced_header is initialized to a non-zero value. My
guess is that without removing this requirement there may be future
deadlock issues. Also current users use JVMCIRuntime::monitorenter()
should be reviewed to ensure that they are initializing
BasicLock->_displaced_header.
- JIT inline code and JVMCIRuntime::monitorenter users can possibly be
smaller and faster, if the JIT inline code determines that the runtime
helper is needed it can simply call this without first having to
initialize the displaced header. Also note that JIT code may not have
the information to know exactly what to set displaced herder to, so I
will set it to non-zero and the runtime helper may overwrite this
value with something different based on the additional logic in the
slow path.
- Supports fixing -XX:-UseFastLocking develop flag without additional changes

But given that all code except ARM32 C1, assumes an "interface
contract" that has special case for when displaced header needs to be
initialized I am not sure that there will be much support for making
changes that impact all other architectures in the context of the
ARM32 bug fix (for other architectures there could be a redundant
setting displaced header to non-zero in
ObjectSynchronizer::quick_enter()).

Anyway, just my thoughts... I am fine with either location
(C1_MacroAssembler::lock_object/SharedRuntime::generate_native_wrapper
or ObjectSynchronizer::quick_enter) to fix this ARM32 bug.

> What I do not like in your fix is
> - an arbitrary non-zero value is put into disp_hdr address for 'ne' case.

I somewhat agree that in general the aesthetics of using an arbitrary
non-zero value isn't ideal. But in this case I believe setting to a
specific non-zero value would require an additional instruction and I
don't think this is worth it. Note that this JIT code is inlined to
every compiled synchronized method and block, so that one extra
instruction gets multiplied by a large number and increases memory
usage, especially important for ARM32 with devices generally having
limited memory.

The code change for C1_MacroAssembler::lock_object is taken from what
is done for ARM32 C2 code in C2_MacroAssembler::fast_lock(),
that is using an arbitrary non-zero value (see
https://github.com/openjdk/jdk/blob/b7b6acd9b1cafb791827e151712836c4e7140db5/src/hotspot/cpu/arm/c2_MacroAssembler_arm.cpp#L122).
I would suggest that all places should be consistent. Also when other
architectures call the runtime helper, the displaced header is
initialize with "unlocked object header" value due to failed attempt
to do a stack lock and this is somewhat arbitrary (and somewhat
misleading) (see
https://github.com/openjdk/jdk/blob/dfe8833f5d9a9ac59857143a86d07f85769b8eae/src/hotspot/cpu/x86/c1_MacroAssembler_x86.cpp#L74).

Note that in this context the displaced header is acting like a
boolean, false = 0 and any other value is true.

So to keep the code small and to be consistent with ARM32 C2, I would
suggest just using non-zero value. Maybe a better comment in the code
would be helpful, as follows.

diff --git a/src/hotspot/cpu/arm/c1_MacroAssembler_arm.cpp
b/src/hotspot/cpu/arm/c1_MacroAssembler_arm.cpp
index 9d5c82dceb9..44ddc3a9da9 100644
--- a/src/hotspot/cpu/arm/c1_MacroAssembler_arm.cpp
+++ b/src/hotspot/cpu/arm/c1_MacroAssembler_arm.cpp
@@ -235,7 +235,11 @@ int C1_MacroAssembler::lock_object(Register hdr,
Register obj,
   sub(tmp2, hdr, SP, eq);
   movs(tmp2, AsmOperand(tmp2, lsr, exact_log2(os::vm_page_size())), eq);
   // If 'eq' then OK for recursive fast locking: store 0 into a lock record.
-  str(tmp2, Address(disp_hdr, mark_offset), eq);
+  // If 'ne' then initialize displaced header with this non-zero value
+  // to make sure monitor exit is not treated as non-inflated recursive unlock,
+  // the runtime helper used in slow case doesn't always do this for us
+  // (see discussion in JDK-8267042)
+  str(tmp2, Address(disp_hdr, mark_offset));
   b(fast_lock_done, eq);
   // else need slow case
   b(slow_case);


> - there is a similar code pattern in
> SharedRuntime::generate_native_wrapper - should not it be fixed too?

Good catch, yes this needs to be fixed as well. I would suggest same
change as for C1_MacroAssembler::lock_object, as follows.

diff --git a/src/hotspot/cpu/arm/sharedRuntime_arm.cpp
b/src/hotspot/cpu/arm/sharedRuntime_arm.cpp
index 341cf63c4c9..7de4306aedb 100644
--- a/src/hotspot/cpu/arm/sharedRuntime_arm.cpp
+++ b/src/hotspot/cpu/arm/sharedRuntime_arm.cpp
@@ -1184,7 +1184,11 @@ nmethod*
SharedRuntime::generate_native_wrapper(MacroAssembler* masm,
     __ sub(Rtemp, mark, SP, eq);
     __ movs(Rtemp, AsmOperand(Rtemp, lsr, exact_log2(os::vm_page_size())), eq);
     // If still 'eq' then recursive locking OK: set displaced header to 0
-    __ str(Rtemp, Address(disp_hdr,
BasicLock::displaced_header_offset_in_bytes()), eq);
+    // If 'ne' then initialize displaced header with this non-zero value
+    // to make sure monitor exit is not treated as non-inflated
recursive unlock,
+    // the runtime helper used in slow case doesn't always do this for us
+    // (see discussion in JDK-8267042)
+    __ str(Rtemp, Address(disp_hdr,
BasicLock::displaced_header_offset_in_bytes()));
     __ b(lock_done, eq);
     __ b(slow_lock);

> - the second comment in hdr bits manipulation code is wrong: "// -2-
> test (hdr - SP) if the low two bits are 0"

What are you suggesting is wrong? If I read this as a comment about
the two instructions that follow (sub and movs) then seems to make
sense to me, but I might be missing something. Also I would think this
is not really related to this issue and addressed in a separate place
issue?

Thanks again,
Chris

>
> regards,
> Boris
>


More information about the hotspot-dev mailing list