RFR: 8225351(13): assert failed: Revoke failed, unhandled biased lock state
Robbin Ehn
robbin.ehn at oracle.com
Tue Jun 18 17:38:49 UTC 2019
Hi Dan,
Far down!
On 2019-06-18 17:46, Daniel D. Daugherty wrote:
> Hi Robbin,
>
> I'm still having trouble with this. Please bear with me below...
>
>
> On 6/18/19 3:12 AM, Robbin Ehn wrote:
>> Hi Dan,
>>
>> On 2019-06-17 22:34, Daniel D. Daugherty wrote:
>>> Hi Robbin,
>>>
>>> Okay so I read this email and then the bug report. The conclusion sounds
>>> like we have an assert() that is a little too aggressive so I moved on to
>>> the webrev... I expected to find a refinement to an assert(), but that's
>>> not quite what I see...
>>>
>>> Let me re-read everything again...
>>>
>>> Hmmm... still reads the same, but let me dive into the code instead...
>>>
>>>
>>> On 6/14/19 4:03 AM, Robbin Ehn wrote:
>>>> Hi all, please review.
>>
>> To answer:
>> > I'm having problems with this sanity check. That function expects
>> > to find a BasicLock (via a MonitorInfo) that refers to the
>> > object whose bias we are trying to revoke. That check is assuming
>> > that a biased lock must be in the thread's top frame and I don't
>> > think that is a valid assumption.
>>
>>>>
>>>> When looking at a JavaThreads locks we retrieve them per frame via it's
>>>> monitors
>>>> list. How this list actually populated differs from frame type. If a
>>>> JavaThread tries to enter a new monitor a basic lock is directly/indirectly
>>>> via e.g. scope info added to the virtual monitor list.
>>
>> The lock is taken in this current java frame, the scope desc about the lock is
>> thus added to current frame. When fast path fails we call into to VM to get the
>> lock. This MonitorInfo (generated from the scope desc) points to the "biased to
>> other thread lock" must thus be in top java frame.
>> This is the only case when a biased lock towards another thread will be on our
>> stack, when we have not yet had time to revoke it.
>>
>> I don't want another case falling through the assert, if anything I'm saying
>> would be incorrect.
>>
>> Unfortunate the monitor info is not guaranteed to be in order otherwise this
>> assert would also check this is the last lock in the top frame.
>> (at least interpreter can have them out-of-order, this is compiled frames, but
>> I didn't feel sure about it)
>
> I attached the hs_err_pid file from Mikael's original Mach5 sighting
> to the bug report. Let me get oriented here by looking at the crashing
> stack:
>
> V [jvm.dll+0x50bd22] report_vm_error+0x102 (debug.cpp:264)
> V [jvm.dll+0x2d553e] BiasedLocking::revoke_own_locks_in_handshake+0xfe
> (biasedlocking.cpp:640)
> Deoptimization::revoke_using_handshake() optimized out
> V [jvm.dll+0x51a6f5] Deoptimization::deopt_thread+0x1b5 (deoptimization.cpp:1518)
> V [jvm.dll+0x51a960] Deoptimization::deoptimize+0x50 (deoptimization.cpp:1500)
> V [jvm.dll+0xc7bca2] JavaThread::deoptimize_marked_methods+0xe2 (thread.cpp:2840)
> V [jvm.dll+0x680f89] HandshakeThreadsOperation::do_handshake+0x189
> (handshake.cpp:248)
> V [jvm.dll+0x68144f] VM_HandshakeAllThreads::doit+0x36f (handshake.cpp:193)
> V [jvm.dll+0xcdf18e] VM_Operation::evaluate+0xbe (vmoperations.cpp:71)
> V [jvm.dll+0xce42a8] VMThread::evaluate_operation+0xb8 (vmthread.cpp:406)
> V [jvm.dll+0xce507f] VMThread::loop+0x84f (vmthread.cpp:616)
> V [jvm.dll+0xce5708] VMThread::run+0xd8 (vmthread.cpp:305)
> V [jvm.dll+0xc79232] Thread::call_run+0x192 (thread.cpp:405)
> V [jvm.dll+0xad6f0e] thread_native_entry+0x10e (os_windows.cpp:471)
>
> Okay, so deopt_thread() is called with the 'frame fr' that we are trying
> to deopt and it calls revoke_using_handshake() with the same 'frame fr'.
> revoke_using_handshake() calls get_monitors_from_stack() passing the
> same 'frame fr' and populates 'objects_to_revoke'. We go thru the list
> of objects that need to be revoked and call revoke_own_locks_in_handshake()
> on each one. It is in one of the revoke_own_locks_in_handshake() calls
> that we fail the assertion because the object is biased towards another
> thread.
>
> Okay, so now I'm oriented in the code and see where the assertion fails
> and where you changed to code so that we don't call that code if we see
> that the assertion is going to fail. I'm good with that part of the fix
> and I didn't make that clear before. Sorry about that.
>
>
> For the else-branch where you added the call to the lock_in_top_frame():
>
> L1488: if (mark->has_bias_pattern()) {
> L1489: if (mark->biased_locker() == thread) {
> <snip>
> L1492: } else {
> L1493: #ifdef ASSERT
> L1494: lock_in_top_frame(obj, thread);
> L1495: #endif
> L1496: }
>
> You are still in the "has_bias_pattern" if-statement so we know the
> object is biased. You are in the else-branch of comparing the biased
> locker to the target thread so you know that the object is biased to
> some other thread.
>
> Okay so now I'm oriented in the new checking code. The target thread
> has a BasicLock on its stack that refers to the object, but the object
> is biased to another thread so the target thread is contending on the
> lock which means that the frame _must be_ the top frame because the
> target thread is blocked.
Yes! The VM thread is running the handshake on behalf of the target thread,
which means it's either native or blocked (or suspended).
>
> I (finally) see why the new check is okay. Maybe I'm the only
> reviewer that had trouble with the new check and, if so, I'm
> sorry about that.
No, this was/is very complicated to find and understand.
>
> Please consider the following:
>
> L1492: } else {
> // Biased to another thread means 'thread' is
> // contending for the lock in the top frame.
> DEBUG_ONLY(verify_BasicLock_in_top_frame(obj, thread);
> }
>
> Thumbs up! Your call on whether to make more changes to this fix.
Great thanks, fixed!
For the record here is the last part of the stacktrace for the JavaThread the VM
thread is execution the handshake on behalf of. (Not same crash, local reprod)
#0 0x00007f110137c72c in pthread_cond_wait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
(gdb) bt
#0 0x00007f110137c72c in pthread_cond_wait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
#1 0x00007f11006d9e80 in os::PlatformMonitor::wait
(this=this at entry=0x7f10f801e530, millis=millis at entry=0) at
/home/rehn/source/jdk/vanilla-jdk/open/src/hotspot/os/posix/os_posix.hpp:279
#2 0x00007f110063a06c in Monitor::wait (this=0x7f10f801e520,
timeout=timeout at entry=0, as_suspend_equivalent=as_suspend_equivalent at entry=false)
at
/home/rehn/source/jdk/vanilla-jdk/open/src/hotspot/share/runtime/mutex.cpp:214
#3 0x00007f1100a92c9f in MonitorLocker::wait (timeout=0,
as_suspend_equivalent=false, this=0x7f10a2ef2bd0) at
/home/rehn/source/jdk/vanilla-jdk/open/src/hotspot/share/runtime/mutexLocker.hpp:261
#4 VMThread::execute (op=0x7f10a2ef2e40) at
/home/rehn/source/jdk/vanilla-jdk/open/src/hotspot/share/runtime/vmThread.cpp:755
#5 0x00007f10ff96f02d in BiasedLocking::revoke_and_rebias (obj=...,
attempt_rebias=attempt_rebias at entry=true,
__the_thread__=__the_thread__ at entry=0x7f10f84d3000)
at
/home/rehn/source/jdk/vanilla-jdk/open/src/hotspot/share/runtime/biasedLocking.cpp:745
#6 0x00007f11009376d3 in ObjectSynchronizer::fast_enter (obj=obj at entry=...,
lock=lock at entry=0x7f10a2ef35e8, attempt_rebias=attempt_rebias at entry=true,
__the_thread__=__the_thread__ at entry=0x7f10f84d3000)
at
/home/rehn/source/jdk/vanilla-jdk/open/src/hotspot/share/runtime/synchronizer.cpp:271
#7 0x00007f10ffac67ac in Runtime1::monitorenter (thread=0x7f10f84d3000,
obj=0x43492b1e0, lock=0x7f10a2ef35e8) at
/home/rehn/source/jdk/vanilla-jdk/open/src/hotspot/share/runtime/basicLock.hpp:67
#8 0x00007f10e0adacd4 in ?? ()
#9 0x0000000000000000 in ?? ()
Thanks Dan!
/Robbin
>
> Dan
>
>>
>> If this lock is biased
>>>> towards another
>>>> JavaThread we try to revoke that bias with a safepoint. In this case a deopt
>>>> handshake is already in queue. The handshake is thus executed before the revoke
>>>> safepoint.
>>>> The handshake goes over the monitors in compiled frames, find this lock and we
>>>> hit the assert. The assert make sure we actual can revoke the lock. A basic
>>>> lock
>>>> on stack should always, if biased, be biased to current thread, with the
>>>> exception:
>>>> We may have a stack lock biased against another thread until
>>>> ObjectSynchronizer::fast_enter returns.
>>>>
>>>> To handle this exception we can safely ignore biased lock towards other threads
>>>> in the deopt handshake. Since such locks will always be revoked before we
>>>> deopt/unpack stack.
>>>>
>>>> Code:
>>>> http://cr.openjdk.java.net/~rehn/8225351/v1/webrev/index.html
>>>
>>> First a little analysis:
>>>
>>> Our failing assert is here:
>>>
>>> src/hotspot/share/runtime/biasedLocking.cpp:
>>>
>>> BiasedLocking::Condition BiasedLocking::revoke_own_locks_in_handshake(Handle
>>> obj, TRAPS) {
>>>
>>> assert(mark->biased_locker() == THREAD &&
>>> prototype_header->bias_epoch() == mark->bias_epoch(), "Revoke
>>> failed, unhandled biased lock state");
>>>
>>> Since the function is called revoke_own_locks_in_handshake() it makes
>>> sense that we're checking that the locks are biased to THREAD (presumably
>>> the current thread since that's the TRAPS protocol). Just checking...
>>> Uhhhh... Looks like we can get to this code path from
>>> Deoptimization::deoptimize_frame_internal() when the current thread is
>>> the target thread *AND* when we are deoptimizing at a safepoint and the
>>> current thread is not the target thread. To my brain that makes use of
>>> the 'TRAPS' parameter "wrong", i.e., it should a "Thread* thread"
>>> parameter, but that's probably a carry over from older biased locking
>>> code.
>>>
>>> However, let's leave the topic of the "TRAPS" parameter alone for a
>>> separate investigation. It's entirely possible that my memory about the
>>> TRAPS protocol is rusty (or it has evolved since I last committed it to
>>> memory).
>>
>> Old code, changed to "JavaThread *thread".
>>
>>>
>>>
>>> src/hotspot/share/runtime/deoptimization.cpp
>>> old L1464: oop obj = (objects_to_revoke->at(i))();
>>> old L1465:
>>> BiasedLocking::revoke_own_locks_in_handshake(objects_to_revoke->at(i), thread);
>>>
>>> new L1486: Handle obj = objects_to_revoke->at(i);
>>> new L1487: markOop mark = obj->mark();
>>> new L1488: if (mark->has_bias_pattern()) {
>>> new L1489: if (mark->biased_locker() == thread) {
>>> new L1490: BiasedLocking::revoke_own_locks_in_handshake(obj, thread);
>>>
>>> So BiasedLocking::revoke_own_locks_in_handshake() has:
>>>
>>> > markOop mark = obj->mark();
>>> > if (!mark->has_bias_pattern()) {
>>> > return NOT_BIASED;
>>> > }
>>>
>>> which shows that the code previously recognized that it could be
>>> called with an 'obj' parameter that wasn't biased. Perhaps it
>>> would be better to change revoke_own_locks_in_handshake() to
>>> something like:
>>
>> We may not call mark->biased_locker() on a non-biased lock.
>> So this call is just to let us call biased_locker().
>> There is not a biased_locker_if_biased() method.
>>
>>>
>>> if (!mark->has_bias_pattern() || mark->biased_locker() != THREAD) {
>>>
>>> Of course, that means we would be returning "NOT_BIASED" for
>>> the case where the target object _is_ biased, but just not
>>> biased to our target thread.
>>
>> Yes, I did not like that. And since the method is called 'own_locks' it didn't
>> make sense to feed locks owned by other threads.
>>
>>>
>>> Another thought occurs to me: If this is happening during a handshake
>>> and not at a safepoint, then what prevents the object from passing
>>> new L1488 and L1489 in deoptimization.cpp, then we call
>>> revoke_own_locks_in_handshake() and pass "(!mark->has_bias_pattern())"
>>> only discover that the object has been rebiased to another thread at
>>> the assert? In other words, what happens if this revocation attempt
>>> loses a race with another revocation & rebias attempt?
>>
>> In this case current thread going to revoke that bias in the safepoint happening
>> after the handshake. Any other thread will need todo a safepoint to revoke it
>> also. Since we are in a handshake we know there is not going to be a safepoint
>> until we finishes the handshake.
>> It's not possible to rebias to another thread without a safepoint when the
>> lock is biased to another thread.
>>
>>>
>>> L1493: #ifdef ASSERT
>>> L1494: lock_in_top_frame(obj, thread);
>>> L1495: #endif
>>> I think this should be:
>>>
>>> DEBUG_ONLY(lock_in_top_frame(obj, thread);)
>>
>> Fixed.
>>
>>>
>>> L1456: void lock_in_top_frame(Handle ho, JavaThread* thread) {
>>> L1463: if (mon_info->eliminated()) {
>>> L1464: continue;
>>> L1465: }
>>> L1473: assert(false, "Lock not found in top frame");
>>> I'm having problems with this sanity check. That function expects
>>> to find a BasicLock (via a MonitorInfo) that refers to the
>>> object whose bias we are trying to revoke. That check is assuming
>>> that a biased lock must be in the thread's top frame and I don't
>>> think that is a valid assumption.
>>>
>>> revoke_using_handshake() calls get_monitors_from_stack() and that
>>> function collects objects that have monitors from all vframes into
>>> 'objects_to_revoke'.
>>>
>>> Maybe I'm missing something obvious here?
>>
>> I hope above explains more.
>>
>> Thanks, Robbin
>>
>>>
>>> Dan
>>>
>>>> Issue:
>>>> https://bugs.openjdk.java.net/browse/JDK-8225351
>>>>
>>>> Passes t1-7
>>>> The assert code tested with local code changes to HandshakeAlot handshake.
>>>> We then see this state where last lock can be biased towards another thread
>>>> and the thread is trying to execute revoke safepoint.
>>>>
>>>> Thanks, Robbin
>>>
More information about the hotspot-compiler-dev
mailing list