RFR: 8252871: fatal error: must own lock JvmtiThreadState_lock

Tue Sep 8 10:42:54 UTC 2020

Hi Robbin,

On 8/09/2020 5:39 pm, Robbin Ehn wrote:
> Hi David,
> 
>>> When these two methods (set_frame_pop/clear_frame_pop) are called in a handshake the requesting thread will have lock
>>> the JvmtiThreadState_lock. But the thread executing one of these in the handshake may not be the owner.
>>
>> Ouch! We continue to get bitten by the fact the executor of a handshake
>> operation is not known a-priori. :(
>>
> 
> We could add this. But I would prefer not doing it in this bug fix.

Okay we can put in the current proposed fix to get the test working 
again. Please file a new bug to fix the underlying locking problem.

>>> So we only check that JvmtiThreadState_lock is locked.
>>
>> That avoids the problem but it is not really a sufficient check - we
>> need to know that it is the handshaker thread that owns the lock and
>> that it acquired it purely for the purpose of this handshake operation!
>>
> 
> Right now we know that because I traced the code :)

I'm sure you get my point though. :)

> As I said I can make the requesting thread known to execution thread,
> but I would really prefer not doing that in this bug fix.
> 
>> More importantly this issue shows that the locking code is in fact
>> incorrect. We are on very dangerous ground if we are going to allow
>> locks to be "proxied" in this fashion! This is akin to reintroducing
>> "sneaky locks"!
> 
> A problem is that we use a safepoiting global lock to protect per thread resource.
> The way we get around this with safepoints is:
> "assert_locked_or_safepoint" (assert may be passed by a non-JavaThread or JavaThread in blocked/native without lock
> during a safepoint (safepoint + VM thread would fix that)) Which have this in it:
>    // see if invoker of VM operation owns it
>    VM_Operation* op = VMThread::vm_operation();
>    if (op != NULL && op->calling_thread() == lock->owner()) return;
> 
> So we already have proxied locks and in other cases just avoiding them (by just checking if at safepoint).

Okay ... we know that the VMThread and safepoint VMops are special, and 
always have been special, and people absolutely loathe the things we did 
in supporting that special-ness and we (well you, Erik, Patricio, Dan) 
have been working hard to get rid of a lot of that special code. So I 
can't accept an argument that it is okay to take VMThread/VMop special 
behaviour (e.g. lock proxying) and now spread it around to make other 
special cases.

> And as I said a few times now, I said I can make the requesting thread known to execution thread,
> but I would really prefer not doing that in this bug fix.

Understood.

I have a general concern with locking in relation to handshakes, that we 
cannot actually get rid of all the special handling that pertained to 
safepoints (safepoint-check-always/never, lock ranking) just because we 
now use handshakes. We have the same kinds of deadlock concerns if a 
handshake operation tries to take a lock and might block the thread. So 
same problem as safepoints and locks but no supporting code to try and 
help us with that.

So if we cannot simply grab the necessary locks as part of the handshake 
operation, then we need a way to ensure locking correctness prior to the 
op - and the simplest seems to be that the handshaker does the necessary 
locking and the handshake mechanism allows us to ensure the handshaker 
executes the operation not the target.

Thanks,
David
-----

>>
>> If a handshake operation can be executed by either the target thread or
>> the handshaker thread, and locks are required, then they will have to be
>> acquired in the context of the handshake operation. If we can't do that
>> because of potential deadlocks in the handshake mechanism then I feel
>> that handshakes have a serious flaw that needs to be rectified. Perhaps
>> we need a way to force the operation to be executed by the handshaker so
>> that they can setup the right execution environment prior to the handshake?
> 
> The problem is that it's a safepointing lock.
> Only executing the handshake by requester is an interesting idea, I'll investigate that.
> 
>>
>>> When verifying the callers to these methods I notice "clear_to_frame_pop" was unused, so instead of fixing it I remove
>>> it.
>>
>> There is already a bug for that:
>>
>> https://bugs.openjdk.java.net/browse/JDK-8252816
>>
>> so if this proceeds you should add that issue to the PR.
> 
> Ok, I'll do that.
> 
> Thanks for having a look, let me know howto proceed.
> 
>>
>> Thanks,
>> David
>> -----
> 
> -------------
> 
> PR: https://git.openjdk.java.net/jdk/pull/60
>