RFR(S): 8039458: Ensure consistency of Monitor/Mutex lock acquisitions in relation to safepoint protocol

Thu Apr 10 10:00:10 UTC 2014

Hi Markus,

Not sure about the need to set _acquired_with_no_safepoint_check = false 
in unlock, but otherwise this looks okay to me. You only need to check 
the most recently acquired lock, because when you acquired it you would 
have checked the previously acquired lock and so forth.

The double negation, !_acquired_with_no_safepoint_check, does make this 
a little hard to think about. :)

I think you may need some additional test coverage to see if this is 
going to expose existing potentially broken behaviour.

Thanks,
David

On 9/04/2014 10:04 PM, Markus Grönlund wrote:
> Greetings,
>
> Kindly asking for reviews for the following bug fix/enhancement
>
> Webrev: http://cr.openjdk.java.net/~mgronlun/8039458/webrev01/
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8039458
>
> Problem:
>
> VM code acquire Monitor/Mutex locks by respecting the correct lock
> order, which is enforced in debug builds (asserts). This is great.
>
> However, you could be taking locks in the correct order and everything
> might work just fine during development and testing (even product) but
> depending on runtime circumstances (load/concurrency), the way the
> lock(s) was acquired, i.e. the lock acquisition /mode/, can lead to very
> hard to debug problems (hangs/deadlocks). This is primarily related to
> the lock acquisition mode option “Mutex::_no_safepoint_check_flag” by
> which you inform your intent to respect or not-respect the safepoint
> protocol during lock acquisition/ownership.
>
>
> If a lock has already been acquired by a thread by passing the
> Mutex::_no_safepoint_check_flag to circumvent the safepoint protocol,
> the thread must *not* be allowed to attempt taking yet another lock on
> which it *might* block ( a lock which was not acquired by passing
> Mutex::_no_safepoint_check_flag in its acquisition operation), as such a
> lock would be participating in the safepoint protocol.
>
> Today, doing this mistake will work just fine _/if the lock is currently
> uncontended_ /which it normally is in development/testing situations*.*
> However, once the lock then becomes contended (highly volatile
> pressured/concurrent product deployment) you might get the worst kind of
> issues to debug (hangs/deadlocks).
>
> Naturally and as always, you have to very careful when you are taking
> locks using the Mutex::_no_safepoint_check_flag –however it can be very
> hard to determine what other code you can safely call once you are the
> owner of a Mutex::_no_safepoint_check_flag:ed lock  - and  today there
> is nothing but the developers attention to details that will find/help
> stay clear of this. I think we can do better here in enforcing some
> invariants in code for preempting potential deadlock prone situations.
>
> Description:
>
> Add debug assertions that enforce correct usages of the
> Mutex::_no_safepoint_check_flag when taking multiple locks.
>
>
> This suggestion leverages much of the existing code targeting
> enforcement of lock ordering- this is a simple additive change to also
> add invariant checking on "lock acquisition mode".
>
> Do note this primarily targets Java Threads running in the VM.
>
> Also included code comment:
>
> /*
>   * Ensure consistency of Monitor/Mutex lock acquisitions
>   * for Java Threads running inside the VM.
>   *
>   * If a thread has already acquired lock(s) using
>   * "Mutex::_no_safepoint_check_flag" (effectively going outside the
>   * safepoint protocol), the thread should be disallowed to acquire any
>   * additional lock which _is_ participating in the safepoint protocol.
>   *
>   * If a "safepoint protocol aware" lock is contended, and the thread
> entering
>   * is unable to "fast acquire" the lock using cas/try_spin,
>   * it will need to block/park. Blocking on a contended lock involves a
>   * state transition and a potential SafepointSynchronize::block() call.
>   * Transitioning to a blocked state still holding
> "Mutex::_no_safepoint_check_flag"
>   * acquired locks is allowed, but is *very* deadlock prone.
>   *
>   * The effect of allowing this state to happen without checking is subtle
>   * and hard to debug since a problem might only appear under heavy load
> and
>   * only in situations with increased concurrency levels (product builds).
> *
>   */
>
> Testing:
>
> Artificial code changes to MutexLocker and
> Mutex::_no_safepoint_check_flag on certain locks for detecting potential
> deadlock situations.
>
> Speccjbb2005
>
> Kitchensink
>
> Thank you
>
> Markus
>