RFR: 8210832: Remove sneaky locking in class Monitor
David Holmes
david.holmes at oracle.com
Tue Jan 29 21:35:36 UTC 2019
Hi Erik,
On 29/01/2019 11:18 pm, Erik Österlund wrote:
> Hi David,
>
> I don't think we can get away with not deleting monitors. Because we do
> that all over the place.
Right. I had a mental model that placed the new PlatformMonitor in the
same type-stable-memory basket as PlatformEvent and PlatformParker. That
was an oversight on my part.
> Another example of deleted monitor is the SR_lock. We have one per
> thread, and delete them when the thread is deleted. SMR makes sure it is
> deleted when nobody is using it any longer. All sane.
Great example as it highlights my point. Deleting the _SR_lock was buggy
until you came up with SMR. It's very easy to get things wrong here, and
each usecase will have different means by which they need to ensure the
Mutex/Monitor is actually safe to delete.
My concern is that we may have existing races (as we did with _SR_lock)
but that deleting the current Monitor/mutex instances is quite benign
and may allow use-after-delete to not crash. The new code is not so
benign and may not be so forgiving if there are existing bugs in this area.
> The extra_data_lock() of MDOs belongs to MethodData objects, and is
> deleted when the MethodData is deleted (at which point nobody can be
> using it any longer). All sane.
Forgive my skepticism but having been reading recent bug reports about
nmethod cleanup and related data structures, it is very easy to have a
scheme that appears "All sane" but is in fact subtly broken.
Any way bottom line is yes of course we have to continue to allow
Mutex/monitor destruction. But I am concerned we may now trip over
existing bugs with such destruction.
Cheers,
David
-----
>
> There is plenty more.
>
> I think it's up to the user of the API to make sure locks are deleted
> only when they are not being used concurrently. We have plenty of tools
> to ensure this today (GlobalCounter, thread-local handshakes,
> safepoints, hazard pointers, outer locks, etc) Perhaps a suitable assert
> could catch errors where locking races with deletion, so the user can
> choose what tool to use to ensure its safety.
>
> Thanks,
> /Erik
>
> On 2019-01-29 13:13, David Holmes wrote:
>> On 29/01/2019 8:26 pm, Per Liden wrote:
>>> On 01/29/2019 10:53 AM, David Holmes wrote:
>>>> Hi Per,
>>>>
>>>> If I may jump in on one thing you suggest ... destructors. Do we
>>>> ever actually destroy Mutex or Monitor instances? There are inherent
>>>> races that can make it very dangerous to try and actually delete the
>>>> low-level PlatformMonitor and destroy the pthread_mutex or
>>>> pthread_cond, and even release the memory used. The related
>>>> PlatformEvent and PlatformParker are expected to be immortal and I
>>>> think that is the same for PlatformMonitor.
>>>
>>> There are examples of where we do destroy Mutex instances today (like
>>> JVM_RawMonitorDestroy), and I don't think that's something the API
>>> should forbid.
>>
>> RawMonitors are used very rarely and I would not have much confidence
>> that their destruction is actually done in a safe manner.
>>
>>>
>>> As Robbin hinted, I'm hoping ZLock (which is just a plain
>>> pthread_mutex) in ZGC can be converted to be a plain mutex using
>>> PlatforMutex in the future. ZLocks require dynamic
>>> creation/destruction to be supported as they are e.g. attached to
>>> nmethods which can come and go.
>>
>> I think this is something that requires some detailed consideration.
>> You need to have very robust lifecycle management to be able to delete
>> the low-level implementation objects safely. (It's not sufficient for
>> example to test the ability to destroy a pthread_mutex by checking if
>> you can first acquire it (and release it again) then destroy it and
>> deallocate it, as the previous owner could still be executing inside
>> pthread_mutex_unlock!).
>>
>> In the context of the current changes though we do have an
>> inconsistency between the high-level Mutex/Monitor and the
>> PlatformMonitor. This was an oversight on my part when doing some
>> earlier work on this.
>>
>> David
>> -----
>>
>>>>
>>>> Aside: I don't think distinct PlatformMutex and PlatforMonitor is
>>>> worth the effort unless we also rework the Mutex/Monitor
>>>> relationship as well.
>>>
>>> I completely agree, Mutex/Monitor would also need to be reworked a bit.
>>>
>>> cheers,
>>> Per
>>>
>>>>
>>>> Cheers,
>>>> David
>>>>
>>>> On 29/01/2019 7:22 pm, Per Liden wrote:
>>>>> Hi Patricio,
>>>>>
>>>>> On 01/28/2019 08:18 PM, Patricio Chilano wrote:
>>>>>> Hi Robbin,
>>>>>>
>>>>>> Thanks for reviewing this! Removing the block_in_safepoint_check
>>>>>> thread local attribute is a great idea, here is v02:
>>>>>>
>>>>>> Full: http://cr.openjdk.java.net/~pchilanomate/8210832/v02/webrev
>>>>>
>>>>> I really like that we're ditching our old locking code in favor of
>>>>> using pthread_mutex, et al. Nice work!
>>>>>
>>>>>
>>>>> General comment
>>>>> ----------------
>>>>> I think Mutex to be a plain mutex and not come with the baggage of
>>>>> having a conditional variable. With this new code, it seems we're
>>>>> in a really good position to make that happen. I.e. something like
>>>>> this:
>>>>>
>>>>> class PlatformMutex {
>>>>> protected:
>>>>> pthread_mutex_t _mutex;
>>>>>
>>>>> public:
>>>>> PlatformMutex();
>>>>> ~PlatformMutex();
>>>>>
>>>>> void lock();
>>>>> void unlock();
>>>>> bool try_lock();
>>>>> };
>>>>>
>>>>> class PlatformMonitor : public PlatformMutex {
>>>>> private:
>>>>> pthread_cond_t _cond;
>>>>>
>>>>> public:
>>>>> PlatformMonitor();
>>>>> ~PlatformMonitor();
>>>>>
>>>>> int wait(jlong millis);
>>>>> void notify();
>>>>> void notify_all();
>>>>> };
>>>>>
>>>>> It might be that we want to do that as a separate step later
>>>>> instead of including it in this patch. But I think we should try to
>>>>> get there.
>>>>>
>>>>>
>>>>> src/hotspot/os/posix/os_*.[ch]pp
>>>>> ---------------------------------
>>>>> * I'd suggest that we place the PlatformMonitor class in a separate
>>>>> file (like src/hotspot/os/posix/monitor_posix.cpp), just like we
>>>>> have done with Semaphore (in
>>>>> src/hotspot/os/posix/semaphore_posix.cpp).
>>>>>
>>>>>
>>>>> src/hotspot/os/posix/os_posix.hpp
>>>>> src/hotspot/os/solaris/os_solaris.hpp
>>>>> src/hotspot/os/windows/os_windows.hpp
>>>>> -------------------------------------
>>>>> * Please make _mutex/_cond plain variables, instead of arrays of 1.
>>>>> That's just ugly ;)
>>>>>
>>>>>
>>>>> src/hotspot/os/posix/os_posix.cpp
>>>>> ---------------------------------
>>>>> * Destructor missing, to call pthread_(mutex|cond)_destroy().
>>>>>
>>>>>
>>>>> src/hotspot/os/solaris/os_solaris.hpp
>>>>> -------------------------------------
>>>>> * Not sure if there's a good reason to have the constructor be
>>>>> inlined here. I'd suggest moving it to the cpp file.
>>>>>
>>>>> * Destructor missing.
>>>>>
>>>>>
>>>>> src/hotspot/os/windows/os_windows.cpp
>>>>> -------------------------------------
>>>>> * Destructor missing (I'm not too familiar with the windows API but
>>>>> I assume there's a destroy function we should call here).
>>>>>
>>>>>
>>>>> src/hotspot/share/runtime/interfaceSupport.inline.hpp
>>>>> -----------------------------------------------------
>>>>> * Move "private:" above monitor_adr;
>>>>>
>>>>> 289 class ThreadLockBlockInVM : public ThreadStateTransition {
>>>>> 290 Monitor** monitor_adr;
>>>>> 291 private:
>>>>> 292 void do_preempted(Monitor** in_flight_monitor_adr) {
>>>>>
>>>>> * monitor_adr should be _monitor_adr, or maybe even
>>>>> _in_flight_monitor_adr to better match the name of the argument.
>>>>>
>>>>>
>>>>> cheers,
>>>>> Per
>>>>>
>>>>>> Inc: http://cr.openjdk.java.net/~pchilanomate/8210832/v02/inc/webrev/
>>>>>>
>>>>>> Running mach5 again.
>>>>>>
>>>>>> Thanks,
>>>>>> Patricio
>>>>>>
>>>>>> On 1/28/19 8:31 AM, Robbin Ehn wrote:
>>>>>>> Hi Patricio,
>>>>>>>
>>>>>>> Mostly looks good!
>>>>>>>
>>>>>>> block_at_safepoint is always called with block_in_safepoint_check
>>>>>>> = true. (correct?)
>>>>>>> Changing that to a local state instead of global simplifies the
>>>>>>> code.
>>>>>>>
>>>>>>> So I'm suggesting something like below.
>>>>>>>
>>>>>>> Thanks, Robbin
>>>>>>>
>>>>>>> diff -r e65cc445234c
>>>>>>> src/hotspot/share/runtime/interfaceSupport.inline.hpp
>>>>>>> --- a/src/hotspot/share/runtime/interfaceSupport.inline.hpp
>>>>>>> Mon Jan 28 13:10:15 2019 +0100
>>>>>>> +++ b/src/hotspot/share/runtime/interfaceSupport.inline.hpp
>>>>>>> Mon Jan 28 14:10:59 2019 +0100
>>>>>>> @@ -308,2 +308,1 @@
>>>>>>> - thread->block_in_safepoint_check = false;
>>>>>>> - SafepointMechanism::block_at_safepoint(thread);
>>>>>>> + SafepointMechanism::callback_if_safepoint(thread);
>>>>>>> @@ -323,2 +322,1 @@
>>>>>>> - SafepointMechanism::block_at_safepoint(_thread);
>>>>>>> - _thread->block_in_safepoint_check = true;
>>>>>>> + SafepointMechanism::callback_if_safepoint(_thread);
>>>>>>> @@ -335,2 +332,0 @@
>>>>>>> - } else {
>>>>>>> - _thread->block_in_safepoint_check = true;
>>>>>>> @@ -337,0 +334,1 @@
>>>>>>> +
>>>>>>> diff -r e65cc445234c src/hotspot/share/runtime/safepoint.cpp
>>>>>>> --- a/src/hotspot/share/runtime/safepoint.cpp Mon Jan 28
>>>>>>> 13:10:15 2019 +0100
>>>>>>> +++ b/src/hotspot/share/runtime/safepoint.cpp Mon Jan 28
>>>>>>> 14:10:59 2019 +0100
>>>>>>> @@ -795,1 +795,1 @@
>>>>>>> -void SafepointSynchronize::block(JavaThread *thread) {
>>>>>>> +void SafepointSynchronize::block(JavaThread *thread, bool
>>>>>>> block_in_safepoint_check) {
>>>>>>> @@ -850,1 +850,1 @@
>>>>>>> - if (thread->block_in_safepoint_check) {
>>>>>>> + if (block_in_safepoint_check) {
>>>>>>> @@ -880,1 +880,1 @@
>>>>>>> - thread->block_in_safepoint_check) {
>>>>>>> + block_in_safepoint_check) {
>>>>>>> diff -r e65cc445234c src/hotspot/share/runtime/safepoint.hpp
>>>>>>> --- a/src/hotspot/share/runtime/safepoint.hpp Mon Jan 28
>>>>>>> 13:10:15 2019 +0100
>>>>>>> +++ b/src/hotspot/share/runtime/safepoint.hpp Mon Jan 28
>>>>>>> 14:10:59 2019 +0100
>>>>>>> @@ -146,1 +146,1 @@
>>>>>>> - static void block(JavaThread *thread);
>>>>>>> + static void block(JavaThread *thread, bool
>>>>>>> block_in_safepoint_check = true);
>>>>>>> diff -r e65cc445234c
>>>>>>> src/hotspot/share/runtime/safepointMechanism.hpp
>>>>>>> --- a/src/hotspot/share/runtime/safepointMechanism.hpp Mon Jan
>>>>>>> 28 13:10:15 2019 +0100
>>>>>>> +++ b/src/hotspot/share/runtime/safepointMechanism.hpp Mon Jan
>>>>>>> 28 14:10:59 2019 +0100
>>>>>>> @@ -82,1 +82,1 @@
>>>>>>> - static inline void block_at_safepoint(JavaThread* thread);
>>>>>>> + static inline void callback_if_safepoint(JavaThread* thread);
>>>>>>> diff -r e65cc445234c
>>>>>>> src/hotspot/share/runtime/safepointMechanism.inline.hpp
>>>>>>> --- a/src/hotspot/share/runtime/safepointMechanism.inline.hpp Mon
>>>>>>> Jan 28 13:10:15 2019 +0100
>>>>>>> +++ b/src/hotspot/share/runtime/safepointMechanism.inline.hpp Mon
>>>>>>> Jan 28 14:10:59 2019 +0100
>>>>>>> @@ -82,1 +82,1 @@
>>>>>>> -void SafepointMechanism::block_at_safepoint(JavaThread* thread) {
>>>>>>> +void SafepointMechanism::callback_if_safepoint(JavaThread*
>>>>>>> thread) {
>>>>>>> @@ -84,1 +84,1 @@
>>>>>>> - SafepointSynchronize::block(thread);
>>>>>>> + SafepointSynchronize::block(thread, false);
>>>>>>> diff -r e65cc445234c src/hotspot/share/runtime/thread.cpp
>>>>>>> --- a/src/hotspot/share/runtime/thread.cpp Mon Jan 28 13:10:15
>>>>>>> 2019 +0100
>>>>>>> +++ b/src/hotspot/share/runtime/thread.cpp Mon Jan 28 14:10:59
>>>>>>> 2019 +0100
>>>>>>> @@ -298,2 +297,0 @@
>>>>>>> - block_in_safepoint_check = true;
>>>>>>> -
>>>>>>> diff -r e65cc445234c src/hotspot/share/runtime/thread.hpp
>>>>>>> --- a/src/hotspot/share/runtime/thread.hpp Mon Jan 28 13:10:15
>>>>>>> 2019 +0100
>>>>>>> +++ b/src/hotspot/share/runtime/thread.hpp Mon Jan 28 14:10:59
>>>>>>> 2019 +0100
>>>>>>> @@ -788,2 +787,0 @@
>>>>>>> - bool block_in_safepoint_check; // to decide
>>>>>>> whether to block in SS::block or not
>>>>>>> -
>>>>>>>
>>>>>>>
>>>>>>> On 1/28/19 9:42 AM, Patricio Chilano wrote:
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> Please review the following patch:
>>>>>>>>
>>>>>>>> Bug URL: https://bugs.openjdk.java.net/browse/JDK-8210832
>>>>>>>> Webrev:
>>>>>>>> http://cr.openjdk.java.net/~pchilanomate/8210832/v01/webrev/
>>>>>>>>
>>>>>>>> The current implementation of native monitors uses a technique
>>>>>>>> that we name "sneaky locking" to prevent possible deadlocks of
>>>>>>>> the JVM during safepoints. The implementation of this technique
>>>>>>>> though introduces a race when a monitor is shared between the
>>>>>>>> VMThread and non-JavaThreads. This patch aims to solve that
>>>>>>>> problem and at the same time simplify the code.
>>>>>>>>
>>>>>>>> The proposal is based on the introduction of the new class
>>>>>>>> PlatformMonitor, which serves as a wrapper for the actual
>>>>>>>> synchronization primitives in each platform (mutexes and
>>>>>>>> condition variables). Most of the API calls can thus be
>>>>>>>> implemented as simple wrappers around PlatformMonitor, adding
>>>>>>>> more assertions and very little extra metadata.
>>>>>>>> To be able to remove the lock sneaking code and at the same time
>>>>>>>> avoid deadlocking scenarios, we combine two techniques:
>>>>>>>>
>>>>>>>> -When a JavaThread that has just acquired the lock, detects
>>>>>>>> there is a safepoint request in the ThreadLockBlockInVM
>>>>>>>> destructor, it releases the lock before blocking at the
>>>>>>>> safepoint. After resuming from it, the JavaThread will have to
>>>>>>>> acquire the lock again.
>>>>>>>>
>>>>>>>> - In the ThreadLockBlockInVM constructor for the Monitor::wait()
>>>>>>>> method, in order to avoid blocking we allow for a possible
>>>>>>>> safepoint request to make progress but without letting the
>>>>>>>> JavaThread block for it (since we would be stopped by the
>>>>>>>> destructor anyways). We also do that for the Monitor::lock()
>>>>>>>> case although no deadlock is being prevented there.
>>>>>>>>
>>>>>>>> The ThreadLockBlockInVM jacket is a new ThreadStateTransition
>>>>>>>> class used instead of the ThreadBlockInVM one. This allowed more
>>>>>>>> flexibility to handle the two techniques mentioned above. Also,
>>>>>>>> ThreadBlockInVM calls SafepointMechanism::block_if_requested()
>>>>>>>> which creates some problems when trying to allow safepoints to
>>>>>>>> continue without stopping, since that method not only checks for
>>>>>>>> safepoints but also processes handshakes.
>>>>>>>>
>>>>>>>> In terms of performance, benchmarks show very similar results to
>>>>>>>> what we have now.
>>>>>>>>
>>>>>>>> So far mach5 hs-tier1-6 on Linux, OS X, Windows and Solaris have
>>>>>>>> been tested.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Patricio
>>>>>>>>
>>>>>>
More information about the hotspot-dev
mailing list