RFR(S): 8200374: Add ThreadsSMRSupport::verify_hazard_pointer_scanned() to verify threads_do()

Daniel D. Daugherty daniel.daugherty at oracle.com
Sun Apr 1 01:40:58 UTC 2018


On 3/31/18 8:50 PM, David Holmes wrote:
> Hi Dan,
>
> On 1/04/2018 10:42 AM, Daniel D. Daugherty wrote:
>> Hi David!
>>
>> Happy Easter!
>
> Thanks - and to you!
>
>>
>> On 3/31/18 7:40 PM, David Holmes wrote:
>>> On 31/03/2018 10:51 PM, Daniel D. Daugherty wrote:
>>>> Hi David,
>>>>
>>>> Thanks for the review!
>>>>
>>>> On 3/31/18 3:08 AM, David Holmes wrote:
>>>>> Hi Dan,
>>>>>
>>>>> This all seems fine to me.
>>>>
>>>> Thanks.
>>>>
>>>>
>>>>> Just to be nitpicky the set/get_sampler_thread methods could be 
>>>>> declared inline in the .hpp file. ;-)
>>>>
>>>> Then I would have to put them in an inline.hpp file right? :-)
>>>
>>> That was a joke right?
>>
>> Not really... sort of... it's all so confusing... :-)
>> See below...
>>
>>
>>>
>>>> In any case, I tweaked those functions yesterday to fix an 
>>>> intermittent
>>>> failure in a JFR test:
>>>>
>>>> $ hg diff -w src/hotspot/share/trace/tracingExport.cpp 
>>>> src/hotspot/share/trace/tracingExport.hpp
>>>> diff -r df946ef9863e src/hotspot/share/trace/tracingExport.cpp
>>>> --- a/src/hotspot/share/trace/tracingExport.cpp    Wed Mar 28 
>>>> 20:16:43 2018 -0400
>>>> +++ b/src/hotspot/share/trace/tracingExport.cpp    Sat Mar 31 
>>>> 08:48:01 2018 -0400
>>>> @@ -23,14 +23,16 @@
>>>>    */
>>>>
>>>>   #include "precompiled.hpp"
>>>> +#include "runtime/orderAccess.inline.hpp"
>>>>   #include "trace/tracingExport.hpp"
>>>>
>>>> -Thread * TracingExport::_sampler_thread = (Thread *)NULL;
>>>> +// Allow lock-free access to _sampler_thread.
>>>> +Thread* volatile TracingExport::_sampler_thread = (Thread*)NULL;
>>>
>>> Nit: you should never need to cast NULL when assigning a pointer 
>>> variable.
>>
>> I thought I saw something go by that said some compiler was
>> complaining about NULL assignments... maybe that was something
>> like this:
>>
>>    char junk_byte = NULL;
>>
>> and the the compiler wanted:
>>
>>    char junk_byte = '\0';
>>
>> In any case, I'll make sure I fix those before I send out the next 
>> webrev...
>>
>>>
>>>>   Thread * TracingExport::get_sampler_thread() {
>>>> -  return _sampler_thread;
>>>> +  return (Thread*)OrderAccess::load_acquire(&_sampler_thread);
>>>>   }
>>>>
>>>>   void TracingExport::set_sampler_thread(Thread * thread) {
>>>> -  _sampler_thread = thread;
>>>> +  OrderAccess::release_store_fence(&_sampler_thread, thread);
>>>>   }
>>>
>>> Okay _now_ they would have to into a .inline.hpp file :)
>>
>> So what's the rule again? Right now they're in a .cpp and they
>> call inline functions from OrderAccess so we're good. If I make
>> them inline, then I have to move them out of the .cpp file and
>> into an .inline.hpp... but only because I'm including a .inline.hpp
>> file myself...
>>
>> I thought we were trying to move inline functions out of .hpp files
>> and into .inline.hpp files... So if I made my new functions inline,
>> then I would have to move them even before I switched them to call
>> OrderAccess...
>>
>> I guess I could still be confused... :-)
>
> AFAIK only non-trivial inline definitions should be moved into a 
> .inline.hpp file, and "non-trivial" includes the case where you use 
> something which is itself in a .inline.hpp file. So:
>
> Foo* getFoo() { return _foo; }
>
> can go into a regular .hpp as an inline function definition in the 
> class. Whereas:
>
> Foo* getFoo() { return OrderAccess::loadAcquire(&_foo); }
>
> either needs to be out-of-line in the .cpp file, or else declared 
> inline in a .inline.hpp file.
>
> So if these use OrderAccess then I'm fine with them remaining in the 
> .cpp file.

Thanks!

Dan

>
> Cheers,
> David
>
>>
>>>
>>>> diff -r df946ef9863e src/hotspot/share/trace/tracingExport.hpp
>>>> --- a/src/hotspot/share/trace/tracingExport.hpp    Wed Mar 28 
>>>> 20:16:43 2018 -0400
>>>> +++ b/src/hotspot/share/trace/tracingExport.hpp    Sat Mar 31 
>>>> 08:48:01 2018 -0400
>>>> @@ -33,7 +33,7 @@
>>>>     static void set_sampler_thread(Thread * thread);
>>>>
>>>>    private:
>>>> -  static Thread * _sampler_thread;
>>>> +  static Thread* volatile _sampler_thread;
>>>>   };
>>>>
>>>>   #endif // SHARE_VM_TRACE_TRACINGEXPORT_HPP
>>>>
>>>> I will be sending out an updated webrev on Monday with all these 
>>>> tweaks
>>>> after running through Therad-SMR stress on my Solaris-X64 server...
>>>
>>> Ok.
>>
>> So far no issues other a known intermittent... I'm hoping for no more 
>> JFR
>> test failures... :-)
>>
>> Dan
>>
>>
>>>
>>> Thanks,
>>> David
>>>
>>>> Dan
>>>>
>>>>>
>>>>> Thanks,
>>>>> David
>>>>>
>>>>> On 30/03/2018 2:26 AM, Daniel D. Daugherty wrote:
>>>>>> Greetings,
>>>>>>
>>>>>> During my testing where I restored the bug in 
>>>>>> JavaThread::verify_not_published()
>>>>>> to verify that the new function works, I made a few minor 
>>>>>> improvements:
>>>>>>
>>>>>> $ hg diff
>>>>>> diff -r df946ef9863e src/hotspot/share/runtime/thread.cpp
>>>>>> --- a/src/hotspot/share/runtime/thread.cpp    Wed Mar 28 20:16:43 
>>>>>> 2018 -0400
>>>>>> +++ b/src/hotspot/share/runtime/thread.cpp    Thu Mar 29 12:18:46 
>>>>>> 2018 -0400
>>>>>> @@ -3438,7 +3438,9 @@
>>>>>>     // Someday we could have a table or list of all non-JavaThreads.
>>>>>>     // For now, just manually iterate through them.
>>>>>>     tc->do_thread(VMThread::vm_thread());
>>>>>> -  Universe::heap()->gc_threads_do(tc);
>>>>>> +  if (Universe::heap() != NULL) {
>>>>>> +    Universe::heap()->gc_threads_do(tc);
>>>>>> +  }
>>>>>>     WatcherThread *wt = WatcherThread::watcher_thread();
>>>>>>     // Strictly speaking, the following NULL check isn't 
>>>>>> sufficient to make sure
>>>>>>     // the data for WatcherThread is still valid upon being 
>>>>>> examined. However,
>>>>>> diff -r df946ef9863e src/hotspot/share/runtime/threadSMR.cpp
>>>>>> --- a/src/hotspot/share/runtime/threadSMR.cpp    Wed Mar 28 
>>>>>> 20:16:43 2018 -0400
>>>>>> +++ b/src/hotspot/share/runtime/threadSMR.cpp    Thu Mar 29 
>>>>>> 12:18:46 2018 -0400
>>>>>> @@ -580,6 +580,8 @@
>>>>>>   // the Thread-SMR protocol.
>>>>>>   void ThreadsSMRSupport::verify_hazard_pointer_scanned(Thread 
>>>>>> *self, ThreadsList *threads) {
>>>>>>   #ifdef ASSERT
>>>>>> +  assert(threads != NULL, "threads must not be NULL");
>>>>>> +
>>>>>>     // The closure will attempt to verify that the calling thread 
>>>>>> can
>>>>>>     // be found by threads_do() on the specified ThreadsList. If it
>>>>>>     // is successful, then the specified ThreadsList was acquired as
>>>>>>
>>>>>>
>>>>>> The Universe::heap() NULL check in Threads::non_java_threads_do() is
>>>>>> needed because the particular code path to verify_not_published() is
>>>>>> very early in VM bringup... without it we always fail on a NULL ptr
>>>>>> deref (which isn't interesting)...
>>>>>>
>>>>>> The new assert() in verify_hazard_pointer_scanned() is just the 
>>>>>> usual
>>>>>> verify your input parameters...
>>>>>>
>>>>>> Dan
>>>>>>
>>>>>> On 3/29/18 9:09 AM, Daniel D. Daugherty wrote:
>>>>>>> Erik,
>>>>>>>
>>>>>>> Thanks for the quick review!
>>>>>>>
>>>>>>>
>>>>>>> On 3/29/18 3:00 AM, Erik Österlund wrote:
>>>>>>>> Hi Dan,
>>>>>>>>
>>>>>>>> This looks great.
>>>>>>>
>>>>>>> Good to hear especially since you wrote most of it... :-)
>>>>>>>
>>>>>>>
>>>>>>>> I will sleep much better at night with this 1:1 verification. :)
>>>>>>>
>>>>>>> It does tighten things up nicely!
>>>>>>>
>>>>>>> Dan
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> /Erik
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2018-03-29 03:52, Daniel D. Daugherty wrote:
>>>>>>>>> Greetings,
>>>>>>>>>
>>>>>>>>> I have a (mostly) small enhancement for Thread-SMR:
>>>>>>>>>
>>>>>>>>> JDK-8200374 Add 
>>>>>>>>> ThreadsSMRSupport::verify_hazard_pointer_scanned() to verify 
>>>>>>>>> threads_do()
>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8200374
>>>>>>>>>
>>>>>>>>> Here's the webrev URL:
>>>>>>>>>
>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8200374-webrev/0_for_jdk_hs_open/ 
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> This is Erik O's improvement on the assertion added by the 
>>>>>>>>> following
>>>>>>>>> bug fix (with some minor tweaking done by me):
>>>>>>>>>
>>>>>>>>> JDK-8199813 SIGSEGV in ThreadsList::includes()
>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8199813
>>>>>>>>>
>>>>>>>>> Summary of the changes:
>>>>>>>>>
>>>>>>>>> - Replace the assertion that I added in JDK-8199813 with a 
>>>>>>>>> closure
>>>>>>>>>   based function that verifies the threads_do() contract depended
>>>>>>>>>   on by Thread-SMR.
>>>>>>>>> - Add ThreadsSMRSupport::verify_hazard_pointer_scanned() to 
>>>>>>>>> verify
>>>>>>>>>   that the calling thread's hazard pointer is scanned by 
>>>>>>>>> threads_do().
>>>>>>>>>   The new function is called from both 
>>>>>>>>> ThreadsSMRSupport::acquire_*()
>>>>>>>>>   functions.
>>>>>>>>> - Refactor the non-JavaThread part of Threads::threads_do() into
>>>>>>>>>   Threads::non_java_threads_do() so that the non-JavaThread part
>>>>>>>>>   can also be called by other threads_do() functions. Yes, the
>>>>>>>>>   Threads::threads_do() contract is still to scan every thread in
>>>>>>>>>   the system.
>>>>>>>>> - Add hooks for a "tracing sampler thread" to be optionally 
>>>>>>>>> scanned
>>>>>>>>>   by Threads::non_java_threads_do().
>>>>>>>>>
>>>>>>>>> This fix has gone thru a couple of Mach5 builds-tier1, 
>>>>>>>>> jdk-tier[1-3],
>>>>>>>>> and hs-tier[1-3] runs. I've also started my usual 24+ hour 
>>>>>>>>> Thread-SMR
>>>>>>>>> stress testing run on my Solaris-X64 server.
>>>>>>>>>
>>>>>>>>> Just to be extra sure, I backed out the fix from JDK-8199813 to
>>>>>>>>> JavaThread::verify_not_published() (which started this round of
>>>>>>>>> Thread-SMR fixes) and we catch the issue in the new function
>>>>>>>>> ThreadsSMRSupport::verify_hazard_pointer_scanned().
>>>>>>>>>
>>>>>>>>> Thanks, in advance, for any comments, suggestions, or feedback.
>>>>>>>>>
>>>>>>>>> Dan
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>



More information about the hotspot-runtime-dev mailing list