RFR(S): 8200374: Add ThreadsSMRSupport::verify_hazard_pointer_scanned() to verify threads_do()

David Holmes david.holmes at oracle.com
Sun Apr 1 00:50:13 UTC 2018


Hi Dan,

On 1/04/2018 10:42 AM, Daniel D. Daugherty wrote:
> Hi David!
> 
> Happy Easter!

Thanks - and to you!

> 
> On 3/31/18 7:40 PM, David Holmes wrote:
>> On 31/03/2018 10:51 PM, Daniel D. Daugherty wrote:
>>> Hi David,
>>>
>>> Thanks for the review!
>>>
>>> On 3/31/18 3:08 AM, David Holmes wrote:
>>>> Hi Dan,
>>>>
>>>> This all seems fine to me.
>>>
>>> Thanks.
>>>
>>>
>>>> Just to be nitpicky the set/get_sampler_thread methods could be 
>>>> declared inline in the .hpp file. ;-)
>>>
>>> Then I would have to put them in an inline.hpp file right? :-)
>>
>> That was a joke right?
> 
> Not really... sort of... it's all so confusing... :-)
> See below...
> 
> 
>>
>>> In any case, I tweaked those functions yesterday to fix an intermittent
>>> failure in a JFR test:
>>>
>>> $ hg diff -w src/hotspot/share/trace/tracingExport.cpp 
>>> src/hotspot/share/trace/tracingExport.hpp
>>> diff -r df946ef9863e src/hotspot/share/trace/tracingExport.cpp
>>> --- a/src/hotspot/share/trace/tracingExport.cpp    Wed Mar 28 
>>> 20:16:43 2018 -0400
>>> +++ b/src/hotspot/share/trace/tracingExport.cpp    Sat Mar 31 
>>> 08:48:01 2018 -0400
>>> @@ -23,14 +23,16 @@
>>>    */
>>>
>>>   #include "precompiled.hpp"
>>> +#include "runtime/orderAccess.inline.hpp"
>>>   #include "trace/tracingExport.hpp"
>>>
>>> -Thread * TracingExport::_sampler_thread = (Thread *)NULL;
>>> +// Allow lock-free access to _sampler_thread.
>>> +Thread* volatile TracingExport::_sampler_thread = (Thread*)NULL;
>>
>> Nit: you should never need to cast NULL when assigning a pointer 
>> variable.
> 
> I thought I saw something go by that said some compiler was
> complaining about NULL assignments... maybe that was something
> like this:
> 
>    char junk_byte = NULL;
> 
> and the the compiler wanted:
> 
>    char junk_byte = '\0';
> 
> In any case, I'll make sure I fix those before I send out the next 
> webrev...
> 
>>
>>>   Thread * TracingExport::get_sampler_thread() {
>>> -  return _sampler_thread;
>>> +  return (Thread*)OrderAccess::load_acquire(&_sampler_thread);
>>>   }
>>>
>>>   void TracingExport::set_sampler_thread(Thread * thread) {
>>> -  _sampler_thread = thread;
>>> +  OrderAccess::release_store_fence(&_sampler_thread, thread);
>>>   }
>>
>> Okay _now_ they would have to into a .inline.hpp file :)
> 
> So what's the rule again? Right now they're in a .cpp and they
> call inline functions from OrderAccess so we're good. If I make
> them inline, then I have to move them out of the .cpp file and
> into an .inline.hpp... but only because I'm including a .inline.hpp
> file myself...
> 
> I thought we were trying to move inline functions out of .hpp files
> and into .inline.hpp files... So if I made my new functions inline,
> then I would have to move them even before I switched them to call
> OrderAccess...
> 
> I guess I could still be confused... :-)

AFAIK only non-trivial inline definitions should be moved into a 
.inline.hpp file, and "non-trivial" includes the case where you use 
something which is itself in a .inline.hpp file. So:

Foo* getFoo() { return _foo; }

can go into a regular .hpp as an inline function definition in the 
class. Whereas:

Foo* getFoo() { return OrderAccess::loadAcquire(&_foo); }

either needs to be out-of-line in the .cpp file, or else declared inline 
in a .inline.hpp file.

So if these use OrderAccess then I'm fine with them remaining in the 
.cpp file.

Cheers,
David

> 
>>
>>> diff -r df946ef9863e src/hotspot/share/trace/tracingExport.hpp
>>> --- a/src/hotspot/share/trace/tracingExport.hpp    Wed Mar 28 
>>> 20:16:43 2018 -0400
>>> +++ b/src/hotspot/share/trace/tracingExport.hpp    Sat Mar 31 
>>> 08:48:01 2018 -0400
>>> @@ -33,7 +33,7 @@
>>>     static void set_sampler_thread(Thread * thread);
>>>
>>>    private:
>>> -  static Thread * _sampler_thread;
>>> +  static Thread* volatile _sampler_thread;
>>>   };
>>>
>>>   #endif // SHARE_VM_TRACE_TRACINGEXPORT_HPP
>>>
>>> I will be sending out an updated webrev on Monday with all these tweaks
>>> after running through Therad-SMR stress on my Solaris-X64 server...
>>
>> Ok.
> 
> So far no issues other a known intermittent... I'm hoping for no more JFR
> test failures... :-)
> 
> Dan
> 
> 
>>
>> Thanks,
>> David
>>
>>> Dan
>>>
>>>>
>>>> Thanks,
>>>> David
>>>>
>>>> On 30/03/2018 2:26 AM, Daniel D. Daugherty wrote:
>>>>> Greetings,
>>>>>
>>>>> During my testing where I restored the bug in 
>>>>> JavaThread::verify_not_published()
>>>>> to verify that the new function works, I made a few minor 
>>>>> improvements:
>>>>>
>>>>> $ hg diff
>>>>> diff -r df946ef9863e src/hotspot/share/runtime/thread.cpp
>>>>> --- a/src/hotspot/share/runtime/thread.cpp    Wed Mar 28 20:16:43 
>>>>> 2018 -0400
>>>>> +++ b/src/hotspot/share/runtime/thread.cpp    Thu Mar 29 12:18:46 
>>>>> 2018 -0400
>>>>> @@ -3438,7 +3438,9 @@
>>>>>     // Someday we could have a table or list of all non-JavaThreads.
>>>>>     // For now, just manually iterate through them.
>>>>>     tc->do_thread(VMThread::vm_thread());
>>>>> -  Universe::heap()->gc_threads_do(tc);
>>>>> +  if (Universe::heap() != NULL) {
>>>>> +    Universe::heap()->gc_threads_do(tc);
>>>>> +  }
>>>>>     WatcherThread *wt = WatcherThread::watcher_thread();
>>>>>     // Strictly speaking, the following NULL check isn't sufficient 
>>>>> to make sure
>>>>>     // the data for WatcherThread is still valid upon being 
>>>>> examined. However,
>>>>> diff -r df946ef9863e src/hotspot/share/runtime/threadSMR.cpp
>>>>> --- a/src/hotspot/share/runtime/threadSMR.cpp    Wed Mar 28 
>>>>> 20:16:43 2018 -0400
>>>>> +++ b/src/hotspot/share/runtime/threadSMR.cpp    Thu Mar 29 
>>>>> 12:18:46 2018 -0400
>>>>> @@ -580,6 +580,8 @@
>>>>>   // the Thread-SMR protocol.
>>>>>   void ThreadsSMRSupport::verify_hazard_pointer_scanned(Thread 
>>>>> *self, ThreadsList *threads) {
>>>>>   #ifdef ASSERT
>>>>> +  assert(threads != NULL, "threads must not be NULL");
>>>>> +
>>>>>     // The closure will attempt to verify that the calling thread can
>>>>>     // be found by threads_do() on the specified ThreadsList. If it
>>>>>     // is successful, then the specified ThreadsList was acquired as
>>>>>
>>>>>
>>>>> The Universe::heap() NULL check in Threads::non_java_threads_do() is
>>>>> needed because the particular code path to verify_not_published() is
>>>>> very early in VM bringup... without it we always fail on a NULL ptr
>>>>> deref (which isn't interesting)...
>>>>>
>>>>> The new assert() in verify_hazard_pointer_scanned() is just the usual
>>>>> verify your input parameters...
>>>>>
>>>>> Dan
>>>>>
>>>>> On 3/29/18 9:09 AM, Daniel D. Daugherty wrote:
>>>>>> Erik,
>>>>>>
>>>>>> Thanks for the quick review!
>>>>>>
>>>>>>
>>>>>> On 3/29/18 3:00 AM, Erik Österlund wrote:
>>>>>>> Hi Dan,
>>>>>>>
>>>>>>> This looks great.
>>>>>>
>>>>>> Good to hear especially since you wrote most of it... :-)
>>>>>>
>>>>>>
>>>>>>> I will sleep much better at night with this 1:1 verification. :)
>>>>>>
>>>>>> It does tighten things up nicely!
>>>>>>
>>>>>> Dan
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> /Erik
>>>>>>>
>>>>>>>
>>>>>>> On 2018-03-29 03:52, Daniel D. Daugherty wrote:
>>>>>>>> Greetings,
>>>>>>>>
>>>>>>>> I have a (mostly) small enhancement for Thread-SMR:
>>>>>>>>
>>>>>>>> JDK-8200374 Add 
>>>>>>>> ThreadsSMRSupport::verify_hazard_pointer_scanned() to verify 
>>>>>>>> threads_do()
>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8200374
>>>>>>>>
>>>>>>>> Here's the webrev URL:
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~dcubed/8200374-webrev/0_for_jdk_hs_open/ 
>>>>>>>>
>>>>>>>>
>>>>>>>> This is Erik O's improvement on the assertion added by the 
>>>>>>>> following
>>>>>>>> bug fix (with some minor tweaking done by me):
>>>>>>>>
>>>>>>>> JDK-8199813 SIGSEGV in ThreadsList::includes()
>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8199813
>>>>>>>>
>>>>>>>> Summary of the changes:
>>>>>>>>
>>>>>>>> - Replace the assertion that I added in JDK-8199813 with a closure
>>>>>>>>   based function that verifies the threads_do() contract depended
>>>>>>>>   on by Thread-SMR.
>>>>>>>> - Add ThreadsSMRSupport::verify_hazard_pointer_scanned() to verify
>>>>>>>>   that the calling thread's hazard pointer is scanned by 
>>>>>>>> threads_do().
>>>>>>>>   The new function is called from both 
>>>>>>>> ThreadsSMRSupport::acquire_*()
>>>>>>>>   functions.
>>>>>>>> - Refactor the non-JavaThread part of Threads::threads_do() into
>>>>>>>>   Threads::non_java_threads_do() so that the non-JavaThread part
>>>>>>>>   can also be called by other threads_do() functions. Yes, the
>>>>>>>>   Threads::threads_do() contract is still to scan every thread in
>>>>>>>>   the system.
>>>>>>>> - Add hooks for a "tracing sampler thread" to be optionally scanned
>>>>>>>>   by Threads::non_java_threads_do().
>>>>>>>>
>>>>>>>> This fix has gone thru a couple of Mach5 builds-tier1, 
>>>>>>>> jdk-tier[1-3],
>>>>>>>> and hs-tier[1-3] runs. I've also started my usual 24+ hour 
>>>>>>>> Thread-SMR
>>>>>>>> stress testing run on my Solaris-X64 server.
>>>>>>>>
>>>>>>>> Just to be extra sure, I backed out the fix from JDK-8199813 to
>>>>>>>> JavaThread::verify_not_published() (which started this round of
>>>>>>>> Thread-SMR fixes) and we catch the issue in the new function
>>>>>>>> ThreadsSMRSupport::verify_hazard_pointer_scanned().
>>>>>>>>
>>>>>>>> Thanks, in advance, for any comments, suggestions, or feedback.
>>>>>>>>
>>>>>>>> Dan
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>
> 


More information about the hotspot-runtime-dev mailing list