RFR(S): 8200374: Add ThreadsSMRSupport::verify_hazard_pointer_scanned() to verify threads_do()

Tue Apr 3 13:52:50 UTC 2018

On 3/04/2018 11:33 PM, Daniel D. Daugherty wrote:
> On 4/3/18 9:31 AM, David Holmes wrote:
>> On 3/04/2018 10:40 PM, Daniel D. Daugherty wrote:
>>> On 4/2/18 5:49 PM, David Holmes wrote:
>>>> On 3/04/2018 7:08 AM, Daniel D. Daugherty wrote:
>>>>> Hi David!
>>>>>
>>>>> Thanks for the quick review.
>>>>>
>>>>>
>>>>> On 4/2/18 4:34 PM, David Holmes wrote:
>>>>>> Hi Dan,
>>>>>>
>>>>>> Only query, that I overlooked before is:
>>>>>>
>>>>>>  void TracingExport::set_sampler_thread(Thread * thread) {
>>>>>> -  _sampler_thread = thread;
>>>>>> +  OrderAccess::release_store_fence(&_sampler_thread, thread);
>>>>>>  }
>>>>>>
>>>>>> why a release_store_fence() rather than simple release_store()?
>>>>>
>>>>> I based that choice on this comment:
>>>>>
>>>>> src/hotspot/share/runtime/orderAccess.hpp:
>>>>>
>>>>> // Conventional usage is to issue a load_acquire for ordered 
>>>>> loads.  Use
>>>>> // release_store for ordered stores when you care only that prior 
>>>>> stores
>>>>> // are visible before the release_store, but don't care exactly 
>>>>> when the
>>>>> // store associated with the release_store becomes visible. Use
>>>>> // release_store_fence to update values like the thread state, 
>>>>> where we
>>>>> // don't want the current thread to continue until all our prior 
>>>>> memory
>>>>> // accesses (including the new thread state) are visible to other 
>>>>> threads.
>>>>> // This is equivalent to the volatile semantics of the Java Memory 
>>>>> Model.
>>>>>
>>>>> I see the _sampler_thread field as similar to the thread state.
>>>>>
>>>>> Since the setter (a generic tracing sampler thread), is going to 
>>>>> either
>>>>> publish a non-NULL value when the sampler thread starts or is going to
>>>>> publish a NULL value when the sampler thread is about to be deleted, I
>>>>> figured it was a good idea for the setter to not proceed from 
>>>>> either of
>>>>> the set points until the value was published and observable by the 
>>>>> threads
>>>>> that might be doing a get.
>>>>>
>>>>> When publishing a non-NULL value:
>>>>>    - I don't want the setter to assume that a ThreadsList is safe
>>>>>      and start using it while a different thread processing the
>>>>>      to-be-deleted list thinks that ThreadsList is freeable.
>>>>>
>>>>> When publishing a NULL value:
>>>>>    - I don't want the setter thread to delete itself while another
>>>>>      thread thinks that the generic tracing sampler thread is still
>>>>>      available and query-able.
>>>>>
>>>>> Please let me know if I'm being overly cautious.
>>>>
>>>> Very hard to tell. But what you describe seems specific to the 
>>>> semantics of ThreadSMR not to a general setting/getting a "sampler 
>>>> thread" reference, or even the general semantics of threads_do().
>>>>
>>>> The safety of publishing a NULL is also highly dependent on the 
>>>> overall way in which the sampler thread interacts with ThreadSMR - 
>>>> something which is completely invisible in this webrev as we don't 
>>>> see that thread.
>>>
>>> Right, we're having that discussion over on the closed thread... :-)
>>> I'm wishing that "JEP 328: Flight Recorder" was done... :-)
>>
>> Ye me too :)
>>
>>>
>>>> I'm wondering if the acquire/release-fence semantics should be 
>>>> visible in the get/set method names for clarity , as we do elsewhere?
>>>
>>> What I have right now is working and I haven't seen any test failures
>>> since I added the OrderAccess calls. I'd like to go with that and take
>>> up any further tweaks with Markus and Erik G. as part of their work...
>>
>> Renaming the methods wouldn't affect your testing, but if we're going 
>> to continue tweaking in this area then it can wait.
> 
> Well, at this point, I'm going to try out Erik's grab the Threads_lock
> idea so since I'm changing code...
> 
> What names would you like? 

release_set_sampler_thread_fence(Thread* t)
get_sampler_thread_acquire()

would match existing conventions (though use of get_ is itself not 
normal convention).

> Also, do you want me to switch to release_store() for this next round of testing?

I can't answer that without detailed understanding of how it will be used.

David

> 
> Dan
> 
> 
>>
>> Thanks,
>> David
>>
>>> Dan
>>>
>>>>
>>>> Thanks,
>>>> David
>>>>
>>>>> Dan
>>>>>
>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> David
>>>>>>
>>>>>> On 3/04/2018 1:11 AM, Daniel D. Daugherty wrote:
>>>>>>> Greetings,
>>>>>>>
>>>>>>> This fix has been revised due to additional testing and due to
>>>>>>> feedback from David H.
>>>>>>>
>>>>>>> Here's the incremental webrev:
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~dcubed/8200374-webrev/1_for_jdk_hs_open.inc/ 
>>>>>>>
>>>>>>>
>>>>>>> And here's the full webrev:
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~dcubed/8200374-webrev/1_for_jdk_hs_open.full/ 
>>>>>>>
>>>>>>>
>>>>>>> My usual Thread-SMR stress testing on my Solaris-X64 is still 
>>>>>>> running;
>>>>>>> so far there has been only one unrelated intermittent test 
>>>>>>> failure. The
>>>>>>> Mach5 builds-tier1, jdk-tier[1-3], and hs-tier[1-3] run is still in
>>>>>>> process...
>>>>>>>
>>>>>>> Thanks, in advance, for any comments, suggestions, or feedback.
>>>>>>>
>>>>>>> Dan
>>>>>>>
>>>>>>>
>>>>>>> On 3/28/18 9:52 PM, Daniel D. Daugherty wrote:
>>>>>>>> Greetings,
>>>>>>>>
>>>>>>>> I have a (mostly) small enhancement for Thread-SMR:
>>>>>>>>
>>>>>>>> JDK-8200374 Add 
>>>>>>>> ThreadsSMRSupport::verify_hazard_pointer_scanned() to verify 
>>>>>>>> threads_do()
>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8200374
>>>>>>>>
>>>>>>>> Here's the webrev URL:
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~dcubed/8200374-webrev/0_for_jdk_hs_open/ 
>>>>>>>>
>>>>>>>>
>>>>>>>> This is Erik O's improvement on the assertion added by the 
>>>>>>>> following
>>>>>>>> bug fix (with some minor tweaking done by me):
>>>>>>>>
>>>>>>>> JDK-8199813 SIGSEGV in ThreadsList::includes()
>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8199813
>>>>>>>>
>>>>>>>> Summary of the changes:
>>>>>>>>
>>>>>>>> - Replace the assertion that I added in JDK-8199813 with a closure
>>>>>>>>   based function that verifies the threads_do() contract depended
>>>>>>>>   on by Thread-SMR.
>>>>>>>> - Add ThreadsSMRSupport::verify_hazard_pointer_scanned() to verify
>>>>>>>>   that the calling thread's hazard pointer is scanned by 
>>>>>>>> threads_do().
>>>>>>>>   The new function is called from both 
>>>>>>>> ThreadsSMRSupport::acquire_*()
>>>>>>>>   functions.
>>>>>>>> - Refactor the non-JavaThread part of Threads::threads_do() into
>>>>>>>>   Threads::non_java_threads_do() so that the non-JavaThread part
>>>>>>>>   can also be called by other threads_do() functions. Yes, the
>>>>>>>>   Threads::threads_do() contract is still to scan every thread in
>>>>>>>>   the system.
>>>>>>>> - Add hooks for a "tracing sampler thread" to be optionally scanned
>>>>>>>>   by Threads::non_java_threads_do().
>>>>>>>>
>>>>>>>> This fix has gone thru a couple of Mach5 builds-tier1, 
>>>>>>>> jdk-tier[1-3],
>>>>>>>> and hs-tier[1-3] runs. I've also started my usual 24+ hour 
>>>>>>>> Thread-SMR
>>>>>>>> stress testing run on my Solaris-X64 server.
>>>>>>>>
>>>>>>>> Just to be extra sure, I backed out the fix from JDK-8199813 to
>>>>>>>> JavaThread::verify_not_published() (which started this round of
>>>>>>>> Thread-SMR fixes) and we catch the issue in the new function
>>>>>>>> ThreadsSMRSupport::verify_hazard_pointer_scanned().
>>>>>>>>
>>>>>>>> Thanks, in advance, for any comments, suggestions, or feedback.
>>>>>>>>
>>>>>>>> Dan
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>
>>>
>