RFR(S): 8200374: Add ThreadsSMRSupport::verify_hazard_pointer_scanned() to verify threads_do()

David Holmes david.holmes at oracle.com
Tue Apr 3 13:31:14 UTC 2018


On 3/04/2018 10:40 PM, Daniel D. Daugherty wrote:
> On 4/2/18 5:49 PM, David Holmes wrote:
>> On 3/04/2018 7:08 AM, Daniel D. Daugherty wrote:
>>> Hi David!
>>>
>>> Thanks for the quick review.
>>>
>>>
>>> On 4/2/18 4:34 PM, David Holmes wrote:
>>>> Hi Dan,
>>>>
>>>> Only query, that I overlooked before is:
>>>>
>>>>  void TracingExport::set_sampler_thread(Thread * thread) {
>>>> -  _sampler_thread = thread;
>>>> +  OrderAccess::release_store_fence(&_sampler_thread, thread);
>>>>  }
>>>>
>>>> why a release_store_fence() rather than simple release_store()?
>>>
>>> I based that choice on this comment:
>>>
>>> src/hotspot/share/runtime/orderAccess.hpp:
>>>
>>> // Conventional usage is to issue a load_acquire for ordered loads.  Use
>>> // release_store for ordered stores when you care only that prior stores
>>> // are visible before the release_store, but don't care exactly when the
>>> // store associated with the release_store becomes visible.  Use
>>> // release_store_fence to update values like the thread state, where we
>>> // don't want the current thread to continue until all our prior memory
>>> // accesses (including the new thread state) are visible to other 
>>> threads.
>>> // This is equivalent to the volatile semantics of the Java Memory 
>>> Model.
>>>
>>> I see the _sampler_thread field as similar to the thread state.
>>>
>>> Since the setter (a generic tracing sampler thread), is going to either
>>> publish a non-NULL value when the sampler thread starts or is going to
>>> publish a NULL value when the sampler thread is about to be deleted, I
>>> figured it was a good idea for the setter to not proceed from either of
>>> the set points until the value was published and observable by the 
>>> threads
>>> that might be doing a get.
>>>
>>> When publishing a non-NULL value:
>>>    - I don't want the setter to assume that a ThreadsList is safe
>>>      and start using it while a different thread processing the
>>>      to-be-deleted list thinks that ThreadsList is freeable.
>>>
>>> When publishing a NULL value:
>>>    - I don't want the setter thread to delete itself while another
>>>      thread thinks that the generic tracing sampler thread is still
>>>      available and query-able.
>>>
>>> Please let me know if I'm being overly cautious.
>>
>> Very hard to tell. But what you describe seems specific to the 
>> semantics of ThreadSMR not to a general setting/getting a "sampler 
>> thread" reference, or even the general semantics of threads_do().
>>
>> The safety of publishing a NULL is also highly dependent on the 
>> overall way in which the sampler thread interacts with ThreadSMR - 
>> something which is completely invisible in this webrev as we don't see 
>> that thread.
> 
> Right, we're having that discussion over on the closed thread... :-)
> I'm wishing that "JEP 328: Flight Recorder" was done... :-)

Ye me too :)

> 
>> I'm wondering if the acquire/release-fence semantics should be visible 
>> in the get/set method names for clarity , as we do elsewhere?
> 
> What I have right now is working and I haven't seen any test failures
> since I added the OrderAccess calls. I'd like to go with that and take
> up any further tweaks with Markus and Erik G. as part of their work...

Renaming the methods wouldn't affect your testing, but if we're going to 
continue tweaking in this area then it can wait.

Thanks,
David

> Dan
> 
>>
>> Thanks,
>> David
>>
>>> Dan
>>>
>>>
>>>>
>>>> Thanks,
>>>> David
>>>>
>>>> On 3/04/2018 1:11 AM, Daniel D. Daugherty wrote:
>>>>> Greetings,
>>>>>
>>>>> This fix has been revised due to additional testing and due to
>>>>> feedback from David H.
>>>>>
>>>>> Here's the incremental webrev:
>>>>>
>>>>> http://cr.openjdk.java.net/~dcubed/8200374-webrev/1_for_jdk_hs_open.inc/ 
>>>>>
>>>>>
>>>>> And here's the full webrev:
>>>>>
>>>>> http://cr.openjdk.java.net/~dcubed/8200374-webrev/1_for_jdk_hs_open.full/ 
>>>>>
>>>>>
>>>>> My usual Thread-SMR stress testing on my Solaris-X64 is still running;
>>>>> so far there has been only one unrelated intermittent test failure. 
>>>>> The
>>>>> Mach5 builds-tier1, jdk-tier[1-3], and hs-tier[1-3] run is still in
>>>>> process...
>>>>>
>>>>> Thanks, in advance, for any comments, suggestions, or feedback.
>>>>>
>>>>> Dan
>>>>>
>>>>>
>>>>> On 3/28/18 9:52 PM, Daniel D. Daugherty wrote:
>>>>>> Greetings,
>>>>>>
>>>>>> I have a (mostly) small enhancement for Thread-SMR:
>>>>>>
>>>>>> JDK-8200374 Add ThreadsSMRSupport::verify_hazard_pointer_scanned() 
>>>>>> to verify threads_do()
>>>>>> https://bugs.openjdk.java.net/browse/JDK-8200374
>>>>>>
>>>>>> Here's the webrev URL:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~dcubed/8200374-webrev/0_for_jdk_hs_open/
>>>>>>
>>>>>> This is Erik O's improvement on the assertion added by the following
>>>>>> bug fix (with some minor tweaking done by me):
>>>>>>
>>>>>> JDK-8199813 SIGSEGV in ThreadsList::includes()
>>>>>> https://bugs.openjdk.java.net/browse/JDK-8199813
>>>>>>
>>>>>> Summary of the changes:
>>>>>>
>>>>>> - Replace the assertion that I added in JDK-8199813 with a closure
>>>>>>   based function that verifies the threads_do() contract depended
>>>>>>   on by Thread-SMR.
>>>>>> - Add ThreadsSMRSupport::verify_hazard_pointer_scanned() to verify
>>>>>>   that the calling thread's hazard pointer is scanned by 
>>>>>> threads_do().
>>>>>>   The new function is called from both ThreadsSMRSupport::acquire_*()
>>>>>>   functions.
>>>>>> - Refactor the non-JavaThread part of Threads::threads_do() into
>>>>>>   Threads::non_java_threads_do() so that the non-JavaThread part
>>>>>>   can also be called by other threads_do() functions. Yes, the
>>>>>>   Threads::threads_do() contract is still to scan every thread in
>>>>>>   the system.
>>>>>> - Add hooks for a "tracing sampler thread" to be optionally scanned
>>>>>>   by Threads::non_java_threads_do().
>>>>>>
>>>>>> This fix has gone thru a couple of Mach5 builds-tier1, jdk-tier[1-3],
>>>>>> and hs-tier[1-3] runs. I've also started my usual 24+ hour Thread-SMR
>>>>>> stress testing run on my Solaris-X64 server.
>>>>>>
>>>>>> Just to be extra sure, I backed out the fix from JDK-8199813 to
>>>>>> JavaThread::verify_not_published() (which started this round of
>>>>>> Thread-SMR fixes) and we catch the issue in the new function
>>>>>> ThreadsSMRSupport::verify_hazard_pointer_scanned().
>>>>>>
>>>>>> Thanks, in advance, for any comments, suggestions, or feedback.
>>>>>>
>>>>>> Dan
>>>>>>
>>>>>>
>>>>>
>>>
> 


More information about the hotspot-runtime-dev mailing list