RFR(S): 8200374: Add ThreadsSMRSupport::verify_hazard_pointer_scanned() to verify threads_do()

Daniel D. Daugherty daniel.daugherty at oracle.com
Tue Apr 3 12:40:36 UTC 2018


On 4/2/18 5:49 PM, David Holmes wrote:
> On 3/04/2018 7:08 AM, Daniel D. Daugherty wrote:
>> Hi David!
>>
>> Thanks for the quick review.
>>
>>
>> On 4/2/18 4:34 PM, David Holmes wrote:
>>> Hi Dan,
>>>
>>> Only query, that I overlooked before is:
>>>
>>>  void TracingExport::set_sampler_thread(Thread * thread) {
>>> -  _sampler_thread = thread;
>>> +  OrderAccess::release_store_fence(&_sampler_thread, thread);
>>>  }
>>>
>>> why a release_store_fence() rather than simple release_store()?
>>
>> I based that choice on this comment:
>>
>> src/hotspot/share/runtime/orderAccess.hpp:
>>
>> // Conventional usage is to issue a load_acquire for ordered loads.  Use
>> // release_store for ordered stores when you care only that prior stores
>> // are visible before the release_store, but don't care exactly when the
>> // store associated with the release_store becomes visible.  Use
>> // release_store_fence to update values like the thread state, where we
>> // don't want the current thread to continue until all our prior memory
>> // accesses (including the new thread state) are visible to other 
>> threads.
>> // This is equivalent to the volatile semantics of the Java Memory 
>> Model.
>>
>> I see the _sampler_thread field as similar to the thread state.
>>
>> Since the setter (a generic tracing sampler thread), is going to either
>> publish a non-NULL value when the sampler thread starts or is going to
>> publish a NULL value when the sampler thread is about to be deleted, I
>> figured it was a good idea for the setter to not proceed from either of
>> the set points until the value was published and observable by the 
>> threads
>> that might be doing a get.
>>
>> When publishing a non-NULL value:
>>    - I don't want the setter to assume that a ThreadsList is safe
>>      and start using it while a different thread processing the
>>      to-be-deleted list thinks that ThreadsList is freeable.
>>
>> When publishing a NULL value:
>>    - I don't want the setter thread to delete itself while another
>>      thread thinks that the generic tracing sampler thread is still
>>      available and query-able.
>>
>> Please let me know if I'm being overly cautious.
>
> Very hard to tell. But what you describe seems specific to the 
> semantics of ThreadSMR not to a general setting/getting a "sampler 
> thread" reference, or even the general semantics of threads_do().
>
> The safety of publishing a NULL is also highly dependent on the 
> overall way in which the sampler thread interacts with ThreadSMR - 
> something which is completely invisible in this webrev as we don't see 
> that thread.

Right, we're having that discussion over on the closed thread... :-)
I'm wishing that "JEP 328: Flight Recorder" was done... :-)


> I'm wondering if the acquire/release-fence semantics should be visible 
> in the get/set method names for clarity , as we do elsewhere?

What I have right now is working and I haven't seen any test failures
since I added the OrderAccess calls. I'd like to go with that and take
up any further tweaks with Markus and Erik G. as part of their work...

Dan

>
> Thanks,
> David
>
>> Dan
>>
>>
>>>
>>> Thanks,
>>> David
>>>
>>> On 3/04/2018 1:11 AM, Daniel D. Daugherty wrote:
>>>> Greetings,
>>>>
>>>> This fix has been revised due to additional testing and due to
>>>> feedback from David H.
>>>>
>>>> Here's the incremental webrev:
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8200374-webrev/1_for_jdk_hs_open.inc/ 
>>>>
>>>>
>>>> And here's the full webrev:
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8200374-webrev/1_for_jdk_hs_open.full/ 
>>>>
>>>>
>>>> My usual Thread-SMR stress testing on my Solaris-X64 is still running;
>>>> so far there has been only one unrelated intermittent test failure. 
>>>> The
>>>> Mach5 builds-tier1, jdk-tier[1-3], and hs-tier[1-3] run is still in
>>>> process...
>>>>
>>>> Thanks, in advance, for any comments, suggestions, or feedback.
>>>>
>>>> Dan
>>>>
>>>>
>>>> On 3/28/18 9:52 PM, Daniel D. Daugherty wrote:
>>>>> Greetings,
>>>>>
>>>>> I have a (mostly) small enhancement for Thread-SMR:
>>>>>
>>>>> JDK-8200374 Add ThreadsSMRSupport::verify_hazard_pointer_scanned() 
>>>>> to verify threads_do()
>>>>> https://bugs.openjdk.java.net/browse/JDK-8200374
>>>>>
>>>>> Here's the webrev URL:
>>>>>
>>>>> http://cr.openjdk.java.net/~dcubed/8200374-webrev/0_for_jdk_hs_open/
>>>>>
>>>>> This is Erik O's improvement on the assertion added by the following
>>>>> bug fix (with some minor tweaking done by me):
>>>>>
>>>>> JDK-8199813 SIGSEGV in ThreadsList::includes()
>>>>> https://bugs.openjdk.java.net/browse/JDK-8199813
>>>>>
>>>>> Summary of the changes:
>>>>>
>>>>> - Replace the assertion that I added in JDK-8199813 with a closure
>>>>>   based function that verifies the threads_do() contract depended
>>>>>   on by Thread-SMR.
>>>>> - Add ThreadsSMRSupport::verify_hazard_pointer_scanned() to verify
>>>>>   that the calling thread's hazard pointer is scanned by 
>>>>> threads_do().
>>>>>   The new function is called from both ThreadsSMRSupport::acquire_*()
>>>>>   functions.
>>>>> - Refactor the non-JavaThread part of Threads::threads_do() into
>>>>>   Threads::non_java_threads_do() so that the non-JavaThread part
>>>>>   can also be called by other threads_do() functions. Yes, the
>>>>>   Threads::threads_do() contract is still to scan every thread in
>>>>>   the system.
>>>>> - Add hooks for a "tracing sampler thread" to be optionally scanned
>>>>>   by Threads::non_java_threads_do().
>>>>>
>>>>> This fix has gone thru a couple of Mach5 builds-tier1, jdk-tier[1-3],
>>>>> and hs-tier[1-3] runs. I've also started my usual 24+ hour Thread-SMR
>>>>> stress testing run on my Solaris-X64 server.
>>>>>
>>>>> Just to be extra sure, I backed out the fix from JDK-8199813 to
>>>>> JavaThread::verify_not_published() (which started this round of
>>>>> Thread-SMR fixes) and we catch the issue in the new function
>>>>> ThreadsSMRSupport::verify_hazard_pointer_scanned().
>>>>>
>>>>> Thanks, in advance, for any comments, suggestions, or feedback.
>>>>>
>>>>> Dan
>>>>>
>>>>>
>>>>
>>



More information about the hotspot-runtime-dev mailing list