RFR(S): 8200374: Add ThreadsSMRSupport::verify_hazard_pointer_scanned() to verify threads_do()

Daniel D. Daugherty daniel.daugherty at oracle.com
Tue Apr 3 14:00:29 UTC 2018


On 4/3/18 9:52 AM, David Holmes wrote:
>
>
> On 3/04/2018 11:33 PM, Daniel D. Daugherty wrote:
>> On 4/3/18 9:31 AM, David Holmes wrote:
>>> On 3/04/2018 10:40 PM, Daniel D. Daugherty wrote:
>>>> On 4/2/18 5:49 PM, David Holmes wrote:
>>>>> On 3/04/2018 7:08 AM, Daniel D. Daugherty wrote:
>>>>>> Hi David!
>>>>>>
>>>>>> Thanks for the quick review.
>>>>>>
>>>>>>
>>>>>> On 4/2/18 4:34 PM, David Holmes wrote:
>>>>>>> Hi Dan,
>>>>>>>
>>>>>>> Only query, that I overlooked before is:
>>>>>>>
>>>>>>>  void TracingExport::set_sampler_thread(Thread * thread) {
>>>>>>> -  _sampler_thread = thread;
>>>>>>> + OrderAccess::release_store_fence(&_sampler_thread, thread);
>>>>>>>  }
>>>>>>>
>>>>>>> why a release_store_fence() rather than simple release_store()?
>>>>>>
>>>>>> I based that choice on this comment:
>>>>>>
>>>>>> src/hotspot/share/runtime/orderAccess.hpp:
>>>>>>
>>>>>> // Conventional usage is to issue a load_acquire for ordered 
>>>>>> loads.  Use
>>>>>> // release_store for ordered stores when you care only that prior 
>>>>>> stores
>>>>>> // are visible before the release_store, but don't care exactly 
>>>>>> when the
>>>>>> // store associated with the release_store becomes visible. Use
>>>>>> // release_store_fence to update values like the thread state, 
>>>>>> where we
>>>>>> // don't want the current thread to continue until all our prior 
>>>>>> memory
>>>>>> // accesses (including the new thread state) are visible to other 
>>>>>> threads.
>>>>>> // This is equivalent to the volatile semantics of the Java 
>>>>>> Memory Model.
>>>>>>
>>>>>> I see the _sampler_thread field as similar to the thread state.
>>>>>>
>>>>>> Since the setter (a generic tracing sampler thread), is going to 
>>>>>> either
>>>>>> publish a non-NULL value when the sampler thread starts or is 
>>>>>> going to
>>>>>> publish a NULL value when the sampler thread is about to be 
>>>>>> deleted, I
>>>>>> figured it was a good idea for the setter to not proceed from 
>>>>>> either of
>>>>>> the set points until the value was published and observable by 
>>>>>> the threads
>>>>>> that might be doing a get.
>>>>>>
>>>>>> When publishing a non-NULL value:
>>>>>>    - I don't want the setter to assume that a ThreadsList is safe
>>>>>>      and start using it while a different thread processing the
>>>>>>      to-be-deleted list thinks that ThreadsList is freeable.
>>>>>>
>>>>>> When publishing a NULL value:
>>>>>>    - I don't want the setter thread to delete itself while another
>>>>>>      thread thinks that the generic tracing sampler thread is still
>>>>>>      available and query-able.
>>>>>>
>>>>>> Please let me know if I'm being overly cautious.
>>>>>
>>>>> Very hard to tell. But what you describe seems specific to the 
>>>>> semantics of ThreadSMR not to a general setting/getting a "sampler 
>>>>> thread" reference, or even the general semantics of threads_do().
>>>>>
>>>>> The safety of publishing a NULL is also highly dependent on the 
>>>>> overall way in which the sampler thread interacts with ThreadSMR - 
>>>>> something which is completely invisible in this webrev as we don't 
>>>>> see that thread.
>>>>
>>>> Right, we're having that discussion over on the closed thread... :-)
>>>> I'm wishing that "JEP 328: Flight Recorder" was done... :-)
>>>
>>> Ye me too :)
>>>
>>>>
>>>>> I'm wondering if the acquire/release-fence semantics should be 
>>>>> visible in the get/set method names for clarity , as we do elsewhere?
>>>>
>>>> What I have right now is working and I haven't seen any test failures
>>>> since I added the OrderAccess calls. I'd like to go with that and take
>>>> up any further tweaks with Markus and Erik G. as part of their work...
>>>
>>> Renaming the methods wouldn't affect your testing, but if we're 
>>> going to continue tweaking in this area then it can wait.
>>
>> Well, at this point, I'm going to try out Erik's grab the Threads_lock
>> idea so since I'm changing code...
>>
>> What names would you like? 
>
> release_set_sampler_thread_fence(Thread* t)
> get_sampler_thread_acquire()
>
> would match existing conventions (though use of get_ is itself not 
> normal convention).

Thanks for the names. Will drop the 'get_' part.


>
>> Also, do you want me to switch to release_store() for this next round 
>> of testing?
>
> I can't answer that without detailed understanding of how it will be 
> used.

That's over in the 'closed' thread (for now). :-) There's a webrev
over there and the code in question has been quoted at least once... :-)

Dan


>
> David
>
>>
>> Dan
>>
>>
>>>
>>> Thanks,
>>> David
>>>
>>>> Dan
>>>>
>>>>>
>>>>> Thanks,
>>>>> David
>>>>>
>>>>>> Dan
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> David
>>>>>>>
>>>>>>> On 3/04/2018 1:11 AM, Daniel D. Daugherty wrote:
>>>>>>>> Greetings,
>>>>>>>>
>>>>>>>> This fix has been revised due to additional testing and due to
>>>>>>>> feedback from David H.
>>>>>>>>
>>>>>>>> Here's the incremental webrev:
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~dcubed/8200374-webrev/1_for_jdk_hs_open.inc/ 
>>>>>>>>
>>>>>>>>
>>>>>>>> And here's the full webrev:
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~dcubed/8200374-webrev/1_for_jdk_hs_open.full/ 
>>>>>>>>
>>>>>>>>
>>>>>>>> My usual Thread-SMR stress testing on my Solaris-X64 is still 
>>>>>>>> running;
>>>>>>>> so far there has been only one unrelated intermittent test 
>>>>>>>> failure. The
>>>>>>>> Mach5 builds-tier1, jdk-tier[1-3], and hs-tier[1-3] run is 
>>>>>>>> still in
>>>>>>>> process...
>>>>>>>>
>>>>>>>> Thanks, in advance, for any comments, suggestions, or feedback.
>>>>>>>>
>>>>>>>> Dan
>>>>>>>>
>>>>>>>>
>>>>>>>> On 3/28/18 9:52 PM, Daniel D. Daugherty wrote:
>>>>>>>>> Greetings,
>>>>>>>>>
>>>>>>>>> I have a (mostly) small enhancement for Thread-SMR:
>>>>>>>>>
>>>>>>>>> JDK-8200374 Add 
>>>>>>>>> ThreadsSMRSupport::verify_hazard_pointer_scanned() to verify 
>>>>>>>>> threads_do()
>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8200374
>>>>>>>>>
>>>>>>>>> Here's the webrev URL:
>>>>>>>>>
>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8200374-webrev/0_for_jdk_hs_open/ 
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> This is Erik O's improvement on the assertion added by the 
>>>>>>>>> following
>>>>>>>>> bug fix (with some minor tweaking done by me):
>>>>>>>>>
>>>>>>>>> JDK-8199813 SIGSEGV in ThreadsList::includes()
>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8199813
>>>>>>>>>
>>>>>>>>> Summary of the changes:
>>>>>>>>>
>>>>>>>>> - Replace the assertion that I added in JDK-8199813 with a 
>>>>>>>>> closure
>>>>>>>>>   based function that verifies the threads_do() contract depended
>>>>>>>>>   on by Thread-SMR.
>>>>>>>>> - Add ThreadsSMRSupport::verify_hazard_pointer_scanned() to 
>>>>>>>>> verify
>>>>>>>>>   that the calling thread's hazard pointer is scanned by 
>>>>>>>>> threads_do().
>>>>>>>>>   The new function is called from both 
>>>>>>>>> ThreadsSMRSupport::acquire_*()
>>>>>>>>>   functions.
>>>>>>>>> - Refactor the non-JavaThread part of Threads::threads_do() into
>>>>>>>>>   Threads::non_java_threads_do() so that the non-JavaThread part
>>>>>>>>>   can also be called by other threads_do() functions. Yes, the
>>>>>>>>>   Threads::threads_do() contract is still to scan every thread in
>>>>>>>>>   the system.
>>>>>>>>> - Add hooks for a "tracing sampler thread" to be optionally 
>>>>>>>>> scanned
>>>>>>>>>   by Threads::non_java_threads_do().
>>>>>>>>>
>>>>>>>>> This fix has gone thru a couple of Mach5 builds-tier1, 
>>>>>>>>> jdk-tier[1-3],
>>>>>>>>> and hs-tier[1-3] runs. I've also started my usual 24+ hour 
>>>>>>>>> Thread-SMR
>>>>>>>>> stress testing run on my Solaris-X64 server.
>>>>>>>>>
>>>>>>>>> Just to be extra sure, I backed out the fix from JDK-8199813 to
>>>>>>>>> JavaThread::verify_not_published() (which started this round of
>>>>>>>>> Thread-SMR fixes) and we catch the issue in the new function
>>>>>>>>> ThreadsSMRSupport::verify_hazard_pointer_scanned().
>>>>>>>>>
>>>>>>>>> Thanks, in advance, for any comments, suggestions, or feedback.
>>>>>>>>>
>>>>>>>>> Dan
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>>
>



More information about the hotspot-runtime-dev mailing list