RFR(S): 8200374: Add ThreadsSMRSupport::verify_hazard_pointer_scanned() to verify threads_do()
Daniel D. Daugherty
daniel.daugherty at oracle.com
Tue Apr 3 13:33:13 UTC 2018
On 4/3/18 9:31 AM, David Holmes wrote:
> On 3/04/2018 10:40 PM, Daniel D. Daugherty wrote:
>> On 4/2/18 5:49 PM, David Holmes wrote:
>>> On 3/04/2018 7:08 AM, Daniel D. Daugherty wrote:
>>>> Hi David!
>>>>
>>>> Thanks for the quick review.
>>>>
>>>>
>>>> On 4/2/18 4:34 PM, David Holmes wrote:
>>>>> Hi Dan,
>>>>>
>>>>> Only query, that I overlooked before is:
>>>>>
>>>>> void TracingExport::set_sampler_thread(Thread * thread) {
>>>>> - _sampler_thread = thread;
>>>>> + OrderAccess::release_store_fence(&_sampler_thread, thread);
>>>>> }
>>>>>
>>>>> why a release_store_fence() rather than simple release_store()?
>>>>
>>>> I based that choice on this comment:
>>>>
>>>> src/hotspot/share/runtime/orderAccess.hpp:
>>>>
>>>> // Conventional usage is to issue a load_acquire for ordered
>>>> loads. Use
>>>> // release_store for ordered stores when you care only that prior
>>>> stores
>>>> // are visible before the release_store, but don't care exactly
>>>> when the
>>>> // store associated with the release_store becomes visible. Use
>>>> // release_store_fence to update values like the thread state,
>>>> where we
>>>> // don't want the current thread to continue until all our prior
>>>> memory
>>>> // accesses (including the new thread state) are visible to other
>>>> threads.
>>>> // This is equivalent to the volatile semantics of the Java Memory
>>>> Model.
>>>>
>>>> I see the _sampler_thread field as similar to the thread state.
>>>>
>>>> Since the setter (a generic tracing sampler thread), is going to
>>>> either
>>>> publish a non-NULL value when the sampler thread starts or is going to
>>>> publish a NULL value when the sampler thread is about to be deleted, I
>>>> figured it was a good idea for the setter to not proceed from
>>>> either of
>>>> the set points until the value was published and observable by the
>>>> threads
>>>> that might be doing a get.
>>>>
>>>> When publishing a non-NULL value:
>>>> - I don't want the setter to assume that a ThreadsList is safe
>>>> and start using it while a different thread processing the
>>>> to-be-deleted list thinks that ThreadsList is freeable.
>>>>
>>>> When publishing a NULL value:
>>>> - I don't want the setter thread to delete itself while another
>>>> thread thinks that the generic tracing sampler thread is still
>>>> available and query-able.
>>>>
>>>> Please let me know if I'm being overly cautious.
>>>
>>> Very hard to tell. But what you describe seems specific to the
>>> semantics of ThreadSMR not to a general setting/getting a "sampler
>>> thread" reference, or even the general semantics of threads_do().
>>>
>>> The safety of publishing a NULL is also highly dependent on the
>>> overall way in which the sampler thread interacts with ThreadSMR -
>>> something which is completely invisible in this webrev as we don't
>>> see that thread.
>>
>> Right, we're having that discussion over on the closed thread... :-)
>> I'm wishing that "JEP 328: Flight Recorder" was done... :-)
>
> Ye me too :)
>
>>
>>> I'm wondering if the acquire/release-fence semantics should be
>>> visible in the get/set method names for clarity , as we do elsewhere?
>>
>> What I have right now is working and I haven't seen any test failures
>> since I added the OrderAccess calls. I'd like to go with that and take
>> up any further tweaks with Markus and Erik G. as part of their work...
>
> Renaming the methods wouldn't affect your testing, but if we're going
> to continue tweaking in this area then it can wait.
Well, at this point, I'm going to try out Erik's grab the Threads_lock
idea so since I'm changing code...
What names would you like? Also, do you want me to switch to
release_store() for this next round of testing?
Dan
>
> Thanks,
> David
>
>> Dan
>>
>>>
>>> Thanks,
>>> David
>>>
>>>> Dan
>>>>
>>>>
>>>>>
>>>>> Thanks,
>>>>> David
>>>>>
>>>>> On 3/04/2018 1:11 AM, Daniel D. Daugherty wrote:
>>>>>> Greetings,
>>>>>>
>>>>>> This fix has been revised due to additional testing and due to
>>>>>> feedback from David H.
>>>>>>
>>>>>> Here's the incremental webrev:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~dcubed/8200374-webrev/1_for_jdk_hs_open.inc/
>>>>>>
>>>>>>
>>>>>> And here's the full webrev:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~dcubed/8200374-webrev/1_for_jdk_hs_open.full/
>>>>>>
>>>>>>
>>>>>> My usual Thread-SMR stress testing on my Solaris-X64 is still
>>>>>> running;
>>>>>> so far there has been only one unrelated intermittent test
>>>>>> failure. The
>>>>>> Mach5 builds-tier1, jdk-tier[1-3], and hs-tier[1-3] run is still in
>>>>>> process...
>>>>>>
>>>>>> Thanks, in advance, for any comments, suggestions, or feedback.
>>>>>>
>>>>>> Dan
>>>>>>
>>>>>>
>>>>>> On 3/28/18 9:52 PM, Daniel D. Daugherty wrote:
>>>>>>> Greetings,
>>>>>>>
>>>>>>> I have a (mostly) small enhancement for Thread-SMR:
>>>>>>>
>>>>>>> JDK-8200374 Add
>>>>>>> ThreadsSMRSupport::verify_hazard_pointer_scanned() to verify
>>>>>>> threads_do()
>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8200374
>>>>>>>
>>>>>>> Here's the webrev URL:
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~dcubed/8200374-webrev/0_for_jdk_hs_open/
>>>>>>>
>>>>>>>
>>>>>>> This is Erik O's improvement on the assertion added by the
>>>>>>> following
>>>>>>> bug fix (with some minor tweaking done by me):
>>>>>>>
>>>>>>> JDK-8199813 SIGSEGV in ThreadsList::includes()
>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8199813
>>>>>>>
>>>>>>> Summary of the changes:
>>>>>>>
>>>>>>> - Replace the assertion that I added in JDK-8199813 with a closure
>>>>>>> based function that verifies the threads_do() contract depended
>>>>>>> on by Thread-SMR.
>>>>>>> - Add ThreadsSMRSupport::verify_hazard_pointer_scanned() to verify
>>>>>>> that the calling thread's hazard pointer is scanned by
>>>>>>> threads_do().
>>>>>>> The new function is called from both
>>>>>>> ThreadsSMRSupport::acquire_*()
>>>>>>> functions.
>>>>>>> - Refactor the non-JavaThread part of Threads::threads_do() into
>>>>>>> Threads::non_java_threads_do() so that the non-JavaThread part
>>>>>>> can also be called by other threads_do() functions. Yes, the
>>>>>>> Threads::threads_do() contract is still to scan every thread in
>>>>>>> the system.
>>>>>>> - Add hooks for a "tracing sampler thread" to be optionally scanned
>>>>>>> by Threads::non_java_threads_do().
>>>>>>>
>>>>>>> This fix has gone thru a couple of Mach5 builds-tier1,
>>>>>>> jdk-tier[1-3],
>>>>>>> and hs-tier[1-3] runs. I've also started my usual 24+ hour
>>>>>>> Thread-SMR
>>>>>>> stress testing run on my Solaris-X64 server.
>>>>>>>
>>>>>>> Just to be extra sure, I backed out the fix from JDK-8199813 to
>>>>>>> JavaThread::verify_not_published() (which started this round of
>>>>>>> Thread-SMR fixes) and we catch the issue in the new function
>>>>>>> ThreadsSMRSupport::verify_hazard_pointer_scanned().
>>>>>>>
>>>>>>> Thanks, in advance, for any comments, suggestions, or feedback.
>>>>>>>
>>>>>>> Dan
>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>
More information about the hotspot-runtime-dev
mailing list