RFR(S): 8200374: Add ThreadsSMRSupport::verify_hazard_pointer_scanned() to verify threads_do()
David Holmes
david.holmes at oracle.com
Wed Apr 4 02:30:32 UTC 2018
On 4/04/2018 11:46 AM, Daniel D. Daugherty wrote:
> On 4/3/18 5:52 PM, Daniel D. Daugherty wrote:
>> The actual relative diff:
>>
>> $ hg diff
>> diff -r b1cf49ff8e1c src/hotspot/share/trace/tracingExport.cpp
>> --- a/src/hotspot/share/trace/tracingExport.cpp Tue Apr 03 16:57:57
>> 2018 -0400
>> +++ b/src/hotspot/share/trace/tracingExport.cpp Tue Apr 03 17:50:26
>> 2018 -0400
>> @@ -36,5 +36,6 @@
>> void TracingExport::set_sampler_thread_with_lock(Thread* thread) {
>> // Grab Threads_lock to avoid conflicts with Thread-SMR scans.
>> MutexLocker ml(Threads_lock);
>> - _sampler_thread = thread;
>> + // To match with the lock-free load_acquire():
>> + OrderAccess::release_store(&_sampler_thread, thread);
>> }
>>
>>
>> The annoying thing is that my plan was to switch from
>> OrderAccess::release_store_fence() to OrderAccess::release_store()
>> for this round of testing AND I talked myself into dropping the
>> release_store() entirely...
>
> Sigh... going to OrderAccess::release_store() has brought back
> a sampler thread race crash that stopped when I switched to
> OrderAccess::release_store_fence() from the straight up:
>
> _sampler_thread = thread;
>
> that I had in CR0.
>
> So it looks like I need the "release" to match up to the
> "acquire" (thanks for reminding me David) AND I also need
> the fence.
That doesn't sound right because the unlock should be giving you a fence
anyway.
David
> Don't know if Mach5 is still rocking, but I'm about to find out...
>
> Dan
>
>
>>
>> Good thing Mach5 is rocking today!
>>
>> Dan
>>
>>
>> On 4/3/18 5:45 PM, Daniel D. Daugherty wrote:
>>> On 4/3/18 5:39 PM, David Holmes wrote:
>>>> On 4/04/2018 7:25 AM, Daniel D. Daugherty wrote:
>>>>> Greetings,
>>>>>
>>>>> This fix has been revised due to feedback from David H. and Erik O.
>>>>>
>>>>> Here's the incremental webrev:
>>>>>
>>>>> http://cr.openjdk.java.net/~dcubed/8200374-webrev/2_for_jdk_hs_open.inc/
>>>>>
>>>>>
>>>>> And here's the full webrev:
>>>>>
>>>>> http://cr.openjdk.java.net/~dcubed/8200374-webrev/2_for_jdk_hs_open.full/
>>>>>
>>>>>
>>>>> Summary of the changes relative to 'cr1':
>>>>>
>>>>> - Rename TracingExport::get_sampler_thread() to
>>>>> sampler_thread_acquire()
>>>>> and TracingExport::set_sampler_thread() to
>>>>> set_sampler_thread_with_lock().
>>>>> - set_sampler_thread_with_lock() uses Threads_lock to avoid
>>>>> conflicts with
>>>>> Thread-SMR scans.
>>>>
>>>> Sorry Dan but this is now mixing lock-free and locked in a "broken"
>>>> way.
>>>
>>> Yup... Little voice in the back of my head was telling me
>>> that... I didn't listen... I'll switch it to:
>>>
>>> OrderAccess::release_store(&_sampler_thread, thread);
>>>
>>> and retest... :-)
>>>
>>> Dan
>>>
>>>
>>>
>>>> A load_acquire is supposed to always be there to match a
>>>> release_store - but we don't have a release_store because we hold
>>>> the lock when setting the field. The lock is being used to serialize
>>>> with the ThreadSMR scans. If we need an acquire it is because we
>>>> want to ensure that we see stores prior to the setting of the
>>>> sampler_thread when we read the sampler_thread. If that is the case
>>>> then we still need the release_store because we may read the new
>>>> sampler_thread value as soon as it is set, but before the unlock
>>>> occurs - otherwise prior stores need not be visible. If it is not
>>>> the case that we need to see prior stores then we don't need the
>>>> acquire.
>>>>
>>>> David
>>>> -----
>>>>
>>>>> The Mach5 builds-tier1, jdk-tier[1-3], and hs-tier[1-3] run has
>>>>> finished
>>>>> and there are no test failures related to these changes.
>>>>>
>>>>> My Solaris-X64 server is finishing the third round of Thread-SMR
>>>>> stress
>>>>> testing on the 'cr1' version. I'll start a round of stress testing for
>>>>> this version ('cr2') as soon as I can.
>>>>>
>>>>> Thanks, in advance, for any comments, suggestions, or feedback.
>>>>>
>>>>> Dan
>>>>>
>>>>> On 4/2/18 11:11 AM, Daniel D. Daugherty wrote:
>>>>>> Greetings,
>>>>>>
>>>>>> This fix has been revised due to additional testing and due to
>>>>>> feedback from David H.
>>>>>>
>>>>>> Here's the incremental webrev:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~dcubed/8200374-webrev/1_for_jdk_hs_open.inc/
>>>>>>
>>>>>>
>>>>>> And here's the full webrev:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~dcubed/8200374-webrev/1_for_jdk_hs_open.full/
>>>>>>
>>>>>>
>>>>>> My usual Thread-SMR stress testing on my Solaris-X64 is still
>>>>>> running;
>>>>>> so far there has been only one unrelated intermittent test
>>>>>> failure. The
>>>>>> Mach5 builds-tier1, jdk-tier[1-3], and hs-tier[1-3] run is still in
>>>>>> process...
>>>>>>
>>>>>> Thanks, in advance, for any comments, suggestions, or feedback.
>>>>>>
>>>>>> Dan
>>>>>>
>>>>>>
>>>>>> On 3/28/18 9:52 PM, Daniel D. Daugherty wrote:
>>>>>>> Greetings,
>>>>>>>
>>>>>>> I have a (mostly) small enhancement for Thread-SMR:
>>>>>>>
>>>>>>> JDK-8200374 Add
>>>>>>> ThreadsSMRSupport::verify_hazard_pointer_scanned() to verify
>>>>>>> threads_do()
>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8200374
>>>>>>>
>>>>>>> Here's the webrev URL:
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~dcubed/8200374-webrev/0_for_jdk_hs_open/
>>>>>>>
>>>>>>> This is Erik O's improvement on the assertion added by the following
>>>>>>> bug fix (with some minor tweaking done by me):
>>>>>>>
>>>>>>> JDK-8199813 SIGSEGV in ThreadsList::includes()
>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8199813
>>>>>>>
>>>>>>> Summary of the changes:
>>>>>>>
>>>>>>> - Replace the assertion that I added in JDK-8199813 with a closure
>>>>>>> based function that verifies the threads_do() contract depended
>>>>>>> on by Thread-SMR.
>>>>>>> - Add ThreadsSMRSupport::verify_hazard_pointer_scanned() to verify
>>>>>>> that the calling thread's hazard pointer is scanned by
>>>>>>> threads_do().
>>>>>>> The new function is called from both
>>>>>>> ThreadsSMRSupport::acquire_*()
>>>>>>> functions.
>>>>>>> - Refactor the non-JavaThread part of Threads::threads_do() into
>>>>>>> Threads::non_java_threads_do() so that the non-JavaThread part
>>>>>>> can also be called by other threads_do() functions. Yes, the
>>>>>>> Threads::threads_do() contract is still to scan every thread in
>>>>>>> the system.
>>>>>>> - Add hooks for a "tracing sampler thread" to be optionally scanned
>>>>>>> by Threads::non_java_threads_do().
>>>>>>>
>>>>>>> This fix has gone thru a couple of Mach5 builds-tier1,
>>>>>>> jdk-tier[1-3],
>>>>>>> and hs-tier[1-3] runs. I've also started my usual 24+ hour
>>>>>>> Thread-SMR
>>>>>>> stress testing run on my Solaris-X64 server.
>>>>>>>
>>>>>>> Just to be extra sure, I backed out the fix from JDK-8199813 to
>>>>>>> JavaThread::verify_not_published() (which started this round of
>>>>>>> Thread-SMR fixes) and we catch the issue in the new function
>>>>>>> ThreadsSMRSupport::verify_hazard_pointer_scanned().
>>>>>>>
>>>>>>> Thanks, in advance, for any comments, suggestions, or feedback.
>>>>>>>
>>>>>>> Dan
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>
>>
>>
>
More information about the hotspot-runtime-dev
mailing list