RFR(S): 8200374: Add ThreadsSMRSupport::verify_hazard_pointer_scanned() to verify threads_do()

Robbin Ehn robbin.ehn at oracle.com
Thu Apr 5 18:56:37 UTC 2018


Hi Dan,

On 2018-04-04 15:07, Daniel D. Daugherty wrote:
> 
> In any case, after switching back to release_store_fence(), I have
> not seen a failure... yet... Mach5 is done, but my Solaris-X64
> Thread-SMR stress run is still going...

Since event thread is an issue today without this well needed extra verification, ship this!
As David already pointed out the fence should not be needed, but we will look into event thread outside this enhancement.

Thanks for adding this!

/Robbin

> 
> Dan
> 
> 
>>
>> David
>>
>>> Don't know if Mach5 is still rocking, but I'm about to find out...
>>>
>>> Dan
>>>
>>>
>>>>
>>>> Good thing Mach5 is rocking today!
>>>>
>>>> Dan
>>>>
>>>>
>>>> On 4/3/18 5:45 PM, Daniel D. Daugherty wrote:
>>>>> On 4/3/18 5:39 PM, David Holmes wrote:
>>>>>> On 4/04/2018 7:25 AM, Daniel D. Daugherty wrote:
>>>>>>> Greetings,
>>>>>>>
>>>>>>> This fix has been revised due to feedback from David H. and Erik O.
>>>>>>>
>>>>>>> Here's the incremental webrev:
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~dcubed/8200374-webrev/2_for_jdk_hs_open.inc/
>>>>>>>
>>>>>>> And here's the full webrev:
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~dcubed/8200374-webrev/2_for_jdk_hs_open.full/
>>>>>>>
>>>>>>> Summary of the changes relative to 'cr1':
>>>>>>>
>>>>>>> - Rename TracingExport::get_sampler_thread() to sampler_thread_acquire()
>>>>>>>    and TracingExport::set_sampler_thread() to set_sampler_thread_with_lock().
>>>>>>> - set_sampler_thread_with_lock() uses Threads_lock to avoid conflicts with
>>>>>>>    Thread-SMR scans.
>>>>>>
>>>>>> Sorry Dan but this is now mixing lock-free and locked in a "broken" way.
>>>>>
>>>>> Yup... Little voice in the back of my head was telling me
>>>>> that... I didn't listen... I'll switch it to:
>>>>>
>>>>> OrderAccess::release_store(&_sampler_thread, thread);
>>>>>
>>>>> and retest... :-)
>>>>>
>>>>> Dan
>>>>>
>>>>>
>>>>>
>>>>>> A load_acquire is supposed to always be there to match a release_store - but we don't have a release_store because we hold the lock when setting the field. The lock is being used to serialize with the ThreadSMR scans. If we need an acquire it is because we want to ensure that we see stores prior to the setting of the sampler_thread when we read the sampler_thread. If that is the case then we still need the release_store because we may read the new sampler_thread value as soon as it is set, but before the unlock occurs - otherwise prior stores need not be visible. If it is not the case 
>>>>>> that we need to see prior stores then we don't need the acquire.
>>>>>>
>>>>>> David
>>>>>> -----
>>>>>>
>>>>>>> The Mach5 builds-tier1, jdk-tier[1-3], and hs-tier[1-3] run has finished
>>>>>>> and there are no test failures related to these changes.
>>>>>>>
>>>>>>> My Solaris-X64 server is finishing the third round of Thread-SMR stress
>>>>>>> testing on the 'cr1' version. I'll start a round of stress testing for
>>>>>>> this version ('cr2') as soon as I can.
>>>>>>>
>>>>>>> Thanks, in advance, for any comments, suggestions, or feedback.
>>>>>>>
>>>>>>> Dan
>>>>>>>
>>>>>>> On 4/2/18 11:11 AM, Daniel D. Daugherty wrote:
>>>>>>>> Greetings,
>>>>>>>>
>>>>>>>> This fix has been revised due to additional testing and due to
>>>>>>>> feedback from David H.
>>>>>>>>
>>>>>>>> Here's the incremental webrev:
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~dcubed/8200374-webrev/1_for_jdk_hs_open.inc/
>>>>>>>>
>>>>>>>> And here's the full webrev:
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~dcubed/8200374-webrev/1_for_jdk_hs_open.full/
>>>>>>>>
>>>>>>>> My usual Thread-SMR stress testing on my Solaris-X64 is still running;
>>>>>>>> so far there has been only one unrelated intermittent test failure. The
>>>>>>>> Mach5 builds-tier1, jdk-tier[1-3], and hs-tier[1-3] run is still in
>>>>>>>> process...
>>>>>>>>
>>>>>>>> Thanks, in advance, for any comments, suggestions, or feedback.
>>>>>>>>
>>>>>>>> Dan
>>>>>>>>
>>>>>>>>
>>>>>>>> On 3/28/18 9:52 PM, Daniel D. Daugherty wrote:
>>>>>>>>> Greetings,
>>>>>>>>>
>>>>>>>>> I have a (mostly) small enhancement for Thread-SMR:
>>>>>>>>>
>>>>>>>>> JDK-8200374 Add ThreadsSMRSupport::verify_hazard_pointer_scanned() to verify threads_do()
>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8200374
>>>>>>>>>
>>>>>>>>> Here's the webrev URL:
>>>>>>>>>
>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8200374-webrev/0_for_jdk_hs_open/
>>>>>>>>>
>>>>>>>>> This is Erik O's improvement on the assertion added by the following
>>>>>>>>> bug fix (with some minor tweaking done by me):
>>>>>>>>>
>>>>>>>>> JDK-8199813 SIGSEGV in ThreadsList::includes()
>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8199813
>>>>>>>>>
>>>>>>>>> Summary of the changes:
>>>>>>>>>
>>>>>>>>> - Replace the assertion that I added in JDK-8199813 with a closure
>>>>>>>>>   based function that verifies the threads_do() contract depended
>>>>>>>>>   on by Thread-SMR.
>>>>>>>>> - Add ThreadsSMRSupport::verify_hazard_pointer_scanned() to verify
>>>>>>>>>   that the calling thread's hazard pointer is scanned by threads_do().
>>>>>>>>>   The new function is called from both ThreadsSMRSupport::acquire_*()
>>>>>>>>>   functions.
>>>>>>>>> - Refactor the non-JavaThread part of Threads::threads_do() into
>>>>>>>>>   Threads::non_java_threads_do() so that the non-JavaThread part
>>>>>>>>>   can also be called by other threads_do() functions. Yes, the
>>>>>>>>>   Threads::threads_do() contract is still to scan every thread in
>>>>>>>>>   the system.
>>>>>>>>> - Add hooks for a "tracing sampler thread" to be optionally scanned
>>>>>>>>>   by Threads::non_java_threads_do().
>>>>>>>>>
>>>>>>>>> This fix has gone thru a couple of Mach5 builds-tier1, jdk-tier[1-3],
>>>>>>>>> and hs-tier[1-3] runs. I've also started my usual 24+ hour Thread-SMR
>>>>>>>>> stress testing run on my Solaris-X64 server.
>>>>>>>>>
>>>>>>>>> Just to be extra sure, I backed out the fix from JDK-8199813 to
>>>>>>>>> JavaThread::verify_not_published() (which started this round of
>>>>>>>>> Thread-SMR fixes) and we catch the issue in the new function
>>>>>>>>> ThreadsSMRSupport::verify_hazard_pointer_scanned().
>>>>>>>>>
>>>>>>>>> Thanks, in advance, for any comments, suggestions, or feedback.
>>>>>>>>>
>>>>>>>>> Dan
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
> 


More information about the hotspot-runtime-dev mailing list