RFR(S): 8200374: Add ThreadsSMRSupport::verify_hazard_pointer_scanned() to verify threads_do()

Daniel D. Daugherty daniel.daugherty at oracle.com
Thu Apr 5 19:03:35 UTC 2018


Thanks for the review Robbin!

Dan


On 4/5/18 2:56 PM, Robbin Ehn wrote:
> Hi Dan,
>
> On 2018-04-04 15:07, Daniel D. Daugherty wrote:
>>
>> In any case, after switching back to release_store_fence(), I have
>> not seen a failure... yet... Mach5 is done, but my Solaris-X64
>> Thread-SMR stress run is still going...
>
> Since event thread is an issue today without this well needed extra 
> verification, ship this!
> As David already pointed out the fence should not be needed, but we 
> will look into event thread outside this enhancement.
>
> Thanks for adding this!
>
> /Robbin
>
>>
>> Dan
>>
>>
>>>
>>> David
>>>
>>>> Don't know if Mach5 is still rocking, but I'm about to find out...
>>>>
>>>> Dan
>>>>
>>>>
>>>>>
>>>>> Good thing Mach5 is rocking today!
>>>>>
>>>>> Dan
>>>>>
>>>>>
>>>>> On 4/3/18 5:45 PM, Daniel D. Daugherty wrote:
>>>>>> On 4/3/18 5:39 PM, David Holmes wrote:
>>>>>>> On 4/04/2018 7:25 AM, Daniel D. Daugherty wrote:
>>>>>>>> Greetings,
>>>>>>>>
>>>>>>>> This fix has been revised due to feedback from David H. and 
>>>>>>>> Erik O.
>>>>>>>>
>>>>>>>> Here's the incremental webrev:
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~dcubed/8200374-webrev/2_for_jdk_hs_open.inc/ 
>>>>>>>>
>>>>>>>>
>>>>>>>> And here's the full webrev:
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~dcubed/8200374-webrev/2_for_jdk_hs_open.full/ 
>>>>>>>>
>>>>>>>>
>>>>>>>> Summary of the changes relative to 'cr1':
>>>>>>>>
>>>>>>>> - Rename TracingExport::get_sampler_thread() to 
>>>>>>>> sampler_thread_acquire()
>>>>>>>>    and TracingExport::set_sampler_thread() to 
>>>>>>>> set_sampler_thread_with_lock().
>>>>>>>> - set_sampler_thread_with_lock() uses Threads_lock to avoid 
>>>>>>>> conflicts with
>>>>>>>>    Thread-SMR scans.
>>>>>>>
>>>>>>> Sorry Dan but this is now mixing lock-free and locked in a 
>>>>>>> "broken" way.
>>>>>>
>>>>>> Yup... Little voice in the back of my head was telling me
>>>>>> that... I didn't listen... I'll switch it to:
>>>>>>
>>>>>> OrderAccess::release_store(&_sampler_thread, thread);
>>>>>>
>>>>>> and retest... :-)
>>>>>>
>>>>>> Dan
>>>>>>
>>>>>>
>>>>>>
>>>>>>> A load_acquire is supposed to always be there to match a 
>>>>>>> release_store - but we don't have a release_store because we 
>>>>>>> hold the lock when setting the field. The lock is being used to 
>>>>>>> serialize with the ThreadSMR scans. If we need an acquire it is 
>>>>>>> because we want to ensure that we see stores prior to the 
>>>>>>> setting of the sampler_thread when we read the sampler_thread. 
>>>>>>> If that is the case then we still need the release_store because 
>>>>>>> we may read the new sampler_thread value as soon as it is set, 
>>>>>>> but before the unlock occurs - otherwise prior stores need not 
>>>>>>> be visible. If it is not the case that we need to see prior 
>>>>>>> stores then we don't need the acquire.
>>>>>>>
>>>>>>> David
>>>>>>> -----
>>>>>>>
>>>>>>>> The Mach5 builds-tier1, jdk-tier[1-3], and hs-tier[1-3] run has 
>>>>>>>> finished
>>>>>>>> and there are no test failures related to these changes.
>>>>>>>>
>>>>>>>> My Solaris-X64 server is finishing the third round of 
>>>>>>>> Thread-SMR stress
>>>>>>>> testing on the 'cr1' version. I'll start a round of stress 
>>>>>>>> testing for
>>>>>>>> this version ('cr2') as soon as I can.
>>>>>>>>
>>>>>>>> Thanks, in advance, for any comments, suggestions, or feedback.
>>>>>>>>
>>>>>>>> Dan
>>>>>>>>
>>>>>>>> On 4/2/18 11:11 AM, Daniel D. Daugherty wrote:
>>>>>>>>> Greetings,
>>>>>>>>>
>>>>>>>>> This fix has been revised due to additional testing and due to
>>>>>>>>> feedback from David H.
>>>>>>>>>
>>>>>>>>> Here's the incremental webrev:
>>>>>>>>>
>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8200374-webrev/1_for_jdk_hs_open.inc/ 
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> And here's the full webrev:
>>>>>>>>>
>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8200374-webrev/1_for_jdk_hs_open.full/ 
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> My usual Thread-SMR stress testing on my Solaris-X64 is still 
>>>>>>>>> running;
>>>>>>>>> so far there has been only one unrelated intermittent test 
>>>>>>>>> failure. The
>>>>>>>>> Mach5 builds-tier1, jdk-tier[1-3], and hs-tier[1-3] run is 
>>>>>>>>> still in
>>>>>>>>> process...
>>>>>>>>>
>>>>>>>>> Thanks, in advance, for any comments, suggestions, or feedback.
>>>>>>>>>
>>>>>>>>> Dan
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 3/28/18 9:52 PM, Daniel D. Daugherty wrote:
>>>>>>>>>> Greetings,
>>>>>>>>>>
>>>>>>>>>> I have a (mostly) small enhancement for Thread-SMR:
>>>>>>>>>>
>>>>>>>>>> JDK-8200374 Add 
>>>>>>>>>> ThreadsSMRSupport::verify_hazard_pointer_scanned() to verify 
>>>>>>>>>> threads_do()
>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8200374
>>>>>>>>>>
>>>>>>>>>> Here's the webrev URL:
>>>>>>>>>>
>>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8200374-webrev/0_for_jdk_hs_open/ 
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> This is Erik O's improvement on the assertion added by the 
>>>>>>>>>> following
>>>>>>>>>> bug fix (with some minor tweaking done by me):
>>>>>>>>>>
>>>>>>>>>> JDK-8199813 SIGSEGV in ThreadsList::includes()
>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8199813
>>>>>>>>>>
>>>>>>>>>> Summary of the changes:
>>>>>>>>>>
>>>>>>>>>> - Replace the assertion that I added in JDK-8199813 with a 
>>>>>>>>>> closure
>>>>>>>>>>   based function that verifies the threads_do() contract 
>>>>>>>>>> depended
>>>>>>>>>>   on by Thread-SMR.
>>>>>>>>>> - Add ThreadsSMRSupport::verify_hazard_pointer_scanned() to 
>>>>>>>>>> verify
>>>>>>>>>>   that the calling thread's hazard pointer is scanned by 
>>>>>>>>>> threads_do().
>>>>>>>>>>   The new function is called from both 
>>>>>>>>>> ThreadsSMRSupport::acquire_*()
>>>>>>>>>>   functions.
>>>>>>>>>> - Refactor the non-JavaThread part of Threads::threads_do() into
>>>>>>>>>>   Threads::non_java_threads_do() so that the non-JavaThread part
>>>>>>>>>>   can also be called by other threads_do() functions. Yes, the
>>>>>>>>>>   Threads::threads_do() contract is still to scan every 
>>>>>>>>>> thread in
>>>>>>>>>>   the system.
>>>>>>>>>> - Add hooks for a "tracing sampler thread" to be optionally 
>>>>>>>>>> scanned
>>>>>>>>>>   by Threads::non_java_threads_do().
>>>>>>>>>>
>>>>>>>>>> This fix has gone thru a couple of Mach5 builds-tier1, 
>>>>>>>>>> jdk-tier[1-3],
>>>>>>>>>> and hs-tier[1-3] runs. I've also started my usual 24+ hour 
>>>>>>>>>> Thread-SMR
>>>>>>>>>> stress testing run on my Solaris-X64 server.
>>>>>>>>>>
>>>>>>>>>> Just to be extra sure, I backed out the fix from JDK-8199813 to
>>>>>>>>>> JavaThread::verify_not_published() (which started this round of
>>>>>>>>>> Thread-SMR fixes) and we catch the issue in the new function
>>>>>>>>>> ThreadsSMRSupport::verify_hazard_pointer_scanned().
>>>>>>>>>>
>>>>>>>>>> Thanks, in advance, for any comments, suggestions, or feedback.
>>>>>>>>>>
>>>>>>>>>> Dan
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>



More information about the hotspot-runtime-dev mailing list