RFR(XL): 8167108 - SMR and JavaThread Lifecycle

Daniel D. Daugherty daniel.daugherty at oracle.com
Fri Nov 24 18:13:20 UTC 2017


Greetings,

The Thread-SMR bits were pushed to jdk/hs late last night!

Mach5 Tier[1-5] on the exact bits that were pushed showed
no unexpected failures. I've looked at the jdk/hs CI pipeline
test results for my push and for the push before and after
mine and I don't see anything that worries me there either.

Obviously I'll be keeping an eye on testing for the next
several days, but so far everything looks good!!

Dan


On 11/22/17 9:16 PM, Daniel D. Daugherty wrote:
> Greetings,
>
> I've made minor tweaks to the Thread-SMR project based on the remaining
> code review comments over the last couple of days. I've also rebased the
> project to jdk/hs bits as of mid-afternoon (my TZ) on 2017.11.22. I'm
> running baseline Mach5 Tier[1-5] testing and prototype Mach5 Tier[1-5]
> testing...
>
> Here's a delta webrev for the latest round of tweaks:
>
> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-11-delta
>
>
> Here's the proposed commit message:
>
> $ cat SMR_prototype_10/open/commit.txt
> 8167108: inconsistent handling of SR_lock can lead to crashes
> Summary: Add Thread Safe Memory Reclamation (Thread-SMR) mechanism.
> Reviewed-by: coleenp, dcubed, dholmes, eosterlund, gthornbr, kbarrett, 
> rehn, sspitsyn, stefank
> Contributed-by: daniel.daugherty at oracle.com, 
> erik.osterlund at oracle.com, robbin.ehn at oracle.com
>
> I've gone through 880+ emails in my archive for this project and added
> anyone that made a code review comment. Reviewers are not in my usual
> by-comment-posting order; it was way too hard to figure that out... So
> reviewers and contributors are simply in alpha-sort order...
>
> Here's the collection of follow-up bugs that I filed based on sifting
> through the email archive:
>
> 8191784 JavaThread::collect_counters() should stop using Threads_lock
> https://bugs.openjdk.java.net/browse/JDK-8191784
>
> 8191785 revoke_bias() needs a new assertion
> https://bugs.openjdk.java.net/browse/JDK-8191785
>
> 8191786 Thread-SMR hash table size should be dynamic
> https://bugs.openjdk.java.net/browse/JDK-8191786
>
> 8191787 move private inline functions from thread.inline.hpp -> 
> thread.cpp
> https://bugs.openjdk.java.net/browse/JDK-8191787
>
> 8191789 migrate more Thread-SMR stuff from thread.[ch]pp -> 
> threadSMR.[ch]pp
> https://bugs.openjdk.java.net/browse/JDK-8191789
>
> 8191798 redo nested ThreadsListHandle to drop Threads_lock
> https://bugs.openjdk.java.net/browse/JDK-8191789
>
> I have undoubtedly missed at least one future idea that someone had
> and for that my apologies...
>
> Thanks again to everyone who contributed to the Thread-SMR project!!
>
> Special thanks goes to Karen Kinnear, Kim Barrett and David Holmes for
> their rigorous review of the design that we documented on the wiki.
> Thanks for keeping us honest!
>
> I also have to thank my co-contributor Erik Österlund for starting the
> ball rolling on this crazy project with his presentation on Safe Memory
> Reclamation, for the initial prototype and for all your patches. Erik,
> hopefully we got far enough in your crazy vision... :-)
>
> Thanks to my co-contributor Robbin Ehn for proposing that we do early
> code reviews in the form of patches. Don't like something? Fix it with
> a patch and a new test if appropriate!! A pretty cool way to work that
> was new to me.
>
> Finally, many thanks to Jerry Thornbrugh for all the early code reviews,
> whiteboard chats and phone calls. Thanks for asking the right questions
> and pointing out some places where our logic was faulty. Also thanks for
> keeping me sane when I was literally on my own in Monument, CO.
>
> Dan
>
>
> On 11/21/17 7:12 PM, Daniel D. Daugherty wrote:
>> Greetings,
>>
>> *** We are wrapping up code review on this project so it is time ***
>> *** for the various code reviewers to chime in one last time. ***
>>
>> In this latest round, we had three different reviewers chime in so we're
>> doing delta webrevs for each of those resolutions:
>>
>> David H's resolutions:
>>
>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-10.09,10.10.0-delta/ 
>>
>>
>>   mq comment for dholmes_CR:
>>     - Fix indents, trailing spaces and various typos.
>>     - Add descriptions for the '_cnt', '_max' and '_times" fields,
>>       add impl notes to document the type choices.
>>
>>   src/hotspot/share/runtime/globals.hpp change is white-space only so it
>>   is only visible via the file's patch link.
>>
>>
>> Robin W's resolutions:
>>
>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-10.10.0,10.10.1-delta/ 
>>
>>
>>   mq comment for robinw_CR:
>>     - Fix some inefficient code, update some comments, fix some indents,
>>       and add some 'const' specifiers.
>>
>>   src/hotspot/share/runtime/vm_operations.hpp change is white-space 
>> only so
>>   it is only visible via the file's patch link.
>>
>>
>> Coleen's resolutions:
>>
>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-10.10.1,10.10.2-delta/ 
>>
>>
>>   mq comment for coleenp_CR:
>>     - Change ThreadsList::_threads from 'mtGC' -> 'mtThread',
>>     - add header comment to threadSMR.hpp file,
>>     - cleanup JavaThreadIteratorWithHandle ctr,
>>     - make ErrorHandling more efficient.
>>
>>
>> Some folks prefer to see all of the delta webrevs together so here is
>> a delta webrev relative to jdk10-09-full:
>>
>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-10.09,10.10.2-delta/ 
>>
>>
>>   src/hotspot/share/runtime/globals.hpp and
>>   src/hotspot/share/runtime/vm_operations.hpp changes are white-space 
>> only
>>   so they are only visible via the file's patch link.
>>
>>
>> And, of course, some folks prefer to see everything all at once so here
>> is the latest full webrev:
>>
>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-10.10-full/
>>
>>
>> Dan has submitted the bits for the usual Mach5 tier[1-5] testing. The
>> prelim Mach5 tier1 (JPRT equivalent) on these bits passed just fine...
>>
>> The project is currently baselined on Jesper's 2017-11-17 PIT snapshot
>> so there will be some catching up to do before integration...
>>
>> We welcome comments, suggestions and feedback.
>>
>> Dan, Erik and Robbin
>>
>>
>> On 11/18/17 8:49 PM, Daniel D. Daugherty wrote:
>>> Greetings,
>>>
>>> Testing of the last round of changes revealed a hang in one of the new
>>> TLH tests. Robbin has fixed the hang, updated the existing TLH test, 
>>> and
>>> added another TLH test for good measure.
>>>
>>> Here is the updated full webrev:
>>>
>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-09-full/
>>>
>>> Here is the updated delta webrev:
>>>
>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-09-delta/
>>>
>>> Dan ran the bits thru the usual Mach5 tier[1-5] testing and there are
>>> no unexpected failures.
>>>
>>> We welcome comments, suggestions and feedback.
>>>
>>> Dan, Erik and Robbin
>>>
>>>
>>> On 11/15/17 3:32 PM, Daniel D. Daugherty wrote:
>>>> Greetings,
>>>>
>>>> Robbin rebased the project last night/this morning to merge with 
>>>> Thread
>>>> Local Handshakes (TLH) and also picked up additional changesets up 
>>>> thru:
>>>>
>>>>> Changeset: fa736014cf28
>>>>> Author:    cjplummer
>>>>> Date:      2017-11-14 18:08 -0800
>>>>> URL:http://hg.openjdk.java.net/jdk/hs/rev/fa736014cf28
>>>>>
>>>>> 8191049: Add alternate version of pns() that is callable from 
>>>>> within hotspot source
>>>>> Summary: added pns2() to debug.cpp
>>>>> Reviewed-by: stuefe, gthornbr
>>>>
>>>> This is the first time we've rebased the project to bits that are this
>>>> fresh (< 12 hours old at rebase time). We've done this because we 
>>>> think
>>>> we're done with this project and are in the final review-change-retest
>>>> cycle(s)... In other words, we're not planning on making any more 
>>>> major
>>>> changes for this project... :-)
>>>>
>>>> *** Time for code reviewers to chime in on this thread! ***
>>>>
>>>> Here is the updated full webrev:
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-08-full/
>>>>
>>>> Here's is the delta webrev from the 2017.11.10 rebase:
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-08-delta/
>>>>
>>>> Dan has submitted the bits for the usual Mach5 tier[1-5] testing
>>>> (and the new baseline also)...
>>>>
>>>> We're expecting this round to be a little noisier than usual because
>>>> we did not rebase on a PIT snapshot so the new baseline has not been
>>>> through Jesper's usual care-and-feeding of integration_blockers, etc.
>>>>
>>>> We welcome comments, suggestions and feedback.
>>>>
>>>> Dan, Erik and Robbin
>>>>
>>>>
>>>> On 11/14/17 4:48 PM, Daniel D. Daugherty wrote:
>>>>> Greetings,
>>>>>
>>>>> I rebased the project to the 2017.11.10 jdk/hs PIT snapshot.
>>>>> (Yes, we're playing chase-the-repo...)
>>>>>
>>>>> Here is the updated full webrev:
>>>>>
>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-07-full/
>>>>>
>>>>> Unlike the previous rebase, there were no changes required to the
>>>>> open code to get the rebased bits to build so there is no delta
>>>>> webrev.
>>>>>
>>>>> These bits have been run through JPRT and I've submitted the usual
>>>>> Mach5 tier[1-5] test run...
>>>>>
>>>>> We welcome comments, suggestions and feedback.
>>>>>
>>>>> Dan, Erik and Robbin
>>>>>
>>>>>
>>>>> On 11/13/17 12:30 PM, Daniel D. Daugherty wrote:
>>>>>> Greetings,
>>>>>>
>>>>>> I rebased the project to the 2017.10.26 jdk10/hs PIT snapshot.
>>>>>>
>>>>>> Here are the updated webrevs:
>>>>>>
>>>>>> Here's the mq comment for the change:
>>>>>>
>>>>>>   Rebase to 2017.10.25 PIT snapshot.
>>>>>>
>>>>>> Here is the full webrev:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-06-full/
>>>>>>
>>>>>> And here is the delta webrev:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-06-delta/
>>>>>>
>>>>>> I ran the above bits throught Mach5 tier[1-5] testing over the 
>>>>>> holiday
>>>>>> weekend. Didn't see any issues in a quick look. Going to take a 
>>>>>> closer
>>>>>> look today.
>>>>>>
>>>>>> We welcome comments, suggestions and feedback.
>>>>>>
>>>>>> Dan, Erik and Robbin
>>>>>>
>>>>>>
>>>>>> On 11/8/17 1:05 PM, Daniel D. Daugherty wrote:
>>>>>>> Greetings,
>>>>>>>
>>>>>>> Resolving one of the code review comments (from both Stefan K 
>>>>>>> and Coleen)
>>>>>>> on jdk10-04-full required quite a few changes so it is being 
>>>>>>> done as a
>>>>>>> standalone patch and corresponding webrevs:
>>>>>>>
>>>>>>> Here's the mq comment for the change:
>>>>>>>
>>>>>>>   stefank, coleenp CR - refactor most JavaThreadIterator usage 
>>>>>>> to use
>>>>>>>     JavaThreadIteratorWithHandle.
>>>>>>>
>>>>>>> Here is the full webrev:
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-05-full/
>>>>>>>
>>>>>>> And here is the delta webrev:
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-05-delta/
>>>>>>>
>>>>>>> We welcome comments, suggestions and feedback.
>>>>>>>
>>>>>>> Dan, Erik and Robbin
>>>>>>>
>>>>>>>
>>>>>>> On 10/9/17 3:41 PM, Daniel D. Daugherty wrote:
>>>>>>>> Greetings,
>>>>>>>>
>>>>>>>> We have a (eXtra Large) fix for the following bug:
>>>>>>>>
>>>>>>>> 8167108 inconsistent handling of SR_lock can lead to crashes
>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8167108
>>>>>>>>
>>>>>>>> This fix adds a Safe Memory Reclamation (SMR) mechanism based on
>>>>>>>> Hazard Pointers to manage JavaThread lifecycle.
>>>>>>>>
>>>>>>>> Here's a PDF for the internal wiki that we've been using to 
>>>>>>>> describe
>>>>>>>> and track the work on this project:
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/SMR_and_JavaThread_Lifecycle-JDK10-04.pdf 
>>>>>>>>
>>>>>>>>
>>>>>>>> Dan has noticed that the indenting is wrong in some of the code 
>>>>>>>> quotes
>>>>>>>> in the PDF that are not present in the internal wiki. We don't 
>>>>>>>> have a
>>>>>>>> solution for that problem yet.
>>>>>>>>
>>>>>>>> Here's the webrev for current JDK10 version of this fix:
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-04-full
>>>>>>>>
>>>>>>>> This fix has been run through many rounds of JPRT and Mach5 
>>>>>>>> tier[2-5]
>>>>>>>> testing, additional stress testing on Dan's Solaris X64 server, 
>>>>>>>> and
>>>>>>>> additional testing on Erik and Robbin's machines.
>>>>>>>>
>>>>>>>> We welcome comments, suggestions and feedback.
>>>>>>>>
>>>>>>>> Daniel Daugherty
>>>>>>>> Erik Osterlund
>>>>>>>> Robbin Ehn
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
>



More information about the hotspot-runtime-dev mailing list