RFR(XL): 8167108 - SMR and JavaThread Lifecycle
Daniel D. Daugherty
daniel.daugherty at oracle.com
Tue Nov 21 16:28:56 UTC 2017
Hi Coleen!
Thanks for making time to review the Thread-SMR stuff again!!
I have added back the other three OpenJDK aliases... This review is
being done on _four_ different OpenJDK aliases.
As always, replies are embedded below...
On 11/20/17 3:12 PM, coleen.phillimore at oracle.com wrote:
>
> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-09-full/src/hotspot/os/linux/os_linux.cpp.frames.html
>
>
> I read David's comments about the threads list iterator, and I was
> going to say that it can be cleaned up later, as the bulk of the
> change is the SMR part, but this looks truly bizarre. It looks like
> it shouldn't compile because 'jt' isn't in scope.
>
> Why isn't this the syntax:
>
> JavaThreadIteratorWithHandle jtiwh;
> for (JavaThread* jt = jtiwh.first(); jt != NULL; jt = jtiwh.next()) {
> }
>
> Which would do the obvious thing without anyone having to squint at
> the code.
See my reply to David's review for the more detailed answer.
For the above syntax, we would need braces to limit the scope of the
'jtiwh' variable. With Stefan's proposal, you get limited scope on
'jtiwh' for "free".
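A minimal sketch of the scoping difference, assuming Stefan's form declares the iterator in the `for` statement's init-statement and the thread pointer in its condition. The classes below are hypothetical stand-ins, much simpler than the HotSpot ones; the `g_threads` vector and the `live_handles` counter exist only to make the handle's lifetime observable.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins, far simpler than the HotSpot classes.
struct JavaThread { int id; };
static std::vector<JavaThread*> g_threads;  // the "published" Threads list
static int live_handles = 0;                // handles currently alive

class ThreadsListHandle {
 public:
  ThreadsListHandle() : _list(g_threads) { ++live_handles; }  // snapshot + protect
  ~ThreadsListHandle() { --live_handles; }                    // release protection
  const std::vector<JavaThread*>& list() const { return _list; }
 private:
  std::vector<JavaThread*> _list;
};

class JavaThreadIteratorWithHandle {
 public:
  JavaThreadIteratorWithHandle() : _index(0) {}
  JavaThread* first() { _index = 0; return next(); }
  // Returns nullptr once the snapshot is exhausted.
  JavaThread* next() {
    return _index < _tlh.list().size() ? _tlh.list()[_index++] : nullptr;
  }
 private:
  ThreadsListHandle _tlh;
  size_t _index;
};

// Loop-scoped form: jtiwh lives only as long as the for statement.
int count_loop_scoped() {
  int n = 0;
  for (JavaThreadIteratorWithHandle jtiwh; JavaThread* jt = jtiwh.next(); ) {
    assert(jt != nullptr && live_handles == 1);
    ++n;
  }
  // jtiwh has already been destroyed here, so the list is no longer protected.
  return n;
}

// Coleen's form: an explicit block is needed to end jtiwh's lifetime early.
int count_with_braces() {
  int n = 0;
  {
    JavaThreadIteratorWithHandle jtiwh;
    for (JavaThread* jt = jtiwh.first(); jt != nullptr; jt = jtiwh.next()) {
      ++n;
    }
  }
  return n;
}
```

Without the extra braces in the second function, the handle (and the protection it holds) would stay alive until the end of the enclosing function.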
> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-09-full/src/hotspot/share/runtime/threadSMR.hpp.html
>
>
> As a hater of acronyms, can this file use "Safe Memory Reclamation"
I'm not sure what you mean by this. Do you mean to rename the files?
threadSMR.hpp -> threadSafeMemoryReclamation.hpp
threadSMR.cpp -> threadSafeMemoryReclamation.cpp
> and briefly describe the concept in the beginning of the header file,
> so one knows why it's called threadSMR.hpp?
And then this part of the sentence kind of indicates that you would be
okay with the threadSMR.{c,h}pp names if a comment was added to the
header file.
Please clarify.
> It doesn't need to be long and include why the Threads list needs this.
> Sometimes we tell new people that the hotspot documentation is in the
> header files.
Yup. When I migrated stuff from thread.hpp and thread.cpp to threadSMR.hpp
and threadSMR.cpp, I should have written a header comment...
I did update a comment in thread.cpp based on Robin W's code review:
> > src/hotspot/share/runtime/thread.cpp
> >
> > 3432 // operations over all threads. It is protected by its own Mutex
> > 3433 // lock, which is also used in other contexts to protect thread
> >
> > Should this comment perhaps be revised to mention SMR?
>
> It definitely needs some updating... Here's a stab at it:
>
> // The Threads class links together all active threads, and provides
> // operations over all threads. It is protected by the Threads_lock,
> // which is also used in other global contexts like safepointing.
> // ThreadsListHandles are used to safely perform operations on one
> // or more threads without the risk of the thread exiting during the
> // operation.
> //
> // Note: The Threads_lock is currently more widely used than we
> // would like. We are actively migrating Threads_lock uses to other
> // mechanisms in order to reduce Threads_lock contention.
I'll take a look at adding a header comment to threadSMR.hpp.
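The ThreadsListHandle semantics described in the comment above can be sketched roughly as follows. This sketch swaps in `std::shared_ptr` reference counting as a simpler stand-in for the Hazard Pointers that the real Thread-SMR code uses; `SnapshotHandle`, `publish_add` and `publish_remove` are hypothetical names. A handle keeps the snapshot it captured valid even after a thread is removed from the published list; in the real mechanism, the exiting JavaThread is likewise not freed while a protected list still contains it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <vector>

struct JavaThread { int id; };
typedef std::vector<JavaThread*> Snapshot;

// Hypothetical: the published list is an immutable snapshot; the mutex only
// serializes publishing and copying the shared_ptr itself.
static std::shared_ptr<const Snapshot> g_published = std::make_shared<Snapshot>();
static std::mutex g_publish_lock;

// Stand-in for a ThreadsListHandle: holding a reference keeps the snapshot alive.
class SnapshotHandle {
 public:
  SnapshotHandle() {
    std::lock_guard<std::mutex> guard(g_publish_lock);
    _list = g_published;
  }
  bool includes(const JavaThread* jt) const {
    return std::find(_list->begin(), _list->end(), jt) != _list->end();
  }
  size_t length() const { return _list->size(); }
 private:
  std::shared_ptr<const Snapshot> _list;
};

// Writers copy the current snapshot, modify the copy, then publish it.
void publish_add(JavaThread* jt) {
  std::lock_guard<std::mutex> guard(g_publish_lock);
  std::shared_ptr<Snapshot> copy = std::make_shared<Snapshot>(*g_published);
  copy->push_back(jt);
  g_published = copy;
}

void publish_remove(JavaThread* jt) {
  std::lock_guard<std::mutex> guard(g_publish_lock);
  std::shared_ptr<Snapshot> copy = std::make_shared<Snapshot>(*g_published);
  copy->erase(std::remove(copy->begin(), copy->end(), jt), copy->end());
  g_published = copy;
}
```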
> 186 JavaThreadIteratorWithHandle() : _tlh(), _index(0) {}
>
> This _tlh() call should not be necessary. The compiler should
> generate this for you in the constructor.
Deleted.
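A minimal sketch of the rule Coleen is relying on, using hypothetical stand-in types: a member of class type with a user-provided default constructor that is left out of the mem-initializer list is still default-constructed, so spelling out `_tlh()` adds nothing.

```cpp
#include <cassert>

// Hypothetical stand-in that records how often it is default-constructed.
static int tlh_constructions = 0;
struct ThreadsListHandle {
  ThreadsListHandle() { ++tlh_constructions; }
};

struct IteratorExplicit {
  IteratorExplicit() : _tlh(), _index(0) {}  // _tlh() spelled out
  ThreadsListHandle _tlh;
  int _index;
};

struct IteratorImplicit {
  IteratorImplicit() : _index(0) {}  // _tlh is still default-constructed
  ThreadsListHandle _tlh;
  int _index;
};
```

The `_index(0)` initializer has to stay in both versions: a built-in member omitted from the list is default-initialized, which leaves its value indeterminate.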
> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-09-full/src/hotspot/share/runtime/threadSMR.cpp.html
>
>
> 32 ThreadsList::ThreadsList(int entries) : _length(entries),
> _threads(NEW_C_HEAP_ARRAY(JavaThread*, entries + 1, mtGC)),
> _next_list(NULL) {
>
> Seems like it should be mtThread rather than mtGC.
Fixed. Definitely an artifact of Erik's original prototype when he
extracted Thread-SMR from his GC work... Thanks for catching it.
> Should
>
> 62 if (EnableThreadSMRStatistics) {
>
> really be UL, i.e., if (log_is_enabled(Info, thread, smr, statistics))?
EnableThreadSMRStatistics is used in more places than UL code.
We use it in Thread::print_*() stuff to control output of
Thread-SMR statistics info in thread dumps and hs_err_pid file
generation.
Currently thread dump and hs_err_pid file output is not generated
using UL (and probably can't be?) so we need an option to control
the Thread-SMR statistics stuff in all places.
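A minimal sketch of why a single flag is convenient here. The counter, the helper names (`count_deleted_thread`, `print_smr_stats_on`) and the output format are hypothetical, not copied from the webrev: the same boolean gates both the bookkeeping and a printing path that writes to an arbitrary stream rather than through UL.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-ins for the product flag and one Thread-SMR statistic.
static bool EnableThreadSMRStatistics = false;
static unsigned int smr_deleted_thread_count = 0;

// Bookkeeping is skipped entirely when statistics are disabled.
void count_deleted_thread() {
  if (EnableThreadSMRStatistics) {
    smr_deleted_thread_count++;
  }
}

// Thread-dump / hs_err style printing: writes to whatever stream it is
// given, not through UL, so the same flag controls it.
void print_smr_stats_on(std::ostream& st) {
  if (!EnableThreadSMRStatistics) {
    return;
  }
  st << "SMR info: deleted_thread_cnt=" << smr_deleted_thread_count << "\n";
}
```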
> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-09-full/test/hotspot/jtreg/runtime/ErrorHandling/NestedThreadsListHandleInErrorHandlingTest.java.html
>
>
> Can you use for these tests instead (there were a couple):
>
> @requires (vm.debug == true)
The test I cloned had this in it:
if (!Platform.isDebugBuild()) {
// -XX:ErrorHandlerTest=N option requires debug bits.
return;
}
and you would like me to switch to the newer mechanism?
I have updated the following tests:
test/hotspot/jtreg/runtime/ErrorHandling/ErrorHandler.java
test/hotspot/jtreg/runtime/ErrorHandling/NestedThreadsListHandleInErrorHandlingTest.java
test/hotspot/jtreg/runtime/ErrorHandling/ThreadsListHandleInErrorHandlingTest.java
> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-09-full/src/hotspot/share/runtime/thread.cpp.udiff.html
>
>
> +// thread, has been added the Threads list, the system is not at a
>
> Has "not" been added to the Threads list? missing "not"?
Nope. If the JavaThread has been added to the Threads list
and it is not protected, then it is dangling. In other words,
a published JavaThread (on the Threads list) has to be protected
either by the system being at a safepoint or by the JavaThread
being on some thread's ThreadsList.
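The invariant above can be written as a predicate. This is a minimal sketch with hypothetical global state (`at_safepoint`, plus one protected-list slot per thread in `hazard_lists`), not the HotSpot code.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct JavaThread { int id; };
typedef std::vector<JavaThread*> ThreadsList;

// Hypothetical global state for the sketch.
static bool at_safepoint = false;
// One entry per thread: the ThreadsList it currently protects, or nullptr.
static std::vector<const ThreadsList*> hazard_lists;

// A published JavaThread is safe to touch if the system is at a safepoint
// or the thread appears on some thread's protected ThreadsList.
bool is_protected(const JavaThread* jt) {
  if (at_safepoint) {
    return true;
  }
  for (const ThreadsList* list : hazard_lists) {
    if (list != nullptr && std::find(list->begin(), list->end(), jt) != list->end()) {
      return true;
    }
  }
  return false;
}
```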
>
> + return (unsigned int)(((uint32_t)(uintptr_t)s1) * 2654435761u);
>
> Can you add a comment about where this number came from?
I'll have to get that from Erik...
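Until Erik confirms, one plausible origin: 2654435761 is the constant commonly used for Knuth-style multiplicative hashing, a prime close to 2^32 divided by the golden ratio (about 2654435769.5). A minimal sketch of the quoted expression as a standalone function (the name `smr_pointer_hash` is hypothetical):

```cpp
#include <cassert>
#include <stdint.h>

// Multiplicative hash: multiply by a 32-bit constant and let unsigned
// arithmetic wrap modulo 2^32. Only the low 32 bits of the pointer are
// used, so on 64-bit platforms the upper bits are discarded.
unsigned int smr_pointer_hash(const void* s1) {
  return (unsigned int)(((uint32_t)(uintptr_t)s1) * 2654435761u);
}
```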
> I can't find the caller of threads_do_smr.
I'm guessing that's used by the GC code that depends on Thread-SMR...
> If these functions xchg_smr_thread_list, get_smr_java_thread_list,
> inc_smr_deleted_thread_count are only used by thread.cpp, I think they
> should go in that file and not in the .inline.hpp file to be included
> and possibly called by other files. I think they're private to class
> Threads.
I have a vague memory that some of the compilers don't do inlining when
an "inline" function is in a .cpp. I believe we want these functions
to be inlined for performance reasons. Erik should probably chime in
here.
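For reference, a minimal sketch of the trade-off, assuming `xchg_smr_thread_list()` atomically swaps the published list (the `std::atomic` types below are stand-ins for HotSpot's own atomics). Within one translation unit, a compiler can inline a function defined in the .cpp file; across translation units, without link-time optimization, it can only inline definitions it can see, which is what placing them in an .inline.hpp header provides.

```cpp
#include <atomic>
#include <cassert>

struct ThreadsList { int length; };

// Hypothetical header-style definitions: member functions defined in the
// class body are implicitly inline, so every file that includes the header
// can inline them.
class Threads {
 public:
  static ThreadsList* get_smr_java_thread_list() {
    return _smr_java_thread_list.load(std::memory_order_acquire);
  }
  // Publishes new_list and returns the previously published list.
  static ThreadsList* xchg_smr_thread_list(ThreadsList* new_list) {
    return _smr_java_thread_list.exchange(new_list);
  }
 private:
  static std::atomic<ThreadsList*> _smr_java_thread_list;
};

std::atomic<ThreadsList*> Threads::_smr_java_thread_list(nullptr);
```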
> I don't have any in-depth comments. This looks really good from my
> day of reading it.
Thanks for taking the time to review it again!
Dan
>
> Thanks,
> Coleen
>
> On 11/18/17 8:49 PM, Daniel D. Daugherty wrote:
>
>> Greetings,
>>
>> Testing of the last round of changes revealed a hang in one of the new
>> TLH tests. Robbin has fixed the hang, updated the existing TLH test, and
>> added another TLH test for good measure.
>>
>> Here is the updated full webrev:
>>
>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-09-full/
>>
>> Here is the updated delta webrev:
>>
>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-09-delta/
>>
>> Dan ran the bits thru the usual Mach5 tier[1-5] testing and there are
>> no unexpected failures.
>>
>> We welcome comments, suggestions and feedback.
>>
>> Dan, Erik and Robbin
>>
>>
>> On 11/15/17 3:32 PM, Daniel D. Daugherty wrote:
>>> Greetings,
>>>
>>> Robbin rebased the project last night/this morning to merge with Thread
>>> Local Handshakes (TLH) and also picked up additional changesets up
>>> thru:
>>>
>>>> Changeset: fa736014cf28
>>>> Author: cjplummer
>>>> Date: 2017-11-14 18:08 -0800
>>>> URL:http://hg.openjdk.java.net/jdk/hs/rev/fa736014cf28
>>>>
>>>> 8191049: Add alternate version of pns() that is callable from
>>>> within hotspot source
>>>> Summary: added pns2() to debug.cpp
>>>> Reviewed-by: stuefe, gthornbr
>>>
>>> This is the first time we've rebased the project to bits that are this
>>> fresh (< 12 hours old at rebase time). We've done this because we think
>>> we're done with this project and are in the final review-change-retest
>>> cycle(s)... In other words, we're not planning on making any more major
>>> changes for this project... :-)
>>>
>>> *** Time for code reviewers to chime in on this thread! ***
>>>
>>> Here is the updated full webrev:
>>>
>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-08-full/
>>>
>>> Here is the delta webrev from the 2017.11.10 rebase:
>>>
>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-08-delta/
>>>
>>> Dan has submitted the bits for the usual Mach5 tier[1-5] testing
>>> (and the new baseline also)...
>>>
>>> We're expecting this round to be a little noisier than usual because
>>> we did not rebase on a PIT snapshot so the new baseline has not been
>>> through Jesper's usual care-and-feeding of integration_blockers, etc.
>>>
>>> We welcome comments, suggestions and feedback.
>>>
>>> Dan, Erik and Robbin
>>>
>>>
>>> On 11/14/17 4:48 PM, Daniel D. Daugherty wrote:
>>>> Greetings,
>>>>
>>>> I rebased the project to the 2017.11.10 jdk/hs PIT snapshot.
>>>> (Yes, we're playing chase-the-repo...)
>>>>
>>>> Here is the updated full webrev:
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-07-full/
>>>>
>>>> Unlike the previous rebase, there were no changes required to the
>>>> open code to get the rebased bits to build so there is no delta
>>>> webrev.
>>>>
>>>> These bits have been run through JPRT and I've submitted the usual
>>>> Mach5 tier[1-5] test run...
>>>>
>>>> We welcome comments, suggestions and feedback.
>>>>
>>>> Dan, Erik and Robbin
>>>>
>>>>
>>>> On 11/13/17 12:30 PM, Daniel D. Daugherty wrote:
>>>>> Greetings,
>>>>>
>>>>> I rebased the project to the 2017.10.26 jdk10/hs PIT snapshot.
>>>>>
>>>>> Here are the updated webrevs:
>>>>>
>>>>> Here's the mq comment for the change:
>>>>>
>>>>> Rebase to 2017.10.25 PIT snapshot.
>>>>>
>>>>> Here is the full webrev:
>>>>>
>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-06-full/
>>>>>
>>>>> And here is the delta webrev:
>>>>>
>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-06-delta/
>>>>>
>>>>> I ran the above bits through Mach5 tier[1-5] testing over the
>>>>> holiday weekend. Didn't see any issues in a quick look. Going to
>>>>> take a closer look today.
>>>>>
>>>>> We welcome comments, suggestions and feedback.
>>>>>
>>>>> Dan, Erik and Robbin
>>>>>
>>>>>
>>>>> On 11/8/17 1:05 PM, Daniel D. Daugherty wrote:
>>>>>> Greetings,
>>>>>>
>>>>>> Resolving one of the code review comments (from both Stefan K and
>>>>>> Coleen) on jdk10-04-full required quite a few changes so it is
>>>>>> being done as a standalone patch and corresponding webrevs:
>>>>>>
>>>>>> Here's the mq comment for the change:
>>>>>>
>>>>>> stefank, coleenp CR - refactor most JavaThreadIterator usage to
>>>>>> use JavaThreadIteratorWithHandle.
>>>>>>
>>>>>> Here is the full webrev:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-05-full/
>>>>>>
>>>>>> And here is the delta webrev:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-05-delta/
>>>>>>
>>>>>> We welcome comments, suggestions and feedback.
>>>>>>
>>>>>> Dan, Erik and Robbin
>>>>>>
>>>>>>
>>>>>> On 10/9/17 3:41 PM, Daniel D. Daugherty wrote:
>>>>>>> Greetings,
>>>>>>>
>>>>>>> We have a (eXtra Large) fix for the following bug:
>>>>>>>
>>>>>>> 8167108 inconsistent handling of SR_lock can lead to crashes
>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8167108
>>>>>>>
>>>>>>> This fix adds a Safe Memory Reclamation (SMR) mechanism based on
>>>>>>> Hazard Pointers to manage JavaThread lifecycle.
>>>>>>>
>>>>>>> Here's a PDF for the internal wiki that we've been using to
>>>>>>> describe and track the work on this project:
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/SMR_and_JavaThread_Lifecycle-JDK10-04.pdf
>>>>>>>
>>>>>>>
>>>>>>> Dan has noticed that the indenting is wrong in some of the code
>>>>>>> quotes in the PDF; the problem is not present in the internal
>>>>>>> wiki. We don't have a solution for that problem yet.
>>>>>>>
>>>>>>> Here's the webrev for the current JDK10 version of this fix:
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-04-full
>>>>>>>
>>>>>>> This fix has been run through many rounds of JPRT and Mach5
>>>>>>> tier[2-5] testing, additional stress testing on Dan's Solaris X64
>>>>>>> server, and additional testing on Erik and Robbin's machines.
>>>>>>>
>>>>>>> We welcome comments, suggestions and feedback.
>>>>>>>
>>>>>>> Daniel Daugherty
>>>>>>> Erik Osterlund
>>>>>>> Robbin Ehn
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>
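For readers new to the technique mentioned in the original RFR, here is a minimal sketch of Hazard Pointer-based reclamation applied to a threads list. It is not the HotSpot implementation: the names (`acquire_list`, `replace_list`, `can_free_thread`), the fixed `kMaxThreads` slot table, and the assumption that writers are serialized (for example by a lock) are all illustrative.

```cpp
#include <atomic>
#include <cassert>
#include <vector>

struct JavaThread { int id; };

static int g_lists_freed = 0;  // counts reclaimed lists so the sketch is observable

struct ThreadsList {
  std::vector<JavaThread*> threads;
  ~ThreadsList() { ++g_lists_freed; }
};

// Hypothetical fixed-size table: one hazard pointer slot per participating thread.
const int kMaxThreads = 4;
static std::atomic<ThreadsList*> g_list(new ThreadsList());
static std::atomic<ThreadsList*> g_hazard[kMaxThreads];  // zero-initialized: all nullptr
static std::vector<ThreadsList*> g_retired;  // swapped out, possibly still in use

// Reader: publish a hazard pointer, then re-check that the list is still
// current. If a writer swapped it in between, retry with the new list.
ThreadsList* acquire_list(int self) {
  ThreadsList* list;
  do {
    list = g_list.load();
    g_hazard[self].store(list);
  } while (list != g_list.load());
  return list;
}

void release_list(int self) { g_hazard[self].store(nullptr); }

static bool is_hazardous(const ThreadsList* list) {
  for (int i = 0; i < kMaxThreads; i++) {
    if (g_hazard[i].load() == list) return true;
  }
  return false;
}

// Writer (assumed serialized): publish the new list, then free every
// retired list that no hazard pointer refers to.
void replace_list(ThreadsList* new_list) {
  g_retired.push_back(g_list.exchange(new_list));
  std::vector<ThreadsList*> still_in_use;
  for (ThreadsList* old_list : g_retired) {
    if (is_hazardous(old_list)) {
      still_in_use.push_back(old_list);
    } else {
      delete old_list;
    }
  }
  g_retired.swap(still_in_use);
}

void add_thread(JavaThread* jt) {
  ThreadsList* copy = new ThreadsList(*g_list.load());
  copy->threads.push_back(jt);
  replace_list(copy);
}

void remove_thread(JavaThread* jt) {
  ThreadsList* copy = new ThreadsList();
  for (JavaThread* t : g_list.load()->threads) {
    if (t != jt) copy->threads.push_back(t);
  }
  replace_list(copy);
}

// An exiting JavaThread may be freed only once no protected list contains it.
bool can_free_thread(const JavaThread* jt) {
  for (int i = 0; i < kMaxThreads; i++) {
    const ThreadsList* list = g_hazard[i].load();
    if (list == nullptr) continue;
    for (const JavaThread* t : list->threads) {
      if (t == jt) return false;
    }
  }
  return true;
}
```

The re-check loop in `acquire_list()` is what makes this safe: with sequentially consistent atomics, a writer whose scan misses a reader's hazard pointer must have swapped the list before the reader's re-check, so the reader retries instead of using a list that may be freed.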
More information about the hotspot-gc-dev
mailing list