RFR(XL): 8167108 - SMR and JavaThread Lifecycle

Daniel D. Daugherty daniel.daugherty at oracle.com
Tue Nov 21 01:50:29 UTC 2017


On 11/20/17 12:51 AM, David Holmes wrote:
> Hi Dan,
>
> Figured I'd better cast an eye over this again before it gets pushed :)

Thanks for reviewing again!!


> Only one thing (repeated many times) jumped out at me and apologies if 
> this has already been debated back and forth. I missed the exchange 
> where the for loop iteration was rewritten into this unusual form:
>
> for (JavaThreadIteratorWithHandle jtiwh; JavaThread *jt = 
> jtiwh.next(); ) {
>
> On first reading I expect most people would mistake the control 
> expression for the iteration-expression due to the next(). I didn't 
> even know the control expression could introduce a local variable! I 
> find this really hard to read stylistically and far too cute/clever 
> for its own good. It also violates the style rules regarding implicit 
> boolean checks.
>
> I know Stefan proposed this as he did not like the alternate (in a few 
> places):
>
> +  {
> +    ThreadsListHandle tlh;
> +    JavaThreadIterator jti(tlh.list());
> +    for (JavaThread* t = jti.first(); t != NULL; t = jti.next()) {
> ...
> +  }
>
> but it seems to me the iterator code has evolved since then and only a 
> couple of places need a distinct TLH separate from the actual iterator 
> object. So I'm more in favour of the more classic for-loop, with the 
> iterator declared before the loop. Or even convert to while loops, as 
> this iterator seems more suited to that.

Actually both Coleen and Stefan had a problem with how much additional
code was added to support an iterator model. In some cases we went from
1-line to 3-lines and in some cases we went from 1-line to 5-lines. For
Coleen, she wanted 2 of the new lines compressed into 1 new line by using
an iterator with an embedded ThreadsListHandle. Stefan wanted us to try
and get back to 1-line where possible.

The advantages of the new style are:

- 1-line to 1-line in all but a few cases
- automatic limited scope of the embedded ThreadsListHandle so we were
   able to remove extra braces that were added earlier in most cases

The disadvantages of the new style are:

- it is an unusual for-loop style (in HotSpot)
- it breaks HotSpot's style rule about implied booleans


Stefan K is pretty adamant that the original iterator version where we
go from 1-line to 3-lines or 1-line to 5-lines is not acceptable for GC
code. The compiler guys haven't chimed in on this debate so I'm guessing
they don't have a strong opinion either way. One thing that we don't want
to do is have different styles for the different teams.

So I had to make a judgement call on whether I thought Runtime could live
with Stefan's idea. Originally I wasn't fond of it either and breaking the
implied boolean style rule is a problem for me (I post that comment a lot
in my code reviews). However, I have to say that I'm a lot happier about
the compactness of the code and the fact that limited ThreadsListHandle
scope comes for 'free' in most cases.

We're planning to keep the new iterator style for the following reasons:

- smaller change footprint
- consistent style used for all of the teams
- limited ThreadsListHandle scope comes for free in most cases

We're hoping that you can live with this decision (for now) and maybe
even grow to like it in the future. :-)


> Other than that just a couple of comments on variable type choices, 
> and a few typos spotted below.

Replies embedded below.


>
> Thanks,
> David
> ---
>
> src/hotspot/share/runtime/globals.hpp
>
> 2476   diagnostic(bool, EnableThreadSMRExtraValidityChecks, true, 
>         \
> 2477           "Enable Thread SMR extra validity checks") \
> 2478         \
> 2479   diagnostic(bool, EnableThreadSMRStatistics, true,         \
> 2480           "Enable Thread SMR Statistics")         \
>
> Indent of strings is off by  3 spaces.

Fixed.


> ---
>
> src/hotspot/share/runtime/handshake.cpp
>
>  140       // There is assumption in code that Threads_lock should be 
> lock
>  200       // There is assumption in code that Threads_lock should be 
> lock
>
> lock -> locked

Fixed.


> 146         // handshake_process_by_vmthread will check the semaphore 
> for us again
>
> Needs period at end.

Fixed.


> 148         // should be okey since the new thread will not have an 
> operation.
>
> okey -> okay

Fixed.


> 151         // We can't warn here is since the thread do 
> cancel_handshake after it have been removed
>
> I think:
>
> // We can't warn here since the thread does cancel_handshake after it 
> has been removed

Fixed.


> 152         // from ThreadsList. So we should just keep looping here 
> until while below return negative.
>
> from -> from the
>
> Needs period at end.

Fixed both.


>
>  204             // A new thread on the ThreadsList will not have an 
> operation.
>
> Change period to comma, and ...

Fixed.


> 205             // Hence is skipped in handshake_process_by_vmthread.
>
> Hence is -> hence it is

Fixed.


> ---
>
> src/hotspot/share/runtime/thread.cpp
>
> 1482     // dcubed - Looks like the Threads_lock is for stable access
> 1483     // to _jvmci_old_thread_counters and _jvmci_counters.
>
> Does this comment need revisiting?

We've been trying to decide what to do about this comment. We'll be
filing a follow up bug for the Compiler team to take another look at
how _jvmci_old_thread_counters and _jvmci_counters are protected.
Threads_lock is a big hammer and there should be a less expensive
solution. Once that bug gets filed, this comment can go away.


> 3451 volatile jint ...
>
> Why are you using jint for field types here? (Surprised Coleen hasn't 
> spotted them! ;-) ).

volatile jint         Threads::_smr_delete_notify = 0;
volatile jint         Threads::_smr_deleted_thread_cnt = 0;
volatile jint         Threads::_smr_deleted_thread_time_max = 0;
volatile jint         Threads::_smr_deleted_thread_times = 0;
:
volatile jint         Threads::_smr_tlh_cnt = 0;
volatile jint         Threads::_smr_tlh_time_max = 0;
volatile jint         Threads::_smr_tlh_times = 0;

_smr_delete_notify is a "flag" that indicates when an _smr_delete_lock
notify() operation is needed. It counts up and down and should be a fairly
small number.

_smr_deleted_thread_cnt counts the number of Threads that have been
deleted over the life of the VM. It's an always increasing number so
it's size depends on how long the VM has been running.

_smr_deleted_thread_time_max is the maximum time in millis it has
taken to delete a thread. This field was originally a jlong, but
IIRC the Atomic::cmpxchg() code was not happy on ARM... (might have
just been Atomic::add() that was not happy)

_smr_deleted_thread_times accumulates the time in millis that it
has taken to delete threads. It's an always increasing number so
it's size depends on how long the VM has been running. This field
was originally a jlong, but IIRC the Atomic::add() code was not
happy on ARM... (might have just been Atomic::cmpxchg() that was
not happy)

_smr_tlh_cnt counts the number of ThreadsListHandles that have been
deleted over the life of the VM. It's an always increasing number so
it's size depends on how long the VM has been running.

_smr_tlh_time_max is the maximum time in millis it has taken to
delete a ThreadsListHandle. This field was originally a jlong, but
IIRC the Atomic::cmpxchg() code was not happy on ARM... (might have
just been Atomic::add() that was not happy)

_smr_tlh_times  accumulates the time in millis that it has taken to
delete ThreadsListHandles. It's an always increasing number so
it's size depends on how long the VM has been running.  This field
was originally a jlong, but IIRC the Atomic::add() code was not
happy on ARM... (might have just been Atomic::cmpxchg() that was
not happy)


It just dawned on me that I'm not sure whether you think the 'jint'
fields should be larger or smaller or the same size but a different
type... :-)

The fact that I had to write so much to explain what these fields
are for and how they're used indicates I need more comments. See
below...


> 3456 long Threads::_smr_java_thread_list_alloc_cnt = 1;
> 3457 long                  Threads::_smr_java_thread_list_free_cnt = 0;
>
> long? If you really want 64-bit counters here you won't get them on 
> Windows with that declaration. If you really want variable sized 
> counters I suggest uintptr_t; otherwise uint64_t.

long                  Threads::_smr_java_thread_list_alloc_cnt = 1;
long                  Threads::_smr_java_thread_list_free_cnt = 0;

_smr_java_thread_list_alloc_cnt counts the number of ThreadsLists that
have been allocated over the life of the VM. It's an always increasing
number so it's size depends on how long the VM has been running and how
many Threads have been created and destroyed.

_smr_java_thread_list_free_cnt counts the number of ThreadsLists that
have been freed over the life of the VM. It's an always increasing
number so it's size depends on how long the VM has been running and how
many Threads have been created and destroyed.

I can't remember why we chose 'long' and I agree it's not a good choice
for Win*.


Okay so it dawns on me that we haven't written down a description for
the various new '_cnt', '_max' and '_times" fields so I'm adding these
comments to thread.hpp:

  private:
   // Safe Memory Reclamation (SMR) support:
   static Monitor*              _smr_delete_lock;
   // The '_cnt', '_max' and '_times" fields are enabled via
   // -XX:+EnableThreadSMRStatistics:
                                // # of parallel threads in 
_smr_delete_lock->wait().
   static uint                  _smr_delete_lock_wait_cnt;
                                // Max # of parallel threads in 
_smr_delete_lock->wait().
   static uint                  _smr_delete_lock_wait_max;
                                // Flag to indicate when an 
_smr_delete_lock->notify() is needed.
   static volatile uint         _smr_delete_notify;
                                // # of threads deleted over VM lifetime.
   static volatile uint         _smr_deleted_thread_cnt;
                                // Max time in millis to delete a thread.
   static volatile uint         _smr_deleted_thread_time_max;
                                // Cumulative time in millis to delete 
threads.
   static volatile uint         _smr_deleted_thread_times;
   static ThreadsList* volatile _smr_java_thread_list;
   static ThreadsList*          get_smr_java_thread_list();
   static ThreadsList* xchg_smr_java_thread_list(ThreadsList* new_list);
                                // # of ThreadsLists allocated over VM 
lifetime.
   static uint64_t              _smr_java_thread_list_alloc_cnt;
                                // # of ThreadsLists freed over VM lifetime.
   static uint64_t              _smr_java_thread_list_free_cnt;
                                // Max size ThreadsList allocated.
   static uint                  _smr_java_thread_list_max;
                                // Max # of nested ThreadsLists for a 
thread.
   static uint                  _smr_nested_thread_list_max;
                                // # of ThreadsListHandles deleted over 
VM lifetime.
   static volatile uint         _smr_tlh_cnt;
                                // Max time in millis to delete a 
ThreadsListHandle.
   static volatile jint         _smr_tlh_time_max;
                                // Cumulative time in millis to delete 
ThreadsListHandles.
   static volatile jint         _smr_tlh_times;
   static ThreadsList*          _smr_to_delete_list;
                                // # of parallel ThreadsLists on the 
to-delete list.
   static uint                  _smr_to_delete_list_cnt;
                                // Max # of parallel ThreadsLists on the 
to-delete list.
   static uint                  _smr_to_delete_list_max;


I've also gone through all the new '_cnt', '_max' and '_times" fields
in thread.cpp and added an impl note to explain the choice of type:

// Safe Memory Reclamation (SMR) support:
Monitor*              Threads::_smr_delete_lock =
                           new Monitor(Monitor::special, "smr_delete_lock",
                                       false /* allow_vm_block */,
Monitor::_safepoint_check_never);
// The '_cnt', '_max' and '_times" fields are enabled via
// -XX:+EnableThreadSMRStatistics:
                       // # of parallel threads in _smr_delete_lock->wait().
                       // Impl note: Hard to imagine > 64K waiting 
threads so
                       // this could be 16-bit, but there is no nice 16-bit
                       // _FORMAT support.
uint                  Threads::_smr_delete_lock_wait_cnt = 0;
                       // Max # of parallel threads in 
_smr_delete_lock->wait().
                       // Impl note: See _smr_delete_lock_wait_cnt note.
uint                  Threads::_smr_delete_lock_wait_max = 0;
                       // Flag to indicate when an 
_smr_delete_lock->notify() is needed.
                       // Impl note: See _smr_delete_lock_wait_cnt note.
volatile uint         Threads::_smr_delete_notify = 0;
                       // # of threads deleted over VM lifetime.
                       // Impl note: Atomically incremented over VM 
lifetime so
                       // use unsigned for more range. Unsigned 64-bit would
                       // be more future proof, but 64-bit atomic inc isn't
                       // available everywhere (or is it?).
volatile uint         Threads::_smr_deleted_thread_cnt = 0;
                       // Max time in millis to delete a thread.
                       // Impl note: 16-bit might be too small on an 
overloaded
                       // machine. Use unsigned since this is a time 
value. Set
                       // via Atomic::cmpxchg() in a loop for correctness.
volatile uint         Threads::_smr_deleted_thread_time_max = 0;
                       // Cumulative time in millis to delete threads.
                       // Impl note: Atomically added to over VM 
lifetime so use
                       // unsigned for more range. Unsigned 64-bit would 
be more
                       // future proof, but 64-bit atomic inc isn't 
available
                       // everywhere (or is it?).
volatile uint         Threads::_smr_deleted_thread_times = 0;
ThreadsList* volatile Threads::_smr_java_thread_list = new ThreadsList(0);
                       // # of ThreadsLists allocated over VM lifetime.
                       // Impl note: We allocate a new ThreadsList for every
                       // thread create and every thread delete so we need a
                       // bigger type than the _smr_deleted_thread_cnt 
field.
uint64_t              Threads::_smr_java_thread_list_alloc_cnt = 1;
                       // # of ThreadsLists freed over VM lifetime.
                       // Impl note: See _smr_java_thread_list_alloc_cnt 
note.
uint64_t              Threads::_smr_java_thread_list_free_cnt = 0;
                       // Max size ThreadsList allocated.
                       // Impl note: Max # of threads alive at one time 
should
                       // fit in unsigned 32-bit.
uint                  Threads::_smr_java_thread_list_max = 0;
                       // Max # of nested ThreadsLists for a thread.
                       // Impl note: Hard to imagine > 64K nested 
ThreadsLists
                       // so his could be 16-bit, but there is no nice 
16-bit
                       // _FORMAT support.
uint                  Threads::_smr_nested_thread_list_max = 0;
                       // # of ThreadsListHandles deleted over VM lifetime.
                       // Impl note: Atomically incremented over VM 
lifetime so
                       // use unsigned for more range. There will be fewer
                       // ThreadsListHandles than threads so unsigned 32-bit
                       // should be fine.
volatile uint         Threads::_smr_tlh_cnt = 0;
                       // Max time in millis to delete a ThreadsListHandle.
                       // Impl note: 16-bit might be too small on an 
overloaded
                       // machine. Use unsigned since this is a time 
value. Set
                       // via Atomic::cmpxchg() in a loop for correctness.
volatile uint         Threads::_smr_tlh_time_max = 0;
                       // Cumulative time in millis to delete 
ThreadsListHandles.
                       // Impl note: Atomically added to over VM 
lifetime so use
                       // unsigned for more range. Unsigned 64-bit would 
be more
                       // future proof, but 64-bit atomic inc isn't 
available
                       // everywhere (or is it?).
volatile uint         Threads::_smr_tlh_times = 0;
ThreadsList*          Threads::_smr_to_delete_list = NULL;
                       // # of parallel ThreadsLists on the to-delete list.
                       // Impl note: Hard to imagine > 64K ThreadsLists 
needing
                       // to be deleted so this could be 16-bit, but 
there is no
                       // nice 16-bit _FORMAT support.
uint                  Threads::_smr_to_delete_list_cnt = 0;
                       // Max # of parallel ThreadsLists on the 
to-delete list.
                       // Impl note: See _smr_to_delete_list_cnt note.
uint                  Threads::_smr_to_delete_list_max = 0;


Yikes! With the additional comments, the additions to thread.hpp and
thread.cpp grew by a bunch... I've done a test build build on my Mac
with these changes and I'm about to kick off a Mach5 tier1 job...

Dan



>
> ----
>
> On 19/11/2017 11:49 AM, Daniel D. Daugherty wrote:
>> Greetings,
>>
>> Testing of the last round of changes revealed a hang in one of the new
>> TLH tests. Robbin has fixed the hang, updated the existing TLH test, and
>> added another TLH test for good measure.
>>
>> Here is the updated full webrev:
>>
>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-09-full/
>>
>> Here is the updated delta webrev:
>>
>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-09-delta/
>>
>> Dan ran the bits thru the usual Mach5 tier[1-5] testing and there are
>> no unexpected failures.
>>
>> We welcome comments, suggestions and feedback.
>>
>> Dan, Erik and Robbin
>>
>>
>> On 11/15/17 3:32 PM, Daniel D. Daugherty wrote:
>>> Greetings,
>>>
>>> Robbin rebased the project last night/this morning to merge with Thread
>>> Local Handshakes (TLH) and also picked up additional changesets up 
>>> thru:
>>>
>>>> Changeset: fa736014cf28
>>>> Author:    cjplummer
>>>> Date:      2017-11-14 18:08 -0800
>>>> URL:http://hg.openjdk.java.net/jdk/hs/rev/fa736014cf28
>>>>
>>>> 8191049: Add alternate version of pns() that is callable from 
>>>> within hotspot source
>>>> Summary: added pns2() to debug.cpp
>>>> Reviewed-by: stuefe, gthornbr
>>>
>>> This is the first time we've rebased the project to bits that are this
>>> fresh (< 12 hours old at rebase time). We've done this because we think
>>> we're done with this project and are in the final review-change-retest
>>> cycle(s)... In other words, we're not planning on making any more major
>>> changes for this project... :-)
>>>
>>> *** Time for code reviewers to chime in on this thread! ***
>>>
>>> Here is the updated full webrev:
>>>
>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-08-full/
>>>
>>> Here's is the delta webrev from the 2017.11.10 rebase:
>>>
>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-08-delta/
>>>
>>> Dan has submitted the bits for the usual Mach5 tier[1-5] testing
>>> (and the new baseline also)...
>>>
>>> We're expecting this round to be a little noisier than usual because
>>> we did not rebase on a PIT snapshot so the new baseline has not been
>>> through Jesper's usual care-and-feeding of integration_blockers, etc.
>>>
>>> We welcome comments, suggestions and feedback.
>>>
>>> Dan, Erik and Robbin
>>>
>>>
>>> On 11/14/17 4:48 PM, Daniel D. Daugherty wrote:
>>>> Greetings,
>>>>
>>>> I rebased the project to the 2017.11.10 jdk/hs PIT snapshot.
>>>> (Yes, we're playing chase-the-repo...)
>>>>
>>>> Here is the updated full webrev:
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-07-full/
>>>>
>>>> Unlike the previous rebase, there were no changes required to the
>>>> open code to get the rebased bits to build so there is no delta
>>>> webrev.
>>>>
>>>> These bits have been run through JPRT and I've submitted the usual
>>>> Mach5 tier[1-5] test run...
>>>>
>>>> We welcome comments, suggestions and feedback.
>>>>
>>>> Dan, Erik and Robbin
>>>>
>>>>
>>>> On 11/13/17 12:30 PM, Daniel D. Daugherty wrote:
>>>>> Greetings,
>>>>>
>>>>> I rebased the project to the 2017.10.26 jdk10/hs PIT snapshot.
>>>>>
>>>>> Here are the updated webrevs:
>>>>>
>>>>> Here's the mq comment for the change:
>>>>>
>>>>>   Rebase to 2017.10.25 PIT snapshot.
>>>>>
>>>>> Here is the full webrev:
>>>>>
>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-06-full/
>>>>>
>>>>> And here is the delta webrev:
>>>>>
>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-06-delta/
>>>>>
>>>>> I ran the above bits throught Mach5 tier[1-5] testing over the 
>>>>> holiday
>>>>> weekend. Didn't see any issues in a quick look. Going to take a 
>>>>> closer
>>>>> look today.
>>>>>
>>>>> We welcome comments, suggestions and feedback.
>>>>>
>>>>> Dan, Erik and Robbin
>>>>>
>>>>>
>>>>> On 11/8/17 1:05 PM, Daniel D. Daugherty wrote:
>>>>>> Greetings,
>>>>>>
>>>>>> Resolving one of the code review comments (from both Stefan K and 
>>>>>> Coleen)
>>>>>> on jdk10-04-full required quite a few changes so it is being done 
>>>>>> as a
>>>>>> standalone patch and corresponding webrevs:
>>>>>>
>>>>>> Here's the mq comment for the change:
>>>>>>
>>>>>>   stefank, coleenp CR - refactor most JavaThreadIterator usage to 
>>>>>> use
>>>>>>     JavaThreadIteratorWithHandle.
>>>>>>
>>>>>> Here is the full webrev:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-05-full/
>>>>>>
>>>>>> And here is the delta webrev:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-05-delta/
>>>>>>
>>>>>> We welcome comments, suggestions and feedback.
>>>>>>
>>>>>> Dan, Erik and Robbin
>>>>>>
>>>>>>
>>>>>> On 10/9/17 3:41 PM, Daniel D. Daugherty wrote:
>>>>>>> Greetings,
>>>>>>>
>>>>>>> We have a (eXtra Large) fix for the following bug:
>>>>>>>
>>>>>>> 8167108 inconsistent handling of SR_lock can lead to crashes
>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8167108
>>>>>>>
>>>>>>> This fix adds a Safe Memory Reclamation (SMR) mechanism based on
>>>>>>> Hazard Pointers to manage JavaThread lifecycle.
>>>>>>>
>>>>>>> Here's a PDF for the internal wiki that we've been using to 
>>>>>>> describe
>>>>>>> and track the work on this project:
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/SMR_and_JavaThread_Lifecycle-JDK10-04.pdf 
>>>>>>>
>>>>>>>
>>>>>>> Dan has noticed that the indenting is wrong in some of the code 
>>>>>>> quotes
>>>>>>> in the PDF that are not present in the internal wiki. We don't 
>>>>>>> have a
>>>>>>> solution for that problem yet.
>>>>>>>
>>>>>>> Here's the webrev for current JDK10 version of this fix:
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-04-full
>>>>>>>
>>>>>>> This fix has been run through many rounds of JPRT and Mach5 
>>>>>>> tier[2-5]
>>>>>>> testing, additional stress testing on Dan's Solaris X64 server, and
>>>>>>> additional testing on Erik and Robbin's machines.
>>>>>>>
>>>>>>> We welcome comments, suggestions and feedback.
>>>>>>>
>>>>>>> Daniel Daugherty
>>>>>>> Erik Osterlund
>>>>>>> Robbin Ehn
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>



More information about the hotspot-compiler-dev mailing list