RFR(XL): 8167108 - SMR and JavaThread Lifecycle

Wed Nov 8 17:49:43 UTC 2017

Greetings,

This code review from Stefan Karlsson was originally posted on an Oracle
internal alias that we use for discussing HotSpot SMR development issues.
That subject was: "Thread-SMR (8167108)(JDK10): CR round 0 changes". I did
not have time to address Stefan's review before I went on vacation and
Stefan graciously allowed me to defer his review to the OpenJDK review.

With Stefan's permission, I'm replying to his code review on the OpenJDK
aliases that we're using for the JDK-8167108 code review.


On 10/6/17 1:19 PM, Stefan Karlsson wrote:
> Hi Dan,
>
> Great that this is getting done!

Thanks! It has been an adventure for all three primary contributors...


> Erik already mentioned the file so I think that's on track, and that's 
> what I was most concerned about.
>
>
> I have not been involved in the review of this patch at all, but now 
> that I've been looking at it I have a few comments. I hope you don't 
> mind. I don't want them to block the open review request, but at the 
> same time I'd like to share my opinion and have it weighted in for the 
> future of this code.
>
> 1) ThreadsListHandle allows a safepoint to block and allow GCs to run 
> and restructure objects and metadata, iff the ThreadsListHandle is 
> nested. This forced me to review all usages of ThreadsListHandle in 
> great detail to visually verify that it didn't break the surrounding code.
>
> If ThreadsListHandle didn't ever block for safepoints, I wouldn't have 
> had to care about that aspect when reading and reviewing the code. 
> This also means that we in the future runs the risk of someone 
> accidentally adding a nested ThreadsListHandle somewhere where they 
> don't expect a safepoint to happen.

We included the following to automatically sanity check the placement
of the ThreadsListHandle:

src/hotspot/share/runtime/thread.cpp:

3596 ThreadsList *Threads::acquire_stable_list(Thread *self, bool 
is_ThreadsListSetter) {
3597   assert(self != NULL, "sanity check");
3598   // acquire_stable_list_nested_path() will grab the Threads_lock
3599   // so let's make sure the ThreadsListHandle is in a safe place.
3600   // ThreadsListSetter cannot make this check on this code path.
3601   debug_only(if (!is_ThreadsListSetter && StrictSafepointChecks) 
self->check_for_valid_safepoint_state(/* potential_vm_operation */ false);)

So each ThreadsListHandle location is checked for being a valid
safepoint state.

We tested this idea with our work on JvmtiEnv::GetThreadLocalStorage().
GetThreadLocalStorage() is called from the 'in native' state so we
cannot place a common ThreadsListHandle for all code paths because
it would fail the assertion when called with 'thread == NULL', i.e.,
the thread is operating on itself.

Update: Erik is going to explore a lock free solution for the
nesting algorithm. That will likely mean that a ThreadsListHandle
will not require placement in a safepoint safe location...


> If the lock protecting the threads list could be taken with 
> no_safepoint_check, then the described problem above would completely 
> vanish. Unfortunately, we can't acquire the Threads_lock with 
> no_safepoint_check. A solution to this would be to introduced a 
> low-rank (rank == access) lock, say ThreadsList_lock, and always take 
> it with no_safepoint_check.

The problem with switching locks is that we are protecting the scanning
phase of the SMR algorithm with the Threads_lock so we would have to
switch that lock also. I believe we use the Threads_lock with the
scanning because we're using closures... Of course, Erik or Robbin
should probably jump in here... :-)

Update: Erik is going to explore a lock-free solution for the
nesting algorithm.

> That solution would also solve a potential lock-rank problem, we have 
> that ThreadsListHandle can't be used if a higher-rank lock is held.

So far we haven't run into that problem and we think that the
check_for_valid_safepoint_state() will catch that if the code
should evolve to introduce one.


Update: Erik is going to explore a lock-free solution for the
nesting algorithm.


> 2) The following one-liner:
> - for (JavaThread* t = Threads::first(); t; t = t->next()) {
> has been converted to a five-liner:
>
> + {
> + ThreadsListHandle tlh;
> + JavaThreadIterator jti(tlh.list());
> + for (JavaThread* t = jti.first(); t != NULL; t = jti.next()) {
> ... + } I find that unfortunate, and would prefer if this was 
> rewritten. For example: for (JavaThreadIterator jti; JavaThread* t = 
> jit.next();) { jti will implicitly hold the ThreadListHandle on the 
> stack. I know that the implicit conversion of pointer to bool is 
> non-conventional in the HotSpot code, but in this case I prefer that 
> over five lines of extra noise in the GC code.

Coleen made a similar comment in her OpenJDK review:

> Hi,  From my initial look at this, I really like new interface for
> walking the threads list, except every instance but 2 uses of
> JavaThreadIterator has a preceding ThreadsListHandle declaration.
>
> +  ThreadsListHandle tlh;
> +  JavaThreadIterator jti(tlh.list());
>
>
> Could there be a wrapper that embeds the ThreadsListHandle, like:
>
> class JavaThreadsListIterator {
>     ThreadsListHandle _tlh;
> ...
> } 


Yup a definite code noise problem. We originally didn't have an
iterator and used direct thread_at(foo) calls so we had a pretty
close correspondence between the old for-loop and the new for-loop.
Early reviewers asked for an iterator abstraction so we modeled
JavaThreadIterator after other existing iterators in the JVM/TI
area... (Yes, Dan stayed pretty close to "home" here...)

There are places where we still need the existing JavaThreadIterator
because a ThreadsList is passed down a call stack... So we've added
a JavaThreadIteratorWithHandle class.

Here's an example of the noisy code in the GC area:

src/hotspot/share/gc/g1/dirtyCardQueue.cpp:

@@ -319,8 +320,12 @@
    clear();
    // Since abandon is done only at safepoints, we can safely manipulate
    // these queues.
-  for (JavaThread* t = Threads::first(); t; t = t->next()) {
-    t->dirty_card_queue().reset();
+  {
+    ThreadsListHandle tlh;
+    JavaThreadIterator jti(tlh.list());
+    for (JavaThread* t = jti.first(); t != NULL; t = jti.next()) {
+      t->dirty_card_queue().reset();
+    }
    }
    shared_dirty_card_queue()->reset();
  }

Here's an example of the revised code in the GC area:

src/hotspot/share/gc/g1/dirtyCardQueue.cpp:

@@ -319,7 +320,7 @@
    clear();
    // Since abandon is done only at safepoints, we can safely manipulate
    // these queues.
-  for (JavaThread* t = Threads::first(); t; t = t->next()) {
+  for (JavaThreadIteratorWithHandle jtiwh; JavaThread *t = 
jtiwh.next(); ) {
      t->dirty_card_queue().reset();
    }
    shared_dirty_card_queue()->reset();

This is definitely much less noisy and I'm hoping I don't catch too
much grief for the implied boolean usage...


> 3) cv_internal_thread_to_JavaThread Has one return value and two 
> output parameter. It forces the callers to setup stack lock variables 
> for the two new variables. --- Used to be ---
> - oop java_thread = JNIHandles::resolve_non_null(jthread); 
> java_lang_Thread::set_priority(java_thread, (ThreadPriority)prio); - 
> JavaThread* thr = java_lang_Thread::thread(java_thread); - if (thr != 
> NULL) { // Thread not yet started; priority pushed down when it is - 
> Thread::set_priority(thr, (ThreadPriority)prio);
> --- Now is ---
> + oop java_thread = NULL; + JavaThread* receiver = NULL; + bool 
> is_alive = tlh.cv_internal_thread_to_JavaThread(jthread, &receiver, 
> &java_thread); java_lang_Thread::set_priority(java_thread, 
> (ThreadPriority)prio); + + if (is_alive) { + // jthread refers to a 
> live JavaThread. + Thread::set_priority(receiver, 
> (ThreadPriority)prio); --- Maybe could be ---
> oop java_thread = JNIHandles::resolve_non_null(jthread); 
> java_lang_Thread::set_priority(java_thread, (ThreadPriority)prio); 
> JavaThread* thr = tlh.cv_internal_thread_to_JavaThread(java_thread); 
> if (thr != NULL) { // Thread not yet started; priority pushed down 
> when it is Thread::set_priority(thr, (ThreadPriority)prio); I don't 
> see the need to for the extra complexity of the two output parameters. 
> The return value is always true when thr is non-null, and always false 
> when thr is null.

There are three cv_*_to_JavaThread() functions and we had to craft them
very, very carefully to avoid changing some of the subtle semantics at
the original call sites. If you carefully examine the output parameters
and return value usages at all of the call sites, you'll see that each
was added to fit some existing semantic that we didn't want to risk
changing. Of course, not all of the call sites need all of the special
behaviors individually, but they do as a union.

In short, compatibility and risk management led us here.

> But the usage of cv_internal_thread_to_JavaThread is contained within 
> the Runtime code, so my opinion should probably not weigh as much as 
> other's opinions. :)

We definitely agree this is a mess, but not one we're willing to risk
changing at this time. Dan has made the observation that the JVM_*
entry points, like Y2K code, shows a variety of different coding
patterns, each with different (potential) issues...

Thanks for being flexible on accepting cv_internal_thread_to_JavaThread()
as is for now!


> 4) I'd prefer if abbreviations where expanded, so that the casual 
> reader would immediately understood the code. For example: t_list -> 
> thread_list cv_internal_thread_to_JavaThread -> 
> internal_thread_to_JavaThread

We've had requests to shorten names, spell out names, use different
names, etc. It seems that no one is going to be happy with all of
the names we used in this code.

Our guidelines:

- use the same consistently, e.g., t_list for ThreadsLists
- use a name that doesn't show up in an existing grep (if possible)

We hope you don't mind, but we're not planning to make any more name
changes...


> 5) This macro and the jesting is pretty bad.

Yup!


> I complained about it to Erik, and then he confessed that he wrote it :D
> +// Possibly the ugliest for loop the world has seen. C++ does not 
> allow +// multiple types in the declaration section of the for loop. 
> In this case +// we are only dealing with pointers and hence can cast 
> them. It looks ugly +// but macros are ugly and therefore it's fine to 
> make things absurdly ugly. +#define DO_JAVA_THREADS(LIST, X) \ + for 
> (JavaThread *MACRO_scan_interval = 
> (JavaThread*)(uintptr_t)PrefetchScanIntervalInBytes, \ + *MACRO_list = 
> (JavaThread*)(LIST), \ + **MACRO_end = 
> ((JavaThread**)((ThreadsList*)MACRO_list)->threads()) + 
> ((ThreadsList*)MACRO_list)->length(), \ + **MACRO_current_p = 
> (JavaThread**)((ThreadsList*)MACRO_list)->threads(), \ + *X = 
> (JavaThread*)prefetch_and_load_ptr((void**)MACRO_current_p, 
> (intx)MACRO_scan_interval); \ + MACRO_current_p != MACRO_end; \ + 
> MACRO_current_p++, \ + X = 
> (JavaThread*)prefetch_and_load_ptr((void**)MACRO_current_p, 
> (intx)MACRO_scan_interval)) + This can be rewritten without all these 
> cast, and minimal usage of the macros expansions: struct 
> JavaThreadPrefetchedIterator { ThreadList* _list; JavaThread** _end; 
> JavaThread** _current; JavaThreadIteror(ThreadList* list) : 
> _list(list), _end(list->threads() + list->length()), 
> _current(list->threads()) {} JavaThread* current() { return 
> context._current != context._end  ? 
> prefetch_and_load_ptr(context._current, PrefetchScanIntervalInBytes) 
>  : NULL) // ^ prefetch would be rewritten to return JavaThread* and 
> not void* } void next() { _current++; }; #define DO_JAVA_THREADS(LIST, 
> X) \ for (JavaThreadPrefetchedIterator iter(LIST); JavaThread* X = 
> iter.current(); iter.next()) (I did some changes to the code above and 
> haven't compiled this exact version)

Erik hasn't chimed in on this comment so I (Dan) am not planning to
try to resolve this comment in this round.

Update: Erik is planning to look at cleaning up the ugly macro...


> 6) I think it would be nice if the SMR stuff in thread.hpp were 
> encapsulated into an class instead of added directly to Thread and 
> Threads. I sort-of expected the SMR variables to be moved to 
> threadSMR.hpp. For example:
> class Threads: AllStatic { friend class VMStructs; private: + // Safe 
> Memory Reclamation (SMR) support: + static Monitor* _smr_delete_lock; 
> + // The '_cnt', '_max' and '_times" fields are enabled via + // 
> -XX:+EnableThreadSMRStatistics: + static uint 
> _smr_delete_lock_wait_cnt; + static uint _smr_delete_lock_wait_max; + 
> static volatile int _smr_delete_notify; + static volatile jint 
> _smr_deleted_thread_cnt; + static volatile jint 
> _smr_deleted_thread_time_max; + static volatile jint 
> _smr_deleted_thread_times; + static ThreadsList* volatile 
> _smr_java_thread_list; + static ThreadsList* 
> get_smr_java_thread_list() { + return 
> (ThreadsList*)OrderAccess::load_ptr_acquire((void* 
> volatile*)&_smr_java_thread_list); + } + static ThreadsList* 
> xchg_smr_java_thread_list(ThreadsList* new_list) { + return 
> (ThreadsList*)Atomic::xchg_ptr((void*)new_list, (volatile 
> void*)&_smr_java_thread_list); + } + static long 
> _smr_java_thread_list_alloc_cnt; + static long 
> _smr_java_thread_list_free_cnt; + static uint 
> _smr_java_thread_list_max; + static uint _smr_nested_thread_list_max; 
> + static volatile jint _smr_tlh_cnt; + static volatile jint 
> _smr_tlh_time_max; + static volatile jint _smr_tlh_times; + static 
> ThreadsList* _smr_to_delete_list; + static uint 
> _smr_to_delete_list_cnt; + static uint _smr_to_delete_list_max; Could 
> be: class Threads: AllStatic { friend class VMStructs; private: // 
> Safe Memory Reclamation (SMR) support: SMRSupport _smr_support; And 
> SMRSupport could be moved to threadSMR.hpp. I haven't read all the 
> code in detail, so I'm not sure if this is feasible or not, but my 
> gut-feeling says that it would be worth testing. The above is just an 
> example, and the rest of the code could probably be encapsulated and 
> moved as well.

We already moved all of the SMR stuff that was "easy" to move based
on your earlier comment. (Of course, doing that move introduced a
number of compilation complications, but that's just Dan complaining :-))

We don't really want to try this migration at this time since we're
still hoping to get this code into 18.3. (Also Dan doesn't see the
value in moving the SMR fields... Threads is already an eclectic static
class so why does SMR have to encapsulate when no other project did so?)


> 7) Currently, threadSMR.hpp "only" contains the ThreadList. Unless, 
> you move the SMR stuff into threadSMR.hpp, maybe it should be renamed 
> to threadsList.hpp? Maybe it does make sense to have both 
> threadSMR.hpp and threadsList.hpp?

We're planning to stick with the threadSMR.hpp and threadSMR.cpp
file names. Currently they contain the more standalone pieces of
thread SMR (more than just ThreadsList) so we're planning to keep
those names.

Thanks for the detailed review!!

Dan, Erik and Robbin


> Cheers and thanks! StefanK