RFR (M): 8245721: Refactor the TaskTerminator
Thomas Schatzl
thomas.schatzl at oracle.com
Fri Jul 24 10:41:09 UTC 2020
Hi Kim and Leo,
thanks for your reviews.
On 08.07.20 17:31, Kim Barrett wrote:
>> On Jun 24, 2020, at 4:03 AM, Thomas Schatzl <thomas.schatzl at oracle.com> wrote:
>>
>> CR:
>> https://bugs.openjdk.java.net/browse/JDK-8245721
>> Webrev:
>> http://cr.openjdk.java.net/~tschatzl/8245721/webrev/
>> Testing:
>> tier1-5, many many perf rounds, many tier1-X rounds with other patches
>>
>> Thanks,
>> Thomas
>
> ------------------------------------------------------------------------------
> src/hotspot/share/gc/shared/taskTerminator.hpp
> 54 struct SpinContext {
> 55 uint yield_count;
> ...
>
> While there's certainly variability on this, discussions I've had with
> others and other reviews suggest the Style Guide's suggestion to use
> leading underscores for data member names should apply even for public
> members of structs.
>
Fixed.
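For reference, with the underscore convention applied the struct now
looks roughly like this (abbreviated sketch, not the exact webrev.1
code):

   struct SpinContext {
     uint _yield_count;      // Spin/yield iterations so far this round.
     uint _hard_spin_limit;  // Current cap on busy-spins per step.
   };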
> ------------------------------------------------------------------------------
> src/hotspot/share/gc/shared/taskTerminator.cpp
> 128 spin_context.hard_spin_limit = MIN2(2 * spin_context.hard_spin_limit,
> 129 (uint) WorkStealingHardSpins);
>
> In the new code, hard_spin_limit is always (re)initialized to
> WorkStealingHardSpins, so this does nothing.
>
> In the old code, the starting spin limit was
> WorkStealingHardSpins >> WorkStealingSpinToYieldRatio
> and that value was captured in a variable (hard_spin_start) for
> use in reinitializing the limit after each yield iteration.
>
> So this is a change, and I'm guessing not an intentional one.
Fixed.
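With that restored, the doubling is meaningful again; roughly (using
the names from your description, not necessarily the exact webrev.1
code):

   uint hard_spin_start = WorkStealingHardSpins >> WorkStealingSpinToYieldRatio;
   uint hard_spin_limit = hard_spin_start;
   ...
   if (spinning) {
     // Back off exponentially, capped at WorkStealingHardSpins.
     hard_spin_limit = MIN2(2 * hard_spin_limit, (uint)WorkStealingHardSpins);
   } else {
     // After a yield, restart from the small initial limit.
     hard_spin_limit = hard_spin_start;
   }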
>
> ------------------------------------------------------------------------------
> src/hotspot/share/gc/shared/taskTerminator.cpp
> 154 SpinContext spin_context;
>
> If this were moved inside the outer loop of offer_termination, there
> would be no need for SpinContext::reset; just use the ctor-initializer
> for initialization of the members.
>
Fixed.
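With the context declared inside the outer loop, initialization happens
in the ctor-initializer and reset() is gone; something like (sketch):

   struct SpinContext {
     uint _yield_count;
     uint _hard_spin_limit;

     SpinContext() :
       _yield_count(0),
       _hard_spin_limit(WorkStealingHardSpins >> WorkStealingSpinToYieldRatio) { }
   };

   // In offer_termination(); a fresh context for every spin/yield round.
   while (...) {
     SpinContext spin_context;
     ...
   }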
> ------------------------------------------------------------------------------
> src/hotspot/share/gc/shared/taskTerminator.cpp
>
> I *think* the inner loop of offer_termination would be made clearer by
> hoisting the yield_count check out of do_spin_iteration. I'm thinking
> something like
>
> instead of
>
> 169 bool giveup_spin;
> 170 do {
> 171 size_t tasks;
> 172 bool should_exit_termination;
> 173 {
> 174 MutexUnlocker y(_blocker, Mutex::_no_safepoint_check_flag);
> 175 giveup_spin = do_spin_iteration(spin_context);
> 176
> 177 // Dirty read of exit condition.
> 178 tasks = tasks_in_queue_set();
> 179 should_exit_termination = exit_termination(tasks, terminator);
> 180 }
> 181
> 182 // Immediately check exit conditions after re-acquiring the lock using the
> 183 // information gathered just recently.
> 184 if (_offered_termination == _n_threads) {
> 185 prepare_for_return(the_thread);
> 186 assert_queue_set_empty();
> 187 return true;
> 188 } else if (should_exit_termination) {
> 189 prepare_for_return(the_thread, tasks);
> 190 _offered_termination--;
> 191 return false;
> 192 }
> 193 } while (!giveup_spin);
>
> something like
>
> while (spin_context.yield_count <= WorkStealingYieldsBeforeSleep) {
> // Each spin iteration is counted as a yield for purposes of
> // deciding when to sleep.
> ++spin_context.yield_count;
> size_t tasks;
> bool should_exit_termination;
> {
> MutexUnlocker y(_blocker ...);
> do_spin_iteration(spin_context);
> // Dirty read of exit condition.
> tasks = tasks_in_queue_set();
> should_exit_termination = exit_termination(tasks, terminator);
> }
> // Immediately check exit conditions after re-acquiring the lock.
> if (_offered_termination == _n_threads) {
> ... ;
> return true;
> } else if (should_exit_termination) {
> ... ;
> }
> }
>
> This also makes some issues with naming a little more apparent.
> do_spin_iteration might spin or might yield, yield_count is counting
> both spins and yields, and spin_context is about more than spinning.
> Maybe something a little more generic, using "delay" rather than
> "spin", like do_delay_step, delay_count, and delay_context?
>
> [pre-existing]
> Side question: Should the yield_count test be <= or < ? It seems like
> WorkStealingYieldsBeforeSleep == 0 should disable yields and just go
> straight to the sleep case. And that's desirable for a uniprocessor,
> though why we'd be running a GC that uses this on a uniprocessor system is
> a question. Exclusive test would also better match the doc string for
> the option.
Fixed.
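The test is exclusive now, so WorkStealingYieldsBeforeSleep == 0 goes
straight to sleeping as the option's doc string says; i.e. roughly:

   while (spin_context._yield_count < WorkStealingYieldsBeforeSleep) {
     // Each spin iteration counts as a yield for deciding when to sleep.
     ++spin_context._yield_count;
     ...
   }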
>
> ------------------------------------------------------------------------------
> src/hotspot/share/gc/shared/taskTerminator.cpp
> 177 // Dirty read of exit condition.
> 178 tasks = tasks_in_queue_set();
>
> From a comment in the bug, that's intentionally done while not holding
> the lock, because doing otherwise makes the locked section too long.
> There should be a comment about that so nobody is tempted to move it
> (as I did in an earlier version of the above suggested move of the
> yield_count checking).
>
Fixed.
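The dirty read now carries a comment along these lines (paraphrased):

   // Intentionally read the task count while not holding the lock;
   // moving this inside the locked section makes it too long.
   tasks = tasks_in_queue_set();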
> ------------------------------------------------------------------------------
> src/hotspot/share/gc/shared/taskTerminator.cpp
>
> How bad would it be to not bother collecting the number of tasks and
> just have prepare_for_return unconditionally notify_all?
Fwiw, (really) unconditionally doing notify_all gives very poor
results, maybe because when work is completely done, all threads hammer
on the monitor.
>
> Alternatively, if that's bad and the cost of tasks_in_queue_set() is
> also a problem (which I can well believe), maybe the taskqueue
> facility can have some specialized helper function added for use here.
> All we really want to know in the current code is how many tasks are
> available up to the number of workers, or more than the number of
> workers and we don't need a precise answer.
>
> Such a change can be dealt with later.
I would like to defer that work (experimenting with various degrees of
notify_all and improving how the number of tasks is gathered) to
another change.
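For the record, the bounded count helper for that follow-up could look
something like this (purely hypothetical sketch, all member names
invented for illustration):

   // Count available tasks, but stop as soon as `limit` is reached;
   // callers only need "how many, up to the number of workers".
   size_t tasks_in_queue_set(size_t limit) const {
     size_t count = 0;
     for (uint i = 0; i < _n_queues && count < limit; i++) {
       count += _queues[i]->size();
     }
     return count;
   }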
> Another idea would be to record the task count in the terminator,
> along with a last update epoch counter. (Both are only touched under
> the lock, so no lock-free synchronization to worry about.) Use the
> recorded count if its epoch is more recent than the locally captured
> epoch from the most recent local use.
>
> Of course, this is being called after some kind of delay (spin, yield,
> sleep), so it's not clear how important the performance is.
>
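Also noting the epoch idea for that follow-up; it would amount to
roughly this additional terminator state (names invented, both fields
only touched under the lock):

   size_t _cached_task_count;   // Task count last sampled under the lock.
   uint64_t _task_count_epoch;  // Bumped on every update of the count.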
> ------------------------------------------------------------------------------
> src/hotspot/share/gc/shared/taskTerminator.cpp
> 204 } else {
> 205 size_t tasks = tasks_in_queue_set();
> 206 if (exit_termination(tasks, terminator)) {
> 207 prepare_for_return(the_thread, tasks);
> 208 _offered_termination--;
> 209 return false;
> 210 }
> 211 }
>
> We get here only after our wait on the monitor ended.
>
> If we were woken up by some other thread (so the wait did not time out), I
> think we should skip the exit_termination test (just assume true) and
> pass 0 to prepare_for_return. (So no need for task counting in this
> case.) So insert something like this before the above
>
> } else if (!timedout) {
> // Don't bother waking up more tasks.
> prepare_for_return(the_thread, 0);
> _offered_termination--;
> return false;
> } else {
> ...
>
> I think this also fixes a (pre-existing, I think) bug. Assume there
> are enough tasks that exiting is called for, but fewer than the
> number of threads waiting. Either the spin master or the first timeout
> will wake up some, but not all, of the waiting threads. Each of those
> threads (or the spin master) may wake up additional threads, making
> the limit on wakeups based on the number of tasks pointless, and also
> making the task counting of much more limited value; given that
> exit_termination just checks tasks > 0, we could instead just check
> whether the queue_set is empty. So I think without this we just have
> a slower version of always waking up all waiting threads.
>
> (There is also a small improvement to TaskQueueSuper::is_empty() that
> I didn't bother making because it didn't seem to matter.)
Done.
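The post-wait path now reads roughly like this (sketch of webrev.1
following your suggestion; ml being the MonitorLocker discussed below,
and assuming wait() returns true iff it timed out):

   bool timedout = ml.wait(WorkStealingSleepMillis);
   if (_offered_termination == _n_threads) {
     prepare_for_return(the_thread);
     assert_queue_set_empty();
     return true;
   } else if (!timedout) {
     // Woken up by another thread; exit without waking up more threads.
     prepare_for_return(the_thread, 0);
     _offered_termination--;
     return false;
   } else {
     size_t tasks = tasks_in_queue_set();
     if (exit_termination(tasks, terminator)) {
       prepare_for_return(the_thread, tasks);
       _offered_termination--;
       return false;
     }
   }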
>
> ------------------------------------------------------------------------------
> src/hotspot/share/gc/shared/taskTerminator.cpp
> 156 MutexLocker x(_blocker, Mutex::_no_safepoint_check_flag);
> ...
> 174 MutexUnlocker y(_blocker, Mutex::_no_safepoint_check_flag);
> ...
> 197 _blocker->wait_without_safepoint_check(WorkStealingSleepMillis);
>
> I agree with Zhengyu's suggestion of using MonitorLocker and waiting
> on the locker, rather than using naked _blocker->wait_xxx.
>
Fixed.
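I.e. roughly (sketch):

   MonitorLocker ml(&_blocker, Mutex::_no_safepoint_check_flag);
   ...
   bool timedout = ml.wait(WorkStealingSleepMillis);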
From Leo:
On 09.07.20 14:38, Leo Korinth wrote:
> I just have two small suggestions that have not been mentioned yet.
>
> - TaskTerminator::yield() should be used or removed.
> - Maybe _blocker could be "inlined" in the parent object (not being a
> pointer); we would get rid of a delete as well as an assert, and also
> save an allocation.
>
Fixed.
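That is, _blocker is now a value member constructed in the initializer
list, and the unused yield() is removed. Sketch (ctor arguments
elided):

   class TaskTerminator {
     ...
     Monitor _blocker;   // Inlined: no separate allocation, no delete,
                         // no assert on the pointer needed.
     ...
   };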
New webrev:
http://cr.openjdk.java.net/~tschatzl/8245721/webrev.1/ (full)
http://cr.openjdk.java.net/~tschatzl/8245721/webrev.0_to_1/ (diff)
Testing:
tier1-5, various benchmarks
Thomas