RFR (M): 8245721: Refactor the TaskTerminator
Thomas Schatzl
thomas.schatzl at oracle.com
Fri Jul 24 10:41:09 UTC 2020
Hi Kim and Leo,
thanks for your reviews.
On 08.07.20 17:31, Kim Barrett wrote:
>> On Jun 24, 2020, at 4:03 AM, Thomas Schatzl <thomas.schatzl at oracle.com> wrote:
>>
>> CR:
>> https://bugs.openjdk.java.net/browse/JDK-8245721
>> Webrev:
>> http://cr.openjdk.java.net/~tschatzl/8245721/webrev/
>> Testing:
>> tier1-5, many many perf rounds, many tier1-X rounds with other patches
>>
>> Thanks,
>> Thomas
>
> ------------------------------------------------------------------------------
> src/hotspot/share/gc/shared/taskTerminator.hpp
> 54 struct SpinContext {
> 55 uint yield_count;
> ...
>
> While there's certainly variability on this, discussions I've had with
> others and other reviews suggest the Style Guide's suggestion to use
> leading underscores for data member names should apply even for public
> members of structs.
>
Fixed.
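For reference, with the underscore convention applied the struct now
looks roughly like this (abbreviated sketch, not the exact webrev.1
code):

   struct SpinContext {
     uint _yield_count;      // Spin/yield iterations so far this round.
     uint _hard_spin_limit;  // Current cap on busy-spins per step.
   };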
> ------------------------------------------------------------------------------
> src/hotspot/share/gc/shared/taskTerminator.cpp
> 128 spin_context.hard_spin_limit = MIN2(2 * spin_context.hard_spin_limit,
> 129 (uint) WorkStealingHardSpins);
>
> In the new code, hard_spin_limit is always (re)initialized to
> WorkStealingHardSpins, so this does nothing.
>
> In the old code, the starting spin limit was
> WorkStealingHardSpins >> WorkStealingSpinToYieldRatio
> and that value was captured in a variable (hard_spin_start) for
> use in reinitializing the limit after each yield iteration.
>
> So this is a change, and I'm guessing not an intentional one.
Fixed.
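With that restored, the doubling is meaningful again; roughly (using
the names from your description, not necessarily the exact webrev.1
code):

   uint hard_spin_start = WorkStealingHardSpins >> WorkStealingSpinToYieldRatio;
   uint hard_spin_limit = hard_spin_start;
   ...
   if (spinning) {
     // Back off exponentially, capped at WorkStealingHardSpins.
     hard_spin_limit = MIN2(2 * hard_spin_limit, (uint)WorkStealingHardSpins);
   } else {
     // After a yield, restart from the small initial limit.
     hard_spin_limit = hard_spin_start;
   }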
>
> ------------------------------------------------------------------------------
> src/hotspot/share/gc/shared/taskTerminator.cpp
> 154 SpinContext spin_context;
>
> If this were moved inside the outer loop of offer_termination, there
> would be no need for SpinContext::reset; just use the ctor-initializer
> for initialization of the members.
>
Fixed.
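With the context declared inside the outer loop, initialization happens
in the ctor-initializer and reset() is gone; something like (sketch):

   struct SpinContext {
     uint _yield_count;
     uint _hard_spin_limit;

     SpinContext() :
       _yield_count(0),
       _hard_spin_limit(WorkStealingHardSpins >> WorkStealingSpinToYieldRatio) { }
   };

   // In offer_termination(); a fresh context for every spin/yield round.
   while (...) {
     SpinContext spin_context;
     ...
   }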
> ------------------------------------------------------------------------------
> src/hotspot/share/gc/shared/taskTerminator.cpp
>
> I *think* the inner loop of offer_termination would be made clearer by
> hoisting the yield_count check out of do_spin_iteration. I'm thinking
> something like
>
> instead of
>
> 169 bool giveup_spin;
> 170 do {
> 171 size_t tasks;
> 172 bool should_exit_termination;
> 173 {
> 174 MutexUnlocker y(_blocker, Mutex::_no_safepoint_check_flag);
> 175 giveup_spin = do_spin_iteration(spin_context);
> 176
> 177 // Dirty read of exit condition.
> 178 tasks = tasks_in_queue_set();
> 179 should_exit_termination = exit_termination(tasks, terminator);
> 180 }
> 181
> 182 // Immediately check exit conditions after re-acquiring the lock using the
> 183 // information gathered just recently.
> 184 if (_offered_termination == _n_threads) {
> 185 prepare_for_return(the_thread);
> 186 assert_queue_set_empty();
> 187 return true;
> 188 } else if (should_exit_termination) {
> 189 prepare_for_return(the_thread, tasks);
> 190 _offered_termination--;
> 191 return false;
> 192 }
> 193 } while (!giveup_spin);
>
> something like
>
> while (spin_context.yield_count <= WorkStealingYieldsBeforeSleep) {
> // Each spin iteration is counted as a yield for purposes of
> // deciding when to sleep.
> ++spin_context.yield_count;
> size_t tasks;
> bool should_exit_termination;
> {
> MutexUnlocker y(_blocker ...);
> do_spin_iteration(spin_context);
> // Dirty read of exit condition.
> tasks = tasks_in_queue_set();
> should_exit_termination = exit_termination(tasks, terminator);
> }
> // Immediately check exit conditions after re-acquiring the lock.
> if (_offered_termination == _n_threads) {
> ... ;
> return true;
> } else if (should_exit_termination) {
> ... ;
> }
> }
>
> This also makes some issues with naming a little more apparent.
> do_spin_iteration might spin or might yield, yield_count is counting
> both spins and yields, and spin_context is about more than spinning.
> Maybe something a little more generic, using "delay" rather than
> "spin", like do_delay_step, delay_count, and delay_context?
>
> [pre-existing]
> Side question: Should the yield_count test be <= or < ? It seems like
> WorkStealingYieldsBeforeSleep == 0 should disable yields and just go
> straight to the sleep case. And that's desirable for a uniprocessor,
> though why we'd be running a GC that uses this on a uniprocessor system is
> a question. Exclusive test would also better match the doc string for
> the option.
Fixed.
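The test is exclusive now, so WorkStealingYieldsBeforeSleep == 0 goes
straight to sleeping as the option's doc string says; i.e. roughly:

   while (spin_context._yield_count < WorkStealingYieldsBeforeSleep) {
     // Each spin iteration counts as a yield for deciding when to sleep.
     ++spin_context._yield_count;
     ...
   }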
>
> ------------------------------------------------------------------------------
> src/hotspot/share/gc/shared/taskTerminator.cpp
> 177 // Dirty read of exit condition.
> 178 tasks = tasks_in_queue_set();
>
> From a comment in the bug, that's intentionally done while not holding
> the lock, because doing otherwise makes the locked section too long.
> There should be a comment about that so nobody is tempted to move it
> (as I did in an earlier version of the above suggested move of the
> yield_count checking).
>
Fixed.
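The dirty read now carries a comment along these lines (paraphrased):

   // Intentionally read the task count while not holding the lock;
   // moving this inside the locked section makes it too long.
   tasks = tasks_in_queue_set();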
> ------------------------------------------------------------------------------
> src/hotspot/share/gc/shared/taskTerminator.cpp
>
> How bad would it be to not bother collecting the number of tasks and
> just have prepare_for_return unconditionally notify_all?
Fwiw, (really) unconditionally doing notify_all gives very poor
results, maybe because when work is completely done, all threads hammer
on the monitor.
>
> Alternatively, if that's bad and the cost of tasks_in_queue_set() is
> also a problem (which I can well believe), maybe the taskqueue
> facility can have some specialized helper function added for use here.
> All we really want to know in the current code is how many tasks are
> available up to the number of workers, or more than the number of
> workers and we don't need a precise answer.
>
> Such a change can be dealt with later.
I would like to defer that work (experimenting with various degrees of
notify_all and improving how the number of tasks is gathered) to
another change.
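For the record, the bounded count helper for that follow-up could look
something like this (purely hypothetical sketch, all member names
invented for illustration):

   // Count available tasks, but stop as soon as `limit` is reached;
   // callers only need "how many, up to the number of workers".
   size_t tasks_in_queue_set(size_t limit) const {
     size_t count = 0;
     for (uint i = 0; i < _n_queues && count < limit; i++) {
       count += _queues[i]->size();
     }
     return count;
   }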
> Another idea would be to record the task count in the terminator,
> along with a last update epoch counter. (Both are only touched under
> the lock, so no lock-free synchronization to worry about.) Use the
> recorded count if its epoch is more recent than the locally captured
> epoch from the most recent local use.
>
> Of course, this is being called after some kind of delay (spin, yield,
> sleep), so it's not clear how important the performance is.
>
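Also noting the epoch idea for that follow-up; it would amount to
roughly this additional terminator state (names invented, both fields
only touched under the lock):

   size_t _cached_task_count;   // Task count last sampled under the lock.
   uint64_t _task_count_epoch;  // Bumped on every update of the count.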
> ------------------------------------------------------------------------------
> src/hotspot/share/gc/shared/taskTerminator.cpp
> 204 } else {
> 205 size_t tasks = tasks_in_queue_set();
> 206 if (exit_termination(tasks, terminator)) {
> 207 prepare_for_return(the_thread, tasks);
> 208 _offered_termination--;
> 209 return false;
> 210 }
> 211 }
>
> We get here only after our wait on the monitor ended.
>
> If we were woken up by some other thread (so the wait did not time out), I
> think we should skip the exit_termination test (just assume true) and
> pass 0 to prepare_for_return. (So no need for task counting in this
> case.) So insert something like this before the above
>
> } else if (!timedout) {
> // Don't bother waking up more tasks.
> prepare_for_return(the_thread, 0);
> _offered_termination--;
> return false;
> } else {
> ...
>
> I think this also fixes a (pre-existing, I think) bug. Assume there
> are enough tasks that exiting is called for, but fewer than the
> number of threads waiting. Either the spin master or the first timeout
> will wake up some, but not all, of the waiting threads. Each of those
> threads (or the spin master) may wake up additional threads, making
> the limit on wakeups based on the number of tasks pointless, and also
> making the task counting of much more limited value; given that
> exit_termination just checks tasks > 0, we could instead just check
> whether the queue_set is empty. So I think without this we just have
> a slower version of always waking up all waiting threads.
>
> (There is also a small improvement to TaskQueueSuper::is_empty() that
> I didn't bother making because it didn't seem to matter.)
Done.
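The post-wait path now reads roughly like this (sketch of webrev.1
following your suggestion; ml being the MonitorLocker discussed below,
and assuming wait() returns true iff it timed out):

   bool timedout = ml.wait(WorkStealingSleepMillis);
   if (_offered_termination == _n_threads) {
     prepare_for_return(the_thread);
     assert_queue_set_empty();
     return true;
   } else if (!timedout) {
     // Woken up by another thread; exit without waking up more threads.
     prepare_for_return(the_thread, 0);
     _offered_termination--;
     return false;
   } else {
     size_t tasks = tasks_in_queue_set();
     if (exit_termination(tasks, terminator)) {
       prepare_for_return(the_thread, tasks);
       _offered_termination--;
       return false;
     }
   }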
>
> ------------------------------------------------------------------------------
> src/hotspot/share/gc/shared/taskTerminator.cpp
> 156 MutexLocker x(_blocker, Mutex::_no_safepoint_check_flag);
> ...
> 174 MutexUnlocker y(_blocker, Mutex::_no_safepoint_check_flag);
> ...
> 197 _blocker->wait_without_safepoint_check(WorkStealingSleepMillis);
>
> I agree with Zhengyu's suggestion of using MonitorLocker and waiting
> on the locker, rather than using naked _blocker->wait_xxx.
>
Fixed.
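I.e. roughly (sketch):

   MonitorLocker ml(&_blocker, Mutex::_no_safepoint_check_flag);
   ...
   bool timedout = ml.wait(WorkStealingSleepMillis);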
From Leo:
On 09.07.20 14:38, Leo Korinth wrote:
> I just have two small suggestions that have not been mentioned yet.
>
> - TaskTerminator::yield() should be used or removed.
> - Maybe _blocker could be "inlined" in the parent object (not being a
> pointer); we would get rid of a delete as well as an assert, and also
> save an allocation.
>
Fixed.
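That is, _blocker is now a value member constructed in the initializer
list, and the unused yield() is removed. Sketch (ctor arguments
elided):

   class TaskTerminator {
     ...
     Monitor _blocker;   // Inlined: no separate allocation, no delete,
                         // no assert on the pointer needed.
     ...
   };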
New webrev:
http://cr.openjdk.java.net/~tschatzl/8245721/webrev.1/ (full)
http://cr.openjdk.java.net/~tschatzl/8245721/webrev.0_to_1/ (diff)
Testing:
tier1-5, various benchmarks
Thomas