RFR: 8260497: Shenandoah: Improve SATB flushing

Wed Jan 27 15:47:53 UTC 2021

On Wed, 27 Jan 2021 10:15:19 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

> Currently, we periodically force flushing of SATB queues. This works by activating a flag every 100ms in every thread, which causes that thread to enqueue its SATB buffer the next time it overflows, even if it doesn't meet its threshold after filtering. This is somewhat problematic when a thread does not actually overflow its SATB queue in time. The whole point of the exercise is to try and avoid having too much left-over work when we reach final-mark.
> 
> We can do better than that: when concurrent mark is done we can handshake all threads, and let them flush their respective SATB queues, and re-enter concurrent mark loop again, until flushing yields no more work. Experiments show that it usually takes 1-3 flushes to clean out leftover work properly.
> 
> I ran benchmarks, 3 high-pressure preset runs of SPECjbb2015, 10 minutes each:
> 
> baseline:
> Finish Mark                  =    0,251 s (a =      688 us) (n =   364) (lvls, us =      125,      486,      621,      824,     4156)
> Finish Mark                  =    0,338 s (a =      922 us) (n =   366) (lvls, us =      131,      494,      652,      852,    72948)
> Finish Mark                  =    0,257 s (a =      699 us) (n =   368) (lvls, us =      111,      492,      645,      826,     4447)
> 
> patched:
> Finish Mark                  =    0,112 s (a =      301 us) (n =   370) (lvls, us =      115,      207,      250,      281,     3709)
> Finish Mark                  =    0,107 s (a =      292 us) (n =   368) (lvls, us =      107,      209,      248,      287,     3329)
> Finish Mark                  =    0,114 s (a =      310 us) (n =   367) (lvls, us =      115,      211,      254,      285,     3819)
> 
> It reliably lowers all timings for finish-mark. It also doesn't cause any regressions in throughput.
> 
> Testing:
>  - [x] hotspot_gc_shenandoah
>  - [x] benchmarks

Changes requested by shade (Reviewer).

This looks nice. Although you want to fix some of the build falures: https://github.com/rkennke/jdk/runs/1776231024?check_suite_focus=true

Changes requested by shade (Reviewer).

This looks good to me!

src/hotspot/share/gc/shared/satbMarkQueue.hpp line 118:

> 116:   // Return true if the queue's buffer should be enqueued, even if not full.
> 117:   // The default method uses the buffer enqueue threshold.
> 118:   bool should_enqueue_buffer(SATBMarkQueue& queue);

Why drop `virtual` here? Is it because Shenandoah was the only virtual override of it, and now we can do the non-virtual call?

src/hotspot/share/gc/shenandoah/shenandoahSATBMarkQueueSet.cpp line 59:

> 57: void ShenandoahSATBMarkQueueSet::enqueue_completed_buffer(BufferNode* node) {
> 58:   SATBMarkQueueSet::enqueue_completed_buffer(node);
> 59:   Atomic::inc(&_enqueued_count);

I believe `SATBMarkQueueSet` already tracks this, and we could instead use `SATBMarkQueueSet::completed_buffers_num`?

src/hotspot/share/gc/shenandoah/shenandoahConcurrentMark.cpp line 256:

> 254: 
> 255:   ShenandoahFlushSATBHandshakeClosure flush_satb;
> 256:   ShenandoahSATBMarkQueueSet& qset = ShenandoahBarrierSet::satb_mark_queue_set();

Since you have the `qset` here, you might as well pass it to closure.

src/hotspot/share/gc/shenandoah/shenandoahSATBMarkQueueSet.hpp line 35:

> 33: class ShenandoahSATBMarkQueueSet : public SATBMarkQueueSet {
> 34: private:
> 35:   volatile int _enqueued_count;

I have a suspicion that `int` would overflow at some point in the long-running application. `size_t` would fit better, but then see the other comment that `SATBMQ` already tracks it itself.

src/hotspot/share/gc/shenandoah/shenandoahConcurrentMark.cpp line 266:

> 264:     workers->run_task(&task);
> 265: 
> 266:     enqueued_count_before = qset.completed_buffers_num();

Suggestion for names: `completed_before`, `completed_after`.

src/hotspot/share/gc/shenandoah/shenandoahConcurrentMark.cpp line 270:

> 268:     enqueued_count_after = qset.completed_buffers_num();
> 269:     flushes++;
> 270:   } while (enqueued_count_before != enqueued_count_after && flushes < max_flushes);

So, how does this interact with cancellation? Shouldn't we check for `cancelled_gc()` here as well?

src/hotspot/share/gc/shenandoah/shenandoahConcurrentMark.cpp line 257:

> 255:   ShenandoahSATBMarkQueueSet& qset = ShenandoahBarrierSet::satb_mark_queue_set();
> 256:   ShenandoahFlushSATBHandshakeClosure flush_satb(qset);
> 257:   for (int flushes = 0; flushes < ShenandoahMaxSATBBufferFlushes; flushes++) {

Should probably be `uint` to match the unsigned `ShenandoahMaxSATBBufferFlushes`.

src/hotspot/share/gc/shenandoah/shenandoahConcurrentMark.cpp line 267:

> 265:     }
> 266: 
> 267:     int before = qset.completed_buffers_num();

Should probably be `size_t`.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2254Marked as reviewed by shade (Reviewer).