RFR: 8260497: Shenandoah: Improve SATB flushing

Wed Jan 27 15:47:56 UTC 2021

On Wed, 27 Jan 2021 11:05:51 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

>> Currently, we periodically force flushing of SATB queues. This works by activating a flag every 100 ms in every thread, which causes that thread to enqueue its SATB buffer the next time it overflows, even if the buffer doesn't meet its threshold after filtering. This is somewhat problematic when a thread does not actually overflow its SATB queue in time. The whole point of the exercise is to avoid having too much left-over work when we reach final-mark.
>> 
>> We can do better than that: when concurrent mark is done, we can handshake all threads, let them flush their respective SATB queues, and re-enter the concurrent mark loop, repeating until flushing yields no more work. Experiments show that it usually takes 1-3 flushes to clean out the leftover work.
>> 
>> I ran benchmarks, 3 high-pressure preset runs of SPECjbb2015, 10 minutes each:
>> 
>> baseline:
>> Finish Mark                  =    0,251 s (a =      688 us) (n =   364) (lvls, us =      125,      486,      621,      824,     4156)
>> Finish Mark                  =    0,338 s (a =      922 us) (n =   366) (lvls, us =      131,      494,      652,      852,    72948)
>> Finish Mark                  =    0,257 s (a =      699 us) (n =   368) (lvls, us =      111,      492,      645,      826,     4447)
>> 
>> patched:
>> Finish Mark                  =    0,112 s (a =      301 us) (n =   370) (lvls, us =      115,      207,      250,      281,     3709)
>> Finish Mark                  =    0,107 s (a =      292 us) (n =   368) (lvls, us =      107,      209,      248,      287,     3329)
>> Finish Mark                  =    0,114 s (a =      310 us) (n =   367) (lvls, us =      115,      211,      254,      285,     3819)
>> 
>> It reliably lowers the finish-mark timings: the average drops from roughly 690-920 us in the baseline runs to about 300 us in the patched runs. It also doesn't cause any regressions in throughput.
>> 
>> Testing:
>>  - [x] hotspot_gc_shenandoah
>>  - [x] benchmarks
>
> src/hotspot/share/gc/shenandoah/shenandoahConcurrentMark.cpp line 270:
> 
>> 268:     enqueued_count_after = qset.completed_buffers_num();
>> 269:     flushes++;
>> 270:   } while (enqueued_count_before != enqueued_count_after && flushes < max_flushes);
> 
> So, how does this interact with cancellation? Shouldn't we check for `cancelled_gc()` here as well?

I think this would be cleaner:

  ShenandoahFlushSATBHandshakeClosure flush_satb(qset);

  for (int flushes = 0; flushes < ShenandoahMaxSATBBufferFlushes; flushes++) {
    TaskTerminator terminator(nworkers, task_queues());
    ShenandoahConcurrentMarkingTask task(this, &terminator);
    workers->run_task(&task);

    if (cancelled_gc()) {
      // GC is cancelled, break out.
      break;
    }

    size_t before = qset.completed_buffers_num();
    Handshake::execute(&flush_satb);
    size_t after = qset.completed_buffers_num();

    if (before == after) {
      // No more retries needed, break out.
      break;
    }
  }
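
To make the control flow easier to reason about outside HotSpot, here is a minimal standalone sketch of the same retry loop. It is only a toy model, not the actual patch: ToyQueueSet, ToyMutator, MarkResult, mark_with_flushes and the cancel_at_pass parameter are all hypothetical names invented for illustration. The "marking task" is modelled as draining the completed buffers, and the "handshake" as each thread enqueueing at most one buffer per round.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for the SATB queue set: only counts completed buffers.
struct ToyQueueSet {
  std::size_t completed = 0;
  std::size_t completed_buffers_num() const { return completed; }
};

// Hypothetical mutator thread. leftovers[i] is the number of entries sitting
// in its local buffer when the i-th flush handshake arrives.
struct ToyMutator {
  std::vector<std::size_t> leftovers;
  std::size_t round = 0;

  explicit ToyMutator(std::vector<std::size_t> l) : leftovers(std::move(l)) {}

  // Enqueue the local buffer if it holds anything, then advance one round.
  void flush(ToyQueueSet& qset) {
    if (round < leftovers.size() && leftovers[round] > 0) {
      qset.completed++;
    }
    round++;
  }
};

struct MarkResult {
  int mark_passes;  // how many times the marking task ran
  bool cancelled;   // whether the loop exited due to cancellation
};

// Toy version of the loop: mark, check cancellation, flush, retry if the
// flush produced new buffers. cancel_at_pass < 0 means "never cancel".
MarkResult mark_with_flushes(ToyQueueSet& qset,
                             std::vector<ToyMutator>& threads,
                             int max_flushes,
                             int cancel_at_pass) {
  assert(max_flushes > 0);
  MarkResult result{0, false};

  for (int flushes = 0; flushes < max_flushes; flushes++) {
    // Stand-in for the concurrent marking task: drains all completed buffers.
    qset.completed = 0;
    result.mark_passes++;

    if (result.mark_passes == cancel_at_pass) {
      // GC is cancelled, break out before handshaking.
      result.cancelled = true;
      break;
    }

    std::size_t before = qset.completed_buffers_num();
    for (ToyMutator& t : threads) {
      t.flush(qset);  // stand-in for the flush handshake
    }
    std::size_t after = qset.completed_buffers_num();

    if (before == after) {
      // No more retries needed, break out.
      break;
    }
  }
  return result;
}
```

In this model, two threads with leftovers {3, 1, 0} and {2, 0, 0} need three marking passes before a flush comes back empty. Capping max_flushes at 2 stops after two passes, and a cancellation on the first pass exits before any thread is handshaked.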

-------------

PR: https://git.openjdk.java.net/jdk/pull/2254