RFR (S) 8245812: Shenandoah: compute root phase parallelism

Aleksey Shipilev shade at redhat.com
Tue May 26 14:40:11 UTC 2020


RFE:
  https://bugs.openjdk.java.net/browse/JDK-8245812

Current gc+stats log says:

[gc,stats] All times are wall-clock times, except per-root-class counters, that are sum over
[gc,stats] all workers. Dividing the <total> over the root stage time estimates parallelism.

But we can actually compute it ourselves:

diff -r 6a562866cbd0 src/hotspot/share/gc/shenandoah/shenandoahPhaseTimings.cpp
--- a/src/hotspot/share/gc/shenandoah/shenandoahPhaseTimings.cpp        Tue May 26 09:44:17 2020 -0400
+++ b/src/hotspot/share/gc/shenandoah/shenandoahPhaseTimings.cpp        Tue May 26 16:36:10 2020 +0200
@@ -35,10 +35,11 @@

 #define SHENANDOAH_PHASE_NAME_FORMAT "%-28s"
 #define SHENANDOAH_S_TIME_FORMAT "%8.3lf"
 #define SHENANDOAH_US_TIME_FORMAT "%8.0lf"
 #define SHENANDOAH_US_WORKER_TIME_FORMAT "%3.0lf"
+#define SHENANDOAH_PARALLELISM_FORMAT "%4.2lf"

 #define SHENANDOAH_PHASE_DECLARE_NAME(type, title) \
   title,

 const char* ShenandoahPhaseTimings::_phase_names[] = {
@@ -227,10 +228,16 @@
   out->cr();
   for (uint i = 0; i < _num_phases; i++) {
     double v = _cycle_data[i] * 1000000.0;
     if (v > 0) {
       out->print(SHENANDOAH_PHASE_NAME_FORMAT " " SHENANDOAH_US_TIME_FORMAT " us", _phase_names[i], v);
+
+      if (is_worker_phase(Phase(i))) {
+        double total = _cycle_data[i + 1] * 1000000.0;
+        out->print(", parallelism: " SHENANDOAH_PARALLELISM_FORMAT "x", total / v);
+      }
+
       if (_worker_data[i] != NULL) {
         out->print(", workers (us): ");
         for (uint c = 0; c < _max_workers; c++) {
           double tv = _worker_data[i]->get(c);
           if (tv != ShenandoahWorkerData::uninitialized()) {

It would print like this on a special CLDG torture test that is used to optimize CLDG walks:

 Pause Init Mark (G)               26558 us
 Pause Init Mark (N)               26357 us
   Make Parsable                       7 us
   Update Region States               31 us
   Scan Roots                      26267 us, parallelism: 3.63x
     S: <total>                    95442 us
     S: Thread Roots                 159 us, workers (us):  41,  15,  15,   5,  17, ...
     S: Universe Roots                 3 us, workers (us):   3, ---, ---, ---, ---, ...
     S: JNI Handles Roots              4 us, workers (us):   1,   0,   1,   1,   1, ...
     S: VM Global Roots                1 us, workers (us):   0,   0,   0,   0,   0, ...
     S: Synchronizer Roots             0 us, workers (us):   0, ---, ---, ---, ---, ...
     S: Management Roots               1 us, workers (us):   1, ---, ---, ---, ---, ...
     S: System Dict Roots             10 us, workers (us): ---,  10, ---, ---, ---, ...
     S: CLDG Roots                 95263 us, workers (us): 11933, 11924, 11926, 11923, ...
     S: JVMTI Roots                    0 us, workers (us):   0, ---, ---, ---, ---, ...

Note that it reports ~3.63x parallelism while there are 8 workers. This highlights the high setup
cost of the CLDG walk.

Testing: hotspot_gc_shenandoah, eyeballing gc logs

-- 
Thanks,
-Aleksey



