RFR (S) 8245812: Shenandoah: compute root phase parallelism
Aleksey Shipilev
shade at redhat.com
Tue May 26 14:40:11 UTC 2020
RFE:
https://bugs.openjdk.java.net/browse/JDK-8245812
Current gc+stats log says:
[gc,stats] All times are wall-clock times, except per-root-class counters, that are sum over
[gc,stats] all workers. Dividing the <total> over the root stage time estimates parallelism.
But we can actually compute it ourselves:
diff -r 6a562866cbd0 src/hotspot/share/gc/shenandoah/shenandoahPhaseTimings.cpp
--- a/src/hotspot/share/gc/shenandoah/shenandoahPhaseTimings.cpp Tue May 26 09:44:17 2020 -0400
+++ b/src/hotspot/share/gc/shenandoah/shenandoahPhaseTimings.cpp Tue May 26 16:36:10 2020 +0200
@@ -35,10 +35,11 @@
#define SHENANDOAH_PHASE_NAME_FORMAT "%-28s"
#define SHENANDOAH_S_TIME_FORMAT "%8.3lf"
#define SHENANDOAH_US_TIME_FORMAT "%8.0lf"
#define SHENANDOAH_US_WORKER_TIME_FORMAT "%3.0lf"
+#define SHENANDOAH_PARALLELISM_FORMAT "%4.2lf"
#define SHENANDOAH_PHASE_DECLARE_NAME(type, title) \
title,
const char* ShenandoahPhaseTimings::_phase_names[] = {
@@ -227,10 +228,16 @@
out->cr();
for (uint i = 0; i < _num_phases; i++) {
double v = _cycle_data[i] * 1000000.0;
if (v > 0) {
out->print(SHENANDOAH_PHASE_NAME_FORMAT " " SHENANDOAH_US_TIME_FORMAT " us", _phase_names[i], v);
+
+ if (is_worker_phase(Phase(i))) {
+ double total = _cycle_data[i + 1] * 1000000.0;
+ out->print(", parallelism: " SHENANDOAH_PARALLELISM_FORMAT "x", total / v);
+ }
+
if (_worker_data[i] != NULL) {
out->print(", workers (us): ");
for (uint c = 0; c < _max_workers; c++) {
double tv = _worker_data[i]->get(c);
if (tv != ShenandoahWorkerData::uninitialized()) {
With the patch, the log would look like this on a special CLDG torture test that is used to optimize CLDG walks:
Pause Init Mark (G) 26558 us
Pause Init Mark (N) 26357 us
Make Parsable 7 us
Update Region States 31 us
Scan Roots 26267 us, parallelism: 3.63x
S: <total> 95442 us
S: Thread Roots 159 us, workers (us): 41, 15, 15, 5, 17, ...
S: Universe Roots 3 us, workers (us): 3, ---, ---, ---, ---, ...
S: JNI Handles Roots 4 us, workers (us): 1, 0, 1, 1, 1, ...
S: VM Global Roots 1 us, workers (us): 0, 0, 0, 0, 0, ...
S: Synchronizer Roots 0 us, workers (us): 0, ---, ---, ---, ---, ...
S: Management Roots 1 us, workers (us): 1, ---, ---, ---, ---, ...
S: System Dict Roots 10 us, workers (us): ---, 10, ---, ---, ---, ...
S: CLDG Roots 95263 us, workers (us): 11933, 11924, 11926, 11923, ...
S: JVMTI Roots 0 us, workers (us): 0, ---, ---, ---, ---, ...
Note that it reports ~3.6x parallelism, even though there are 8 workers. This highlights the high setup
costs of the CLDG walk.
Testing: hotspot_gc_shenandoah, eyeballing gc logs
--
Thanks,
-Aleksey