RFR: 8325553: Parallel: Use per-marker cache for marking stats during Full GC

Thomas Schatzl tschatzl at openjdk.org
Wed Feb 14 12:16:02 UTC 2024


On Wed, 14 Feb 2024 10:09:30 GMT, Thomas Schatzl <tschatzl at openjdk.org> wrote:

>> Add a cache to avoid highly contended writes to the global marking stats.
>> 
>> Test: tier1-6; with the attached benchmark, a ~30% reduction in pause time is observed. Only a marginal improvement (probably just noise) is observed in BigRamTester.
>
> src/hotspot/share/gc/parallel/psParallelCompact.cpp line 2059:
> 
>> 2057:       }
>> 2058:     } task;
>> 2059:     ParallelScavengeHeap::heap()->workers().run_task(&task);
> 
> Is it really worth potentially spinning up (hundreds of) worker threads for this very short operation? Either way, I would prefer hiding this in a helper method.
> 
> For testing, I added some code that shows the costs of spinning up threads for such a negligible operation:
> 
> 2 threads (parallel version takes almost twice the time):
> 
> 
> Parallel version:
> $ java -Xmx3g -Xms3g -XX:-ScavengeBeforeFullGC -XX:MarkSweepDeadRatio=0 -XX:ParallelGCThreads=2 -XX:+UseParallelGC -XX:NewSize=1g "-Xlog:gc,gc+phases=debug" -XX:+UnlockDiagnosticVMOptions -XX:-UseNewCode fullgc | grep lush
> [0,200s][debug][gc,phases] GC(0) Flush Marking Stats 0,032ms
> [0,534s][debug][gc,phases] GC(1) Flush Marking Stats 0,022ms
> [0,819s][debug][gc,phases] GC(2) Flush Marking Stats 0,020ms
> [1,159s][debug][gc,phases] GC(3) Flush Marking Stats 0,021ms
> [1,461s][debug][gc,phases] GC(4) Flush Marking Stats 0,019ms
> [1,784s][debug][gc,phases] GC(5) Flush Marking Stats 0,022ms
> 
> Serial version:
> $ java -Xmx3g -Xms3g -XX:-ScavengeBeforeFullGC -XX:MarkSweepDeadRatio=0 -XX:ParallelGCThreads=2 -XX:+UseParallelGC -XX:NewSize=1g "-Xlog:gc,gc+phases=debug" -XX:+UnlockDiagnosticVMOptions -XX:+UseNewCode fullgc | grep lush
> [0,247s][debug][gc,phases] GC(0) Flush Marking Stats 0,012ms
> [0,597s][debug][gc,phases] GC(1) Flush Marking Stats 0,014ms
> [0,881s][debug][gc,phases] GC(2) Flush Marking Stats 0,034ms
> [1,208s][debug][gc,phases] GC(3) Flush Marking Stats 0,016ms
> [1,513s][debug][gc,phases] GC(4) Flush Marking Stats 0,017ms
> [1,843s][debug][gc,phases] GC(5) Flush Marking Stats 0,017ms
> 
> 8 threads, showing a speedup of ~2x:
> 
> Parallel version:
> 
> $ java -Xmx3g -Xms3g -XX:-ScavengeBeforeFullGC -XX:MarkSweepDeadRatio=0 -XX:ParallelGCThreads=8 -XX:+UseParallelGC -XX:NewSize=1g "-Xlog:gc,gc+phases=debug" -XX:+UnlockDiagnosticVMOptions -XX:-UseNewCode fullgc | grep lush
> [0,135s][debug][gc,phases] GC(0) Flush Marking Stats 0,028ms
> [0,266s][debug][gc,phases] GC(1) Flush Marking Stats 0,029ms
> [0,378s][debug][gc,phases] GC(2) Flush Marking Stats 0,037ms
> [0,509s][debug][gc,phases] GC(3) Flush Marking Stats 0,030ms
> [0,628s][debug][gc,phases] GC(4) Flush Marking Stats 0,027ms
> [0,747s][debug][gc,phases] GC(5) Flush Marking Stats 0,027ms
> 
> Serial version:
> 
> $ java -Xmx3g -Xms3g -XX:-ScavengeBeforeFullGC -XX:MarkSweepDeadRatio=0 -XX:ParallelGCThreads=8 -XX:+UseParallelGC -XX:NewSize...
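
For illustration, a minimal sketch of the pattern under discussion: a per-marker cache of plain counters that is flushed into the shared (atomic) stats in one step, e.g. serially from a single thread via a helper method. All names (MarkingStatsCache, global_live_words, flush_all_caches) are hypothetical and do not reflect the actual psParallelCompact.cpp code in the PR.

#include <atomic>
#include <cstddef>
#include <vector>

// Hypothetical illustration only; not the code from the PR.
static const size_t NumRegions = 1024;

// Global per-region marking stats shared by all markers.
static std::atomic<size_t> global_live_words[NumRegions];

// Per-marker cache: plain (non-atomic) counters local to one worker.
struct MarkingStatsCache {
  size_t live_words[NumRegions] = {};

  void add(size_t region, size_t words) {
    live_words[region] += words;          // no atomic RMW during marking
  }

  void flush() {
    for (size_t r = 0; r < NumRegions; r++) {
      if (live_words[r] != 0) {
        global_live_words[r].fetch_add(live_words[r], std::memory_order_relaxed);
        live_words[r] = 0;
      }
    }
  }
};

// Serial flush from a single (VM) thread: a simple helper method,
// with no worker-thread spin-up for this very small amount of work.
void flush_all_caches(std::vector<MarkingStatsCache>& caches) {
  for (MarkingStatsCache& cache : caches) {
    cache.flush();
  }
}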

An alternative to an explicit flush phase could be to issue this per-worker-thread flush call at the end of marking, both in the main marking phase and in reference-processing marking.
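
A rough sketch of that alternative, reusing the hypothetical MarkingStatsCache from the sketch above (again, not the actual PSParallelCompact task code): each marking worker flushes its own cache as the last step of its work() method, so the already-running workers do the flushing and no separate flush phase is needed.

class MarkingTaskSketch {
  std::vector<MarkingStatsCache> _caches;    // one cache per worker

public:
  explicit MarkingTaskSketch(unsigned int num_workers) : _caches(num_workers) {}

  void work(unsigned int worker_id) {
    MarkingStatsCache& cache = _caches[worker_id];
    // ... perform marking (and, analogously, reference-processing marking),
    //     accumulating live words into `cache` ...
    cache.flush();   // flush before this worker leaves the marking phase
  }
};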

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17788#discussion_r1489375724
