RFR (XL): 8202845: Refactor reference processing for improved parallelism
Thomas Schatzl
thomas.schatzl at oracle.com
Tue Jun 5 12:27:04 UTC 2018
Hi all,
can I have reviews for this change that improves the efficiency of
existing j.l.ref processing in all garbage collectors by squashing
together phases for the different types of References as much as
possible.
Kim outlined what can be done when in the CR in detail, so please look
there for the idea.
This improves efficiency in terms of time parallel Reference Processing
takes: instead of starting up reference processors 9 times, you
actually only need to do that 4 times. That can save a significant
amount of gc pause time too :)
One side effect of this is unfortunately another change in the log
output: as we do not have separate timings for every phase of every
Reference type any more, the idea was to turn the logging inside-out:
Instead of this structure (at trace level):
Reference Processing: 0.0ms
SoftReference: 0.5ms
Balance queues: 0.0ms
Phase1: 0.2ms
Process lists (ms) Min: 0.0, Avg: ... Workers: 28
Phase2: 0.1ms
Process lists (ms) Min: 0.0, Avg: ... Workers: 28
Phase3: 0.2ms
Process lists (ms) Min: 0.0, Avg: ... Workers: 28
Discovered: 0
Cleared: 0
WeakReference: 0.3ms
Balance queues: 0.0ms
Phase2: 0.1ms
Process lists (ms) Min: 0.0, Avg: ... Workers: 28
Phase3: 0.2ms
Process lists (ms) Min: 0.0, Avg: ... Workers: 28
Discovered: 40
Cleared: 28
FinalReference: 0.3ms
Balance queues: 0.0ms
Phase2: 0.1ms
Process lists (ms) Min: 0.0, Avg: ... Workers: 28
Phase3: 0.2ms
Process lists (ms) Min: 0.0, Avg: ... Workers: 28
Discovered: 0
Cleared: 0
PhantomReference: 0.3ms
Balance queues: 0.0ms
Phase2: 0.1ms
Process lists (ms) Min: 0.0, Avg: ... Workers: 28
Phase3: 0.2ms
Process lists (ms) Min: 0.0, Avg: ... Workers: 28
Discovered: 6
Cleared: 5
it will look like this:
Reference Processing: 0.1ms
Phase1: 0.0ms
Balance queues: 0.0ms
SoftRef (ms): Min: 0.0, Avg: ... Workers: 3
0.0 0.0 0.0 - ... - - - - - - (T)
Phase2: 0.0ms
Balance queues: 0.0ms
SoftRef (ms): Min: 0.0, Avg: ... Workers: 3
0.0 0.0 0.0 - ... - - - - - - (T)
WeakRef (ms): Min: 0.0, Avg: ... Workers: 3
0.0 0.0 0.0 - ... - - - - - - (T)
FinalRef (ms): Min: 0.0, Avg: ... Workers: 3
0.0 0.0 0.0 - ... - - - - - - (T)
Phase3: 0.0ms
Balance queues: 0.0ms
FinalRef (ms): Min: 0.0, Avg: ... Workers: 3
0.0 0.0 0.0 - ... - - - - - - (T)
Phase4: 0.0ms
Balance queues: 0.0ms
PhantomRef (ms): Min: 0.0, Avg: ... Workers: 3
0.0 0.0 0.0 - ... - - - - - - (T)
SoftReference:
Discovered: 0
Cleared: 0
WeakReference:
Discovered: 30
Cleared: 30
FinalReference:
Discovered: 0
Cleared: 0
PhantomReference:
Discovered: 5
Cleared: 5
I.e. instead of showing for every reference type their phases with
timing, now the four phases with the reference timings that are
processed within that reference are shown.
Also (that has been a bug), the lines with a "(T)" are added at trace
level.
At debug level the timing output has been updated to be similar to
other gc+phases output, i.e. is now
Reference Processing: 0.2ms
Phase1: 0.0ms
Balance queues: 0.0ms
SoftRef (ms): Min: 0.0, Avg: ... Workers: 3
Phase2: 0.1ms
Balance queues: 0.0ms
SoftRef (ms): Min: 0.0, Avg: ... Workers: 3
WeakRef (ms): Min: 0.0, Avg: ... Workers: 3
FinalRef (ms): Min: 0.0, Avg: ... Workers: 3
Phase3: 0.0ms
Balance queues: 0.0ms
FinalRef (ms): Min: 0.0, Avg: ... Workers: 3
Phase4: 0.0ms
Balance queues: 0.0ms
PhantomRef (ms): Min: 0.0, Avg: ... Workers: 3
SoftReference:
Discovered: 0
Cleared: 0
WeakReference:
Discovered: 38
Cleared: 30
FinalReference:
Discovered: 0
Cleared: 0
PhantomReference:
Discovered: 7
Cleared: 6
instead of the old output as follows:
Reference Processing: 0.0ms
SoftReference: 0.5ms
Balance queues: 0.0ms
Phase1: 0.2ms
Phase2: 0.1ms
Phase3: 0.2ms
Discovered: 0
Cleared: 0
WeakReference: 0.3ms
Balance queues: 0.0ms
Phase2: 0.1ms
Phase3: 0.2ms
Discovered: 0
Cleared: 0
FinalReference: 0.3ms
Balance queues: 0.0ms
Phase2: 0.1ms
Phase3: 0.2ms
Discovered: 0
Cleared: 0
PhantomReference: 0.3ms
Balance queues: 0.0ms
Phase2: 0.1ms
Phase3: 0.2ms
Discovered: 5
Cleared: 5
which did not show any per-thread information.
In case of using serial ref processing, this is the proposed output:
Reference Processing: 0.1ms
Phase1: 0.0ms
SoftRef: 0.0ms
Phase2: 0.0ms
SoftRef: 0.0ms
WeakRef: 0.0ms
FinalRef: 0.0ms
Phase3: 0.0ms
FinalRef: 0.0ms
Phase4: 0.0ms
PhantomRef: 0.0ms
SoftReference:
Discovered: 0
Cleared: 0
WeakReference:
Discovered: 30
Cleared: 30
FinalReference:
Discovered: 0
Cleared: 0
PhantomReference:
Discovered: 5
Cleared: 5
Note that in the previous version, the output of parallel/debug was the
same as serial/debug.
Discovered/Cleared counts per reference type are shown afterwards
instead of within in all cases. Note that I think we should improve and
probably streamline the output of reference counts, but that is imho a
different issue (and this one is big enough as is).
This change builds on "JDK-8203319: JDK-8201487 disabled too much queue
balancing".
There is also a review of "JDK-8043575: Dynamically parallelize
reference processing work" out for review. I also talked to Sangheon
that I will do the merging as this change messes up the same code (in
the referenceProcessor* files) so much.
CR:
https://bugs.openjdk.java.net/browse/JDK-8202845
Webrev:
http://cr.openjdk.java.net/~tschatzl/8202845/webrev/
Testing:
hs-tier1-5,jdk-tier1-3
Thanks,
Thomas
More information about the hotspot-gc-dev
mailing list