RFR (XL): 8202845: Refactor reference processing for improved parallelism

Tue Jun 5 12:27:04 UTC 2018

Hi all,

  can I have reviews for this change that improves the efficiency of
existing j.l.ref processing in all garbage collectors by squashing
together phases for the different types of References as much as
possible.

Kim outlined what can be done when in the CR in detail, so please look
there for the idea.

This improves efficiency in terms of time parallel Reference Processing
takes: instead of starting up reference processors 9 times, you
actually only need to do that 4 times. That can save a significant
amount of gc pause time too :)

One side effect of this is unfortunately another change in the log
output: as we do not have separate timings for every phase of every
Reference type any more, the idea was to turn the logging inside-out:

Instead of this structure (at trace level):

 Reference Processing: 0.0ms
   SoftReference: 0.5ms
     Balance queues: 0.0ms
     Phase1: 0.2ms
       Process lists (ms)      Min:  0.0, Avg:  ... Workers: 28
     Phase2: 0.1ms
       Process lists (ms)      Min:  0.0, Avg:  ... Workers: 28
     Phase3: 0.2ms
       Process lists (ms)      Min:  0.0, Avg:  ... Workers: 28
     Discovered: 0
     Cleared: 0
   WeakReference: 0.3ms
     Balance queues: 0.0ms
     Phase2: 0.1ms
       Process lists (ms)      Min:  0.0, Avg:  ... Workers: 28
     Phase3: 0.2ms
       Process lists (ms)      Min:  0.0, Avg:  ... Workers: 28
     Discovered: 40
     Cleared: 28
   FinalReference: 0.3ms
     Balance queues: 0.0ms
     Phase2: 0.1ms
       Process lists (ms)      Min:  0.0, Avg:  ... Workers: 28
     Phase3: 0.2ms
       Process lists (ms)      Min:  0.0, Avg:  ... Workers: 28
     Discovered: 0
     Cleared: 0
   PhantomReference: 0.3ms
     Balance queues: 0.0ms
     Phase2: 0.1ms
       Process lists (ms)      Min:  0.0, Avg:  ... Workers: 28
     Phase3: 0.2ms
       Process lists (ms)      Min:  0.0, Avg:  ... Workers: 28
     Discovered: 6
     Cleared: 5

it will look like this:

 Reference Processing: 0.1ms
   Phase1: 0.0ms
     Balance queues: 0.0ms
     SoftRef (ms):             Min:  0.0, Avg:  ... Workers: 3
                                0.0  0.0  0.0 - ... - - - - - - (T)
   Phase2: 0.0ms
     Balance queues: 0.0ms
     SoftRef (ms):             Min:  0.0, Avg:  ... Workers: 3
                                0.0  0.0  0.0 - ... - - - - - - (T)
     WeakRef (ms):             Min:  0.0, Avg:  ... Workers: 3
                                0.0  0.0  0.0 - ... - - - - - - (T)
     FinalRef (ms):            Min:  0.0, Avg:  ... Workers: 3
                                0.0  0.0  0.0 - ... - - - - - - (T)
   Phase3: 0.0ms
     Balance queues: 0.0ms
     FinalRef (ms):            Min:  0.0, Avg:  ... Workers: 3
                                0.0  0.0  0.0 - ... - - - - - - (T)
   Phase4: 0.0ms
     Balance queues: 0.0ms
     PhantomRef (ms):          Min:  0.0, Avg:  ... Workers: 3
                                0.0  0.0  0.0 - ... - - - - - - (T)
   SoftReference:
     Discovered: 0
     Cleared: 0
   WeakReference:
     Discovered: 30
     Cleared: 30
   FinalReference:
     Discovered: 0
     Cleared: 0
   PhantomReference:
     Discovered: 5
     Cleared: 5

I.e. instead of showing for every reference type their phases with
timing, now the four phases with the reference timings that are
processed within that reference are shown.

Also (that has been a bug), the lines with a "(T)" are added at trace
level.

At debug level the timing output has been updated to be similar to
other gc+phases output, i.e. is now

 Reference Processing: 0.2ms
   Phase1: 0.0ms
     Balance queues: 0.0ms
     SoftRef (ms):             Min:  0.0, Avg:  ... Workers: 3
   Phase2: 0.1ms
     Balance queues: 0.0ms
     SoftRef (ms):             Min:  0.0, Avg:  ... Workers: 3
     WeakRef (ms):             Min:  0.0, Avg:  ... Workers: 3
     FinalRef (ms):            Min:  0.0, Avg:  ... Workers: 3
   Phase3: 0.0ms
     Balance queues: 0.0ms
     FinalRef (ms):            Min:  0.0, Avg:  ... Workers: 3
   Phase4: 0.0ms
     Balance queues: 0.0ms
     PhantomRef (ms):          Min:  0.0, Avg:  ... Workers: 3
   SoftReference:
     Discovered: 0
     Cleared: 0
   WeakReference:
     Discovered: 38
     Cleared: 30
   FinalReference:
     Discovered: 0
     Cleared: 0
   PhantomReference:
     Discovered: 7
     Cleared: 6

instead of the old output as follows:

 Reference Processing: 0.0ms
   SoftReference: 0.5ms
     Balance queues: 0.0ms
     Phase1: 0.2ms
     Phase2: 0.1ms
     Phase3: 0.2ms
     Discovered: 0
     Cleared: 0
   WeakReference: 0.3ms
     Balance queues: 0.0ms
     Phase2: 0.1ms
     Phase3: 0.2ms
     Discovered: 0
     Cleared: 0
   FinalReference: 0.3ms
     Balance queues: 0.0ms
     Phase2: 0.1ms
     Phase3: 0.2ms
     Discovered: 0
     Cleared: 0
   PhantomReference: 0.3ms
     Balance queues: 0.0ms
     Phase2: 0.1ms
     Phase3: 0.2ms
     Discovered: 5
     Cleared: 5

which did not show any per-thread information.

In case of using serial ref processing, this is the proposed output:

 Reference Processing: 0.1ms
   Phase1: 0.0ms
     SoftRef: 0.0ms
   Phase2: 0.0ms
     SoftRef: 0.0ms
     WeakRef: 0.0ms
     FinalRef: 0.0ms
   Phase3: 0.0ms
     FinalRef: 0.0ms
   Phase4: 0.0ms
     PhantomRef: 0.0ms
   SoftReference:
     Discovered: 0
     Cleared: 0
   WeakReference:
     Discovered: 30
     Cleared: 30
   FinalReference:
     Discovered: 0
     Cleared: 0
   PhantomReference:
     Discovered: 5
     Cleared: 5

Note that in the previous version, the output of parallel/debug was the
same as serial/debug.

Discovered/Cleared counts per reference type are shown afterwards
instead of within in all cases. Note that I think we should improve and
probably streamline the output of reference counts, but that is imho a
different issue (and this one is big enough as is).

This change builds on "JDK-8203319: JDK-8201487 disabled too much queue
balancing".

There is also a review of "JDK-8043575: Dynamically parallelize
reference processing work" out for review. I also talked to Sangheon
that I will do the merging as this change messes up the same code (in
the referenceProcessor* files) so much.

CR:
https://bugs.openjdk.java.net/browse/JDK-8202845
Webrev:
http://cr.openjdk.java.net/~tschatzl/8202845/webrev/
Testing:
hs-tier1-5,jdk-tier1-3

Thanks,
  Thomas