RFR: 8186571: Implementation: JEP 307: Parallel Full GC for G1

Tue Sep 19 15:32:04 UTC 2017

Hi,

We're moving forward with the review internally and doing some 
performance enhancements as well. Here are updated webrevs:
Full: http://cr.openjdk.java.net/~sjohanss/8186571/hotspot.01/
Incremental: http://cr.openjdk.java.net/~sjohanss/8186571/hotspot.00-01/

Note that the full webrev is based on the new consolidated repo, but the 
incremental was generated with the old structure.

Highlights in this update:
* Cleaned out unused code in PreservedMarks.
* Fixed memory leak in GenericTaskQueueSet.
* HeapRegionClaimerBase has been removed and instead we now have two 
functions to iterate through all heap regions.
* General cleanups and renames to make the code easier to understand.
* G1 Hot Card Cache cleanup made parallel and moved into the appropriate
phase.
* Updated HeapRegion::apply_to_marked_objects to be a template function
to avoid a virtual call.
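
To illustrate the last item, here is a minimal sketch of the idea, not
the code in the webrev. ToyRegion, SumClosure and sum_marked are
hypothetical names, and a vector of flags stands in for the mark bitmap.
Because the closure type is a template parameter, the per-object call is
bound at compile time and can be inlined:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a heap region: each slot is an "object" and a
// parallel vector of flags plays the role of the mark bitmap.
struct ToyRegion {
  std::vector<int>  objects;
  std::vector<bool> marked;

  // The closure type is a template parameter, so closure->apply() is
  // resolved statically; no virtual dispatch happens per object.
  template <typename ApplyToMarkedClosure>
  void apply_to_marked_objects(ApplyToMarkedClosure* closure) const {
    for (std::size_t i = 0; i < objects.size(); ++i) {
      if (marked[i]) {
        closure->apply(objects[i]);
      }
    }
  }
};

// Example closure: sums the payload of every marked object.
struct SumClosure {
  long sum = 0;
  void apply(int obj) { sum += obj; }
};

long sum_marked(const ToyRegion& region) {
  SumClosure cl;
  region.apply_to_marked_objects(&cl);
  return cl.sum;
}
```

Any type with a matching apply() can be passed in, so the different
phases can share the iteration code without sharing a virtual base
class.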

Thanks Erik D and Thomas S for all comments so far.

Cheers,
Stefan

On 2017-09-04 17:36, Stefan Johansson wrote:
> Hi,
>
> Please review the implementation of JEP-307:
> https://bugs.openjdk.java.net/browse/JDK-8172890
>
> Webrev:
> http://cr.openjdk.java.net/~sjohanss/8186571/hotspot.00/
>
> Summary:
> As communicated late last year [1], I've been working on parallelizing 
> the Full GC for G1. The implementation is now ready for review.
>
> The approach I chose was to redo marking at the start of the Full GC 
> and not reuse the marking information from the concurrent mark cycle. 
> The main reason behind this is to maximize the chance of freeing up 
> memory. I reused the marking bitmap from the concurrent mark code 
> though, so instead of marking in the mark word a bitmap is used. The 
> mark word is still used for forwarding pointers, so marks will still 
> have to be preserved for some objects.
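
A rough sketch of what marking in a side bitmap can look like when
several workers mark in parallel. This is a toy model, not the G1
bitmap class; ToyMarkBitmap and the one-bit-per-heap-word layout are
assumptions made for the example:

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy side bitmap with one bit per heap word (hypothetical layout).
class ToyMarkBitmap {
  std::vector<std::atomic<std::uint64_t>> _bits;

public:
  explicit ToyMarkBitmap(std::size_t heap_words)
    : _bits((heap_words + 63) / 64) {
    for (auto& w : _bits) {
      w.store(0, std::memory_order_relaxed);
    }
  }

  // Atomically sets the bit. Returns true only for the caller that flipped
  // it, so when workers race on one object exactly one of them claims it.
  bool par_mark(std::size_t word_index) {
    const std::uint64_t mask = std::uint64_t{1} << (word_index % 64);
    const std::uint64_t old =
        _bits[word_index / 64].fetch_or(mask, std::memory_order_relaxed);
    return (old & mask) == 0;
  }

  bool is_marked(std::size_t word_index) const {
    const std::uint64_t mask = std::uint64_t{1} << (word_index % 64);
    return (_bits[word_index / 64].load(std::memory_order_relaxed) & mask) != 0;
  }
};
```

Since liveness lives in the bitmap, the mark word stays free to hold the
forwarding pointer later on.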
>
> The algorithm is still a four-phase mark-compact, but each phase is
> handled by parallel workers. Marking and reference processing are done
> in phase 1. In phase 2 all worker threads work through the heap,
> claiming regions which they prepare for compaction. This is done by
> installing forwarding pointers into the mark word of the live objects
> that will move. The regions claimed by a worker in this phase will be
> the same regions that the worker will compact in phase 4. This ensures
> that objects are not overwritten before they are compacted.
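
As a simplified model of phase 2 (not the webrev code): regions are
handed out through an atomic counter, and each worker keeps its own
compaction point that slides live objects toward the start of the
regions it claimed. ToyRegionClaimer, ToyCompactionPoint, Forwardee and
the 8-word region size are all made up for the example:

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kRegionWords = 8;  // toy region size (assumption)

// Hands out region indices to workers; each index is returned exactly once.
class ToyRegionClaimer {
  std::atomic<std::size_t> _next{0};
  std::size_t _num_regions;

public:
  explicit ToyRegionClaimer(std::size_t n) : _num_regions(n) {}

  // Returns false once every region has been claimed.
  bool claim_next(std::size_t* region) {
    const std::size_t idx = _next.fetch_add(1, std::memory_order_relaxed);
    if (idx >= _num_regions) return false;
    *region = idx;
    return true;
  }
};

struct Forwardee {
  std::size_t region;
  std::size_t offset;
};

// Per-worker compaction point. The caller must add a region before
// forwarding the objects that live in it, and no object may be larger
// than a region.
class ToyCompactionPoint {
  std::vector<std::size_t> _regions;  // claimed regions, in claim order
  std::size_t _cur = 0;               // index of the current target region
  std::size_t _top = 0;               // next free word in the target region

public:
  void add_region(std::size_t r) { _regions.push_back(r); }

  // Computes where a live object of `words` words ends up after compaction.
  Forwardee forward(std::size_t words) {
    if (_top + words > kRegionWords) {  // does not fit: go to next region
      ++_cur;
      _top = 0;
    }
    Forwardee f{_regions[_cur], _top};
    _top += words;
    return f;
  }
};
```

In this model a destination is never later, in the worker's claim
order, than the object's current location, which is why a worker that
compacts exactly the regions it planned cannot overwrite an object it
has not moved yet.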
>
> In phase 3, all pointers to other objects are updated by looking at 
> the forwarding pointers. At this point all information needed to 
> create new remembered sets is available and this rebuilding has been 
> added to phase 3. In the old version, remembered set rebuilding was
> done separately after the compaction; doing it in phase 3 is more
> efficient.
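
A toy illustration of why the rebuilding fits into phase 3. Once a
reference has been adjusted, both the new location of the referring
object and the new location of the target are known, so a cross-region
reference can be recorded right away. Everything here is hypothetical:
addresses are word indices, the forwarding information is a plain
table, and a remembered set is just a set of source region indices (the
real ones are finer grained):

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

constexpr std::size_t kRegionWords = 8;  // toy region size (assumption)

struct ToyObject {
  std::size_t new_addr;           // destination computed in phase 2
  std::vector<std::size_t> refs;  // old addresses of referenced objects
};

// One set per target region, holding the regions that point into it.
using RemSets = std::vector<std::set<std::size_t>>;

// forward[old_addr] gives the new address of the object at old_addr.
void adjust_and_rebuild(std::vector<ToyObject>& objs,
                        const std::vector<std::size_t>& forward,
                        RemSets& remsets) {
  for (ToyObject& obj : objs) {
    const std::size_t from_region = obj.new_addr / kRegionWords;
    for (std::size_t& ref : obj.refs) {
      ref = forward[ref];  // follow the forwarding pointer
      const std::size_t to_region = ref / kRegionWords;
      if (to_region != from_region) {
        remsets[to_region].insert(from_region);  // cross-region reference
      }
    }
  }
}
```

Each reference is visited once for both purposes, which is where the
saving over a separate rebuild pass after compaction comes from.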
>
> As mentioned, phase 4 is when the compaction is done. In this first
> version, to avoid some complexity, there is no work stealing in this 
> phase. This will lead to some imbalance between the workers, but this 
> can be treated as a separate RFE in the future.
>
> The part of this work that has generated the most questions during
> internal discussions is the serial parts of phases 2 and 4. They are
> executed only if the parallel workers free up no regions. It is a
> kind of safety mechanism to avoid throwing a premature OOM. If the
> parallel code path frees no regions, a single-threaded pass over the
> last region of each worker is done (at most number-of-workers regions
> are handled) to further compact these regions and hopefully free up
> some of them.
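
A back-of-the-envelope sketch of what the serial pass can gain. The
function name and its inputs are hypothetical, and the calculation
ignores object boundaries (objects cannot be split across regions), so
the result is only an upper bound on the number of regions freed:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// tail_live_words[i] is the number of live words left in the last region
// of worker i after the parallel pass; each entry is assumed to be at
// most region_words. Returns an upper bound on the regions that a serial
// repacking of these tail regions could free.
std::size_t regions_freed_by_serial_pass(
    const std::vector<std::size_t>& tail_live_words,
    std::size_t region_words) {
  std::size_t total_live = 0;
  std::size_t used_regions = 0;
  for (std::size_t live : tail_live_words) {
    total_live += live;
    if (live > 0) ++used_regions;
  }
  // Regions needed if the tails were packed together without gaps.
  const std::size_t needed = (total_live + region_words - 1) / region_words;
  return used_regions - needed;
}
```

For example, with three workers whose last regions hold 3, 2 and 5 live
words and 8-word regions, the ten live words fit in two regions, so up
to one region can be freed.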
>
> Testing:
> * A lot of local sanity testing, both functional and performance.
> * Passed tier 1-5 of internal testing on supported platforms.
> * No regressions in performance testing.
>
> Cheers,
> Stefan
>
> [1] 
> http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2016-November/019216.html