RFR: 8186571: Implementation: JEP 307: Parallel Full GC for G1

Wed Oct 18 13:49:42 UTC 2017

Looks good to me too.

Thanks,
Sangheon

On 10/18/2017 02:45 AM, Stefan Johansson wrote:
> Hi again,
>
> A lot more internal review has been done and here are the latest webrevs:
> Full: http://cr.openjdk.java.net/~sjohanss/8186571/hotspot.04/
> Incremental: http://cr.openjdk.java.net/~sjohanss/8186571/hotspot.03-04/
> Incremental (from previous mail): 
> http://cr.openjdk.java.net/~sjohanss/8186571/hotspot.01-04/
>
> Summary of changes:
> * Updated calculations for heap sizing after gc.
> * Removed G1FullGCWorkerData and moved data into G1FullCollector instead.
> * Renamed G1MarkStack to G1FullGCMarker.
> * Renamed G1CompactionPoint to G1FullGCCompactionPoint.
> * Removed now unused RebuildRSOopClosure and par_write_ref from G1RemSet.
> * Updated comments to be more informative.
> * Better naming of functions and variables.
> * Updated copyright for a lot of files.
>
> Big thanks to Erik D, Thomas S and Sangheon K for working your way 
> through this big change.
>
> Cheers,
> Stefan
>
> On 2017-09-19 17:32, Stefan Johansson wrote:
>> Hi,
>>
>> We're moving forward with the review internally and doing some 
>> performance enhancements as well. Here are updated webrevs:
>> Full: http://cr.openjdk.java.net/~sjohanss/8186571/hotspot.01/
>> Incremental: http://cr.openjdk.java.net/~sjohanss/8186571/hotspot.00-01/
>>
>> Note that the full webrev is based on the new consolidated repo, but 
>> the incremental was generated with the old structure.
>>
>> Highlights in this update:
>> * Cleaned out unused code in PreservedMarks.
>> * Fixed memory leak in GenericTaskQueueSet.
>> * HeapRegionClaimerBase has been removed and instead we now have two 
>> functions to iterate through all heap regions.
>> * General cleanups and renames to ease understanding the code.
>> * G1 Hot Card Cache cleanup made parallel and moved into appropriate 
>> phase.
>> * Updated HeapRegion::apply_to_marked_objects to be a template 
>> function to avoid virtual call.
>>
>> Thanks Erik D and Thomas S for all comments so far.
>>
>> Cheers,
>> Stefan
>>
>>
>> On 2017-09-04 17:36, Stefan Johansson wrote:
>>> Hi,
>>>
>>> Please review the implementation of JEP-307:
>>> https://bugs.openjdk.java.net/browse/JDK-8172890
>>>
>>> Webrev:
>>> http://cr.openjdk.java.net/~sjohanss/8186571/hotspot.00/
>>>
>>> Summary:
>>> As communicated late last year [1], I've been working on 
>>> parallelizing the Full GC for G1. The implementation is now ready 
>>> for review.
>>>
>>> The approach I chose was to redo marking at the start of the Full GC 
>>> and not reuse the marking information from the concurrent mark 
>>> cycle. The main reason behind this is to maximize the chance of 
>>> freeing up memory. I reused the marking bitmap from the concurrent 
>>> mark code though, so instead of marking in the mark word a bitmap is 
>>> used. The mark word is still used for forwarding pointers, so marks 
>>> will still have to be preserved for some objects.
>>>
>>> The algorithm is still a four-phase mark-compact, but each phase is 
>>> handled by parallel workers. Marking and reference processing are 
>>> done in phase 1. In phase 2 all worker threads work through the heap 
>>> claiming regions which they prepare for compaction. This is done by 
>>> installing forwarding pointers into the mark word of the live 
>>> objects that will move. The regions claimed by a worker in this 
>>> phase will be the same regions that the worker will compact in phase 
>>> 4. This ensures that objects are not overwritten before they are 
>>> compacted.
>>>
>>> In phase 3, all pointers to other objects are updated by looking at 
>>> the forwarding pointers. At this point all information needed to 
>>> create new remembered sets is available and this rebuilding has been 
>>> added to phase 3. In the old version remembered set rebuilding was 
>>> done separately after the compaction, but doing it in phase 3 is 
>>> more efficient.
>>>
>>> As mentioned, phase 4 is when the compaction is done. In this first 
>>> version, to avoid some complexity, there is no work stealing in this 
>>> phase. This will lead to some imbalance between the workers, but 
>>> this can be treated as a separate RFE in the future.
>>>
>>> The part of this work that has generated the most questions during 
>>> internal discussions is the serial parts of phase 2 and 4. They are 
>>> executed if no regions are to be freed up by the parallel workers. 
>>> It is kind of a safety mechanism to avoid throwing a premature OOM. 
>>> In the case of no regions being freed by the parallel code path, a 
>>> single-threaded pass over the last region of each worker is done (at 
>>> most number-of-workers regions are handled) to further compact these 
>>> regions and hopefully free up some regions.
>>>
>>> Testing:
>>> * A lot of local sanity testing, both functional and performance.
>>> * Passed tier 1-5 of internal testing on supported platforms.
>>> * No regressions in performance testing.
>>>
>>> Cheers,
>>> Stefan
>>>
>>> [1] 
>>> http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2016-November/019216.html
>>
>
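
For readers following the thread, the four phases described in the original
review request can be illustrated with a toy model. This is only a sketch
under stated assumptions and is not HotSpot code: ToyHeap, Slot,
claimed_slots and full_gc are hypothetical names, each slot holds at most
one object with at most one outgoing reference, regions are claimed
round-robin rather than dynamically, and the "workers" are simulated by
plain loops instead of threads. It does not model the serial fallback pass
or remembered set rebuilding.

```cpp
#include <cassert>
#include <vector>

constexpr int kNone = -1;

// Hypothetical toy object slot; not a HotSpot data structure.
struct Slot {
    bool used = false;    // an object is allocated in this slot
    bool marked = false;  // stands in for the bit in the marking bitmap
    int forward = kNone;  // stands in for the forwarding pointer in the mark word
    int ref = kNone;      // slot index of the single referenced object, if any
    int payload = 0;
};

struct ToyHeap {
    int region_size;          // slots per region
    std::vector<Slot> slots;  // size is a multiple of region_size
};

// Slots of the regions claimed by `worker`, in address order.
// Assumption: static round-robin claiming (worker, worker + workers, ...).
std::vector<int> claimed_slots(const ToyHeap& h, int worker, int workers) {
    std::vector<int> out;
    const int regions = static_cast<int>(h.slots.size()) / h.region_size;
    for (int r = worker; r < regions; r += workers)
        for (int i = 0; i < h.region_size; ++i)
            out.push_back(r * h.region_size + i);
    return out;
}

void full_gc(ToyHeap& h, std::vector<int>& roots, int workers) {
    assert(workers > 0 && h.region_size > 0);

    // Phase 1: mark everything reachable from the roots.
    for (int r : roots)
        for (int s = r; s != kNone && !h.slots[s].marked; s = h.slots[s].ref)
            h.slots[s].marked = true;

    // Phase 2: each worker installs forwarding pointers for the live
    // objects in its own regions, sliding them towards its first slot.
    for (int w = 0; w < workers; ++w) {
        const std::vector<int> mine = claimed_slots(h, w, workers);
        std::size_t dest = 0;
        for (int s : mine)
            if (h.slots[s].marked) h.slots[s].forward = mine[dest++];
    }

    // Phase 3: update every reference (and root) via the forwarding pointers.
    for (Slot& s : h.slots)
        if (s.marked && s.ref != kNone) s.ref = h.slots[s.ref].forward;
    for (int& r : roots) r = h.slots[r].forward;

    // Phase 4: each worker compacts exactly the regions it prepared in
    // phase 2. A destination never lies ahead of its source in the
    // worker's own sequence, so no live object is overwritten.
    for (int w = 0; w < workers; ++w) {
        for (int s : claimed_slots(h, w, workers)) {
            Slot obj = h.slots[s];
            h.slots[s] = Slot{};  // the source slot becomes free
            if (obj.marked) {
                const int to = obj.forward;
                obj.marked = false;
                obj.forward = kNone;
                h.slots[to] = obj;
            }
        }
    }
}
```

With two workers and four two-slot regions, worker 0 owns slots 0, 1, 4, 5
and worker 1 owns slots 2, 3, 6, 7. Each worker compacts only within its own
regions, which is why phase 4 needs no synchronization in this model, and
also why the result can be less dense than a single global compaction.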