RFR: 8254739: G1: Optimize evacuation failure for regions with few failed objects [v2]

Thomas Schatzl tschatzl at openjdk.java.net
Wed Aug 25 11:16:22 UTC 2021


On Wed, 25 Aug 2021 09:51:50 GMT, Hamlin Li <mli at openjdk.org> wrote:

>> This is a try to optimize evcuation failure for regions.
>> I record every evacuation failure object per region (by G1EvacuationFailureObjsInHR), and then iterate (which indeed includes compact/sort/iteration) these objects directly in RemoveSelfForwardPtrHRClosure.
>> 
>> I have tested it with following parameters, 
>> 
>> -   -XX:+ParallelGCThreads=1/32/64
>> -   -XX:G1EvacuationFailureALotInterval=1
>> -   -XX:G1EvacuationFailureALotCount=2/10/100/1000/10000/100000
>> 
>> It saves "Remove Self Forwards" time all the time ,and in most condition it saves "Evacuate Collection Set" time. 
>> 
>> It brings some performance degradation when -XX:G1EvacuationFailureALotCount is low, such as *2*. To improve this a little, we can record the number evacuation failure object per region, and not record these objects when the number hit some limit. But I'm not sure if it's necessary to do so, as I think such condition is so extreme to be met in real environment, although I'm not quite sure.
>
> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Use a segmented array rather than a linked list to record evacuation failure objs in one region

Hi,

> > One problem I can see now with performance is the use of a linked list for storing the regions. Using such requires quite a lot of memory allocation (every `Node` object takes twice the needed data, i.e. the reference plus `next` link, plus low-level malloc overhead), and obviously traversing all links is kind of slow.
> > Since the only thing that needs to be done concurrently (and fast) is adding an element, an option could be something like a segmented array of HeapWords (per region?), i.e. a list of arrays of elements. (Further, since this is per `HeapRegion` the memory cost could be cut in half easily by using 32 bit offsets). That would probably also be faster when "compacting" all these failed arrays into a single array, and faster when deallocating (just a few of these arrays).
> > Something like `G1CardSetAllocator` with an element type of `G1CardSetArray`or the dirty card queue set allocation (`BufferNode::Allocator` for the allocation only part); unfortunately the code isn't that generic to be used as is for you.
> 
> I have written a simple version of the "array" for this specific usage, I do not implement a buffer to cache the "node" used in "array". Seems it's getting better performance than my previous linked list version. ( I measure the end-to-end time simply)
> Maybe later I could make it more generic or merge with the card-set one? If it's feasible, should we do it in a separate issue?

Yes, we can merge this later; there is the related [JDK-8267834: Refactor G1CardSetAllocator and BufferNode::Allocator to use a common base class](https://bugs.openjdk.java.net/browse/JDK-8267834) that may provide some useful base.

> 
> > I am also not really convinced that the failed object lists should be attached to `HeapRegion`, I would rather prefer an extra array as this is something that is usually not needed (and only required for regions with failure, which are very few typically), and only needed during young gc. (I intend to add something like a container for `HeapRegion` specific data that is only needed and used during gc at some point, moving similar existing misplaced data out of `HeapRegion` there then).
> 
> I agree. We could refactor it thoroughly. Could we do it in a separate issue? please kindly point to the issue if you already track this with an issue.

I am currently working on separating the G1 young collection algorithm from `G1CollectedHeap` in [JDK-8253343: Extract G1 Young GC algorithm related code from G1CollectedHeap](https://bugs.openjdk.java.net/browse/JDK-8253343) prototype available [here](https://github.com/tschatzl/jdk/tree/submit/8253343-move-young-collector-to-separate-files)), after that I intend to try to localize young-collection only data structures as suggested. I filed [8272978: Factor out g1 young collection specific data structures](https://bugs.openjdk.java.net/browse/JDK-8272978) where a note about this data structure could be added.

> 
> > Overall we can add some cut-off for whole-region iteration at some point as you described as needed later I think.
> 
> yes.

I filed JDK-8272977 to not forget on this.

> 
> > Do you have a breakdown of the time taken for the parts of the `G1EvacuationFailureObjsInHR::iterate()` method to see where where most time is spent? I.e. what overhead does the linked list impose compared to actual "useful" work? Do you have an idea about the total memory overhead? I can imagine that can be significant in this case.
> 
> I will measure the time of the new implementation with "array", and update the information later.

Thanks,
  Thomas

-------------

PR: https://git.openjdk.java.net/jdk/pull/5181



More information about the hotspot-gc-dev mailing list