RFR: 8256265: G1: Improve parallelism in regions that failed evacuation [v16]
Thomas Schatzl
tschatzl at openjdk.java.net
Thu Mar 24 10:02:08 UTC 2022
On Tue, 22 Mar 2022 12:13:11 GMT, Hamlin Li <mli at openjdk.org> wrote:
>> Currently G1 assigns a thread per failed evacuated region. This can in effect serialize the whole process as often (particularly with region pinning) there is only one region to fix up.
>>
>> This patch tries to improve parallelism when walking over the regions in chunks
>>
>> Latest implementation scans regions in chunks to bring parallelism, it's based on JDK-8278917 which changes to uses prev bitmap to mark evacuation failure objs.
>>
>> Here's the summary of performance data based on latest implementation, basically, it brings better and stable performance than baseline at "Post Evacuate Cleanup 1/remove self forwardee" phase. (Although some regression is spotted when calculate the results in geomean, becuase one pause time from baseline is far too small than others.)
>>
>> The performance benefit trend is:
>> - pause time (Post Evacuate Cleanup 1) is decreased from 76.79% to 2.28% for average time, from 71.61% to 3.04% for geomean, when G1EvacuationFailureALotCSetPercent is changed from 2 to 90 (-XX:ParallelGCThreads=8)
>> - pause time (Post Evacuate Cleanup 1) is decreased from 63.84% to 15.16% for average time, from 55.41% to 12.45% for geomean, when G1EvacuationFailureALotCSetPercent is changed from 2 to 90 (-XX:ParallelGCThreads=<default=123>)
>> ( Other common Evacuation Failure configurations are:
>> -XX:+G1EvacuationFailureALot -XX:G1EvacuationFailureALotInterval=0 -XX:G1EvacuationFailureALotCount=0 )
>>
>> For more detailed performance data, please check the related bug.
>
> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:
>
> Remove unnecessary method
This looks good to me, with some final cleanup comments. Apologies for taking a bit.
src/hotspot/share/gc/g1/g1EvacFailureRegions.cpp line 106:
> 104: assert(_evac_failure_regions->contains(hr->hrm_index()), "precondition");
> 105:
> 106: hr->clear_index_in_opt_cset();
One optional suggestion: Let's also move this into `HeapRegion::note_self_forwarding_removal_start`. It seems so lonely here for that trivial assignment, right now better fitting over there so that this preparation only deals with the more expensive remembered sets.
src/hotspot/share/gc/g1/g1EvacFailureRegions.cpp line 114:
> 112:
> 113: hr->rem_set()->clean_code_roots(hr);
> 114: hr->rem_set()->clear_locked(true);
Suggestion:
HeapRegionRemSet* rem_set = hr->rem_set();
rem_set->clean_code_roots(hr);
rem_set->clear_locked(true);
Optional cleanup.
src/hotspot/share/gc/g1/g1HeapRegionChunk.cpp line 71:
> 69: chunk_closure->do_heap_region_chunk(&chunk);
> 70: p->record_or_add_time_secs(G1GCPhaseTimes::RemoveSelfForwardsInChunks, worker_id, (Ticks::now() - start).seconds());
> 71: }
Suggestion:
G1CollectedHeap* glh = G1CollectedHeap::heap();
G1GCPhaseTimes* p = glh->phase_times();
// Prepare and analyze assigned chunk.
Ticks chunk_prepare_start = Ticks::now();
uint region_idx = _evac_failure_regions[chunk_id / _chunks_per_region];
G1HeapRegionChunk chunk(glh->region_at(region_idx), chunk_id % _chunks_per_region, _chunk_size, _bitmap);
p->record_or_add_time_secs(G1GCPhaseTimes::PrepareChunks, worker_id, (Ticks::now() - chunk_prepare_start).seconds());
if (chunk.empty()) {
p->record_or_add_thread_work_item(G1GCPhaseTimes::RemoveSelfForwardsInChunks, worker_id, 1, G1GCPhaseTimes::RemoveSelfForwardEmptyChunksNum);
return;
}
p->record_or_add_thread_work_item(G1GCPhaseTimes::RemoveSelfForwardsInChunks, worker_id, 1, G1GCPhaseTimes::RemoveSelfForwardChunksNum);
// Process the chunk.
Ticks start = Ticks::now();
chunk_closure->do_heap_region_chunk(&chunk);
p->record_or_add_time_secs(G1GCPhaseTimes::RemoveSelfForwardsInChunks, worker_id, (Ticks::now() - start).seconds());
}
That method looks so busy, I tried improving it by adding newlines.
src/hotspot/share/gc/g1/g1HeapRegionChunk.cpp line 84:
> 82: log_debug(gc, ergo)("Running %s using %u workers for removing self forwards with %u chunks per region",
> 83: task_name, num_workers, _chunk_size);
> 84:
Suggestion:
log_debug(gc, ergo)("Initializing removing self forwards with %u chunks per region given %u workers", _chunk_size, num_workers);
Since we successfully moved this work into the `G1BatchedGangTask` API, I do not think we need the information about the task and #workers any more. The batch API provides this information afair.
This also allows removal of the `task_name` passed through.
src/hotspot/share/gc/g1/g1HeapRegionChunk.hpp line 35:
> 33: class HeapRegion;
> 34:
> 35: class G1HeapRegionChunk {
Suggestion:
// Unit of work for removing self forwards in regions.
class G1HeapRegionChunk {
-------------
Changes requested by tschatzl (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/7047
More information about the hotspot-gc-dev
mailing list