RFR: 8256265: G1: Improve parallelism in regions that failed evacuation [v16]

Thomas Schatzl tschatzl at openjdk.java.net
Thu Mar 24 10:02:08 UTC 2022


On Tue, 22 Mar 2022 12:13:11 GMT, Hamlin Li <mli at openjdk.org> wrote:

>> Currently G1 assigns a thread per failed evacuated region. This can in effect serialize the whole process as often (particularly with region pinning) there is only one region to fix up.
>> 
>> This patch tries to improve parallelism when walking over the regions in chunks
>> 
>> Latest implementation scans regions in chunks to bring parallelism, it's based on JDK-8278917 which changes to uses prev bitmap to mark evacuation failure objs.
>> 
>> Here's the summary of performance data based on latest implementation, basically, it brings better and stable performance than baseline at "Post Evacuate Cleanup 1/remove self forwardee" phase. (Although some regression is spotted when calculate the results in geomean, becuase one pause time from baseline is far too small than others.)
>> 
>> The performance benefit trend is:
>>  - pause time (Post Evacuate Cleanup 1) is decreased from 76.79% to 2.28% for average time, from 71.61% to 3.04% for geomean, when G1EvacuationFailureALotCSetPercent is changed from 2 to 90 (-XX:ParallelGCThreads=8)
>>  - pause time (Post Evacuate Cleanup 1) is decreased from 63.84% to 15.16% for average time, from 55.41% to 12.45% for geomean, when G1EvacuationFailureALotCSetPercent is changed from 2 to 90 (-XX:ParallelGCThreads=<default=123>)
>> ( Other common Evacuation Failure configurations are:
>> -XX:+G1EvacuationFailureALot -XX:G1EvacuationFailureALotInterval=0 -XX:G1EvacuationFailureALotCount=0 )
>> 
>> For more detailed performance data, please check the related bug.
>
> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Remove unnecessary method

This looks good to me, with some final cleanup comments. Apologies for taking a bit.

src/hotspot/share/gc/g1/g1EvacFailureRegions.cpp line 106:

> 104:       assert(_evac_failure_regions->contains(hr->hrm_index()), "precondition");
> 105: 
> 106:       hr->clear_index_in_opt_cset();

One optional suggestion: Let's also move this into `HeapRegion::note_self_forwarding_removal_start`. It seems so lonely here for that trivial assignment, right now better fitting over there so that this preparation only deals with the more expensive remembered sets.

src/hotspot/share/gc/g1/g1EvacFailureRegions.cpp line 114:

> 112: 
> 113:       hr->rem_set()->clean_code_roots(hr);
> 114:       hr->rem_set()->clear_locked(true);

Suggestion:

      HeapRegionRemSet* rem_set = hr->rem_set();
      rem_set->clean_code_roots(hr);
      rem_set->clear_locked(true);

Optional cleanup.

src/hotspot/share/gc/g1/g1HeapRegionChunk.cpp line 71:

> 69:   chunk_closure->do_heap_region_chunk(&chunk);
> 70:   p->record_or_add_time_secs(G1GCPhaseTimes::RemoveSelfForwardsInChunks, worker_id, (Ticks::now() - start).seconds());
> 71: }

Suggestion:

  G1CollectedHeap* glh = G1CollectedHeap::heap();
  G1GCPhaseTimes* p = glh->phase_times();

  // Prepare and analyze assigned chunk.
  Ticks chunk_prepare_start = Ticks::now();
  uint region_idx = _evac_failure_regions[chunk_id / _chunks_per_region];
  G1HeapRegionChunk chunk(glh->region_at(region_idx), chunk_id % _chunks_per_region, _chunk_size, _bitmap);
  p->record_or_add_time_secs(G1GCPhaseTimes::PrepareChunks, worker_id, (Ticks::now() - chunk_prepare_start).seconds());

  if (chunk.empty()) {
    p->record_or_add_thread_work_item(G1GCPhaseTimes::RemoveSelfForwardsInChunks, worker_id, 1, G1GCPhaseTimes::RemoveSelfForwardEmptyChunksNum);
    return;
  }
  p->record_or_add_thread_work_item(G1GCPhaseTimes::RemoveSelfForwardsInChunks, worker_id, 1, G1GCPhaseTimes::RemoveSelfForwardChunksNum);

  // Process the chunk.
  Ticks start = Ticks::now();
  chunk_closure->do_heap_region_chunk(&chunk);
  p->record_or_add_time_secs(G1GCPhaseTimes::RemoveSelfForwardsInChunks, worker_id, (Ticks::now() - start).seconds());
}

That method looks so busy, I tried improving it by adding newlines.

src/hotspot/share/gc/g1/g1HeapRegionChunk.cpp line 84:

> 82:   log_debug(gc, ergo)("Running %s using %u workers for removing self forwards with %u chunks per region",
> 83:                       task_name, num_workers, _chunk_size);
> 84: 

Suggestion:

  log_debug(gc, ergo)("Initializing removing self forwards with %u chunks per region given %u workers", _chunk_size, num_workers);


Since we successfully moved this work into the `G1BatchedGangTask` API, I do not think we need the information about the task and #workers any more. The batch API provides this information afair.
This also allows removal of the `task_name` passed through.

src/hotspot/share/gc/g1/g1HeapRegionChunk.hpp line 35:

> 33: class HeapRegion;
> 34: 
> 35: class G1HeapRegionChunk {

Suggestion:

// Unit of work for removing self forwards in regions.
class G1HeapRegionChunk {

-------------

Changes requested by tschatzl (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/7047



More information about the hotspot-gc-dev mailing list