RFR: Load balance remembered set scanning [v4]

Mon Aug 8 13:04:58 UTC 2022

On Mon, 8 Aug 2022 07:03:34 GMT, Y. Srinivas Ramakrishna <ysr at openjdk.org> wrote:

>> Kelvin Nilsen has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Fix white space
>
> src/hotspot/share/gc/shenandoah/shenandoahScanRemembered.cpp line 127:
> 
>> 125:       return;
>> 126:     }
>> 127: #endif
> 
> The crashes are likely because you are pulling an assignment off the worklist, and if you see a cancellation, you are dropping it on the floor and returning.
> 
> I think the check should be at the end of the loop after the assignment pulled off the work list has been processed but before the next one is pulled.
> 
> Here's a diff that should work:
> 
> 
> diff --git a/src/hotspot/share/gc/shenandoah/shenandoahScanRemembered.cpp b/src/hotspot/share/gc/shenandoah/shenandoahScanRemembered.cpp
> index 2d7cbbb97dd..03f647f23b2 100644
> --- a/src/hotspot/share/gc/shenandoah/shenandoahScanRemembered.cpp
> +++ b/src/hotspot/share/gc/shenandoah/shenandoahScanRemembered.cpp
> @@ -81,15 +81,7 @@ void ShenandoahScanRememberedTask::do_work(uint worker_id) {
>    // set up thread local closure for shen ref processor
>    _rp->set_mark_closure(worker_id, &cl);
>    struct ShenandoahRegionChunk assignment;
> -  bool has_work = _work_list->next(&assignment);
> -  while (has_work) {
> -#ifdef ENABLE_REMEMBERED_SET_CANCELLATION
> -    // This check is currently disabled to avoid crashes that occur
> -    // when we try to cancel remembered set scanning
> -    if (heap->check_cancelled_gc_and_yield(_is_concurrent)) {
> -      return;
> -    }
> -#endif
> +  while (_work_list->next(&assignment)) {
>      ShenandoahHeapRegion* region = assignment._r;
>      log_debug(gc)("ShenandoahScanRememberedTask::do_work(%u), processing slice of region "
>                    SIZE_FORMAT " at offset " SIZE_FORMAT ", size: " SIZE_FORMAT,
> @@ -101,13 +93,15 @@ void ShenandoahScanRememberedTask::do_work(uint worker_id) {
>        assert(clusters * cluster_size == assignment._chunk_size, "Chunk assignments must align on cluster boundaries");
>        HeapWord* end_of_range = region->bottom() + assignment._chunk_offset + assignment._chunk_size;
>  
> -      // During concurrent mark, region->top() equals TAMS with respect to the current young-gen pass.  */
> +      // During concurrent mark, region->top() equals TAMS with respect to the current young-gen pass.
>        if (end_of_range > region->top()) {
>          end_of_range = region->top();
>        }
>        scanner->process_region_slice(region, assignment._chunk_offset, clusters, end_of_range, &cl, false, _is_concurrent);
>      }
> -    has_work = _work_list->next(&assignment);
> +    if (heap->check_cancelled_gc_and_yield(_is_concurrent)) {
> +      return;
> +    }
>    }
>  }

Thanks very much for figuring out this problem.  I'd like to address this change in a different pull request to keep the history cleaner.

-------------

PR: https://git.openjdk.org/shenandoah/pull/153