RFR: 8296419: [REDO] JDK-8295319: pending_cards_at_gc_start doesn't include cards in thread buffers

Thomas Schatzl tschatzl at openjdk.org
Wed Nov 9 09:07:07 UTC 2022


On Wed, 9 Nov 2022 04:57:38 GMT, Kim Barrett <kbarrett at openjdk.org> wrote:

> Let's try this again.  This is the original JDK-8295319 change, plus a couple
> of small additions to fix problems with that original change.
> 
> The description of the original change was:
> 
> -----
> 
> Please review this change to G1 to include the per-thread buffers in the
> number of pending cards at the start of a young GC.
> 
> DCQS::concatenate_logs has been renamed to concatenate_logs_and_stats, and now
> also merges the per-thread refinement stats during the thread walk to flush
> buffers. That replaces the separate thread walk to merge and record these
> stats earlier in the GC. The merged stats and related info don't seem to be
> needed until after the buffer flushing.
> 
> Also, abandoning dirty card buffers and stats because of a full GC now also
> abandons any buffers in the paused buffers lists.
> 
> -----
> 
> The problem was that by moving log concatenation earlier, it no longer
> followed the call to retire_tlabs().  Among other things, that function
> calls flush_deferred_card_mark_barrier() on each thread, possibly adding cards
> to the thread's dirty card queue.  Doing that after the log concatenation may
> lead to unprocessed cards in thread queues, with chaos ensuing.
> 
> To fix this, the call to retire_tlabs has also been moved, so that it once
> again precedes the log concatenation.  Also added (debug-only) verification
> that all thread dirty card queues are empty at the end of
> pre_evacuate_collection_set.
> 
> A possible followup is to refactor those two operations, which each do a
> (single-threaded) walk of all threads.  We could combine that into a single
> walk.  We could also parallelize it if that seems warranted, though the
> per-thread work is usually pretty small, so might not be worth parallelizing.
> 
> There might be an opportunity to do some similar refactoring for fullgc and
> log abandonment, though there the two operations are done far away from each
> other.
> 
> Testing:
> mach5 tier1-6

Changes requested by tschatzl (Reviewer).

src/hotspot/share/gc/g1/g1YoungCollector.cpp line 1094:

> 1092:     // Flushes deferred card marks, so must precede concatenating logs.
> 1093:     retire_tlabs();
> 1094: 

I was wondering whether we should do something about all the methods here (`retire_tlabs`, `concatenate_dirty_card_logs...`, `calculate_collection_set`) that are part of the `pre_evacuate_collection_set` phase but are now located outside of that method.

Looking at the `per_thread_states` parameter passed to `pre_evacuate_collection_set`, it's only used for some verification inside that method, i.e. we could move that verification and the initialization of the PSSS :) to after `pre_evacuate_collection_set` instead.

Since the change touches that code, I think it is appropriate (and better) to consolidate right away.
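
To make the ordering constraint and the suggested consolidation concrete, here is a minimal toy sketch. It is not HotSpot code: `ToyThread`, `ToyCollector`, `global_cards` and `all_thread_queues_empty` are hypothetical stand-ins, and the card ids are placeholders. It only models the idea that the TLAB retirement step may enqueue a card, so it has to run before log concatenation, and that keeping both steps inside one method lets a debug-only check guard that order.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-ins for illustration only; these are NOT HotSpot types.
struct ToyThread {
  bool has_deferred_card_mark = false;  // models a pending deferred card mark
  std::vector<int> dirty_card_queue;    // models the thread-local dirty card buffer
};

struct ToyCollector {
  std::vector<ToyThread> threads;
  std::vector<int> global_cards;        // models the shared set of flushed cards
  std::size_t pending_cards_at_gc_start = 0;

  // Models retire_tlabs(): flushing a deferred card mark may enqueue a card
  // into the thread-local queue.
  void retire_tlabs() {
    for (ToyThread& t : threads) {
      if (t.has_deferred_card_mark) {
        t.dirty_card_queue.push_back(-1);  // placeholder card id
        t.has_deferred_card_mark = false;
      }
    }
  }

  // Models log concatenation: move every thread-local card to the shared set
  // and record the pending count, which now includes the thread buffers.
  void concatenate_logs() {
    for (ToyThread& t : threads) {
      global_cards.insert(global_cards.end(),
                          t.dirty_card_queue.begin(),
                          t.dirty_card_queue.end());
      t.dirty_card_queue.clear();
    }
    pending_cards_at_gc_start = global_cards.size();
  }

  bool all_thread_queues_empty() const {
    for (const ToyThread& t : threads) {
      if (!t.dirty_card_queue.empty()) return false;
    }
    return true;
  }

  // Consolidated phase: the order is fixed in one place and verified at the end.
  void pre_evacuate_collection_set() {
    retire_tlabs();       // must precede concatenation
    concatenate_logs();
    assert(all_thread_queues_empty());  // debug-only check (compiled out with NDEBUG)
  }
};
```

In this model, calling `concatenate_logs()` before `retire_tlabs()` leaves a card stranded in a thread queue, which is the failure the verification is meant to catch.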

I agree that combining and parallelizing the various phases of `pre_evacuate_collection_set` can be deferred to separate work; the reason we want to parallelize TLAB retiring is that there can be tens of thousands of threads, and even a little work per thread adds up (there is actually already a CR for that, [JDK-8211104](https://bugs.openjdk.org/browse/JDK-8211104)).
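
As a rough illustration of what a combined walk could look like, the toy sketch below does the deferred-mark flush and the queue drain in a single pass per thread, and splits the thread list into chunks. Everything here is hypothetical (`ToyThread`, `flush_and_drain`, `process_chunk`, `walk_all_threads`, the chunk size) and not taken from HotSpot. The chunks are processed sequentially in this sketch; since each chunk touches a disjoint range of threads, it is the kind of unit that separate workers could claim, with the partial counts summed afterwards.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for illustration only; this is NOT a HotSpot type.
struct ToyThread {
  bool has_deferred_card_mark = false;  // models a pending deferred card mark
  std::size_t queued_cards = 0;         // models the size of the thread-local queue
};

// Per-thread work of the combined walk: flush the deferred mark first (it may
// add a card), then drain the queue. Returns the number of cards drained.
std::size_t flush_and_drain(ToyThread& t) {
  if (t.has_deferred_card_mark) {
    t.queued_cards += 1;
    t.has_deferred_card_mark = false;
  }
  std::size_t drained = t.queued_cards;
  t.queued_cards = 0;
  return drained;
}

// Processes threads in [begin, end). A chunk is the unit of work that a
// worker could claim in a parallel version.
std::size_t process_chunk(std::vector<ToyThread>& threads,
                          std::size_t begin, std::size_t end) {
  std::size_t cards = 0;
  for (std::size_t i = begin; i < end; ++i) {
    cards += flush_and_drain(threads[i]);
  }
  return cards;
}

// Walks all threads exactly once, chunk by chunk, and sums the partial counts.
// Chunks run sequentially here; no actual parallelism is implemented.
std::size_t walk_all_threads(std::vector<ToyThread>& threads,
                             std::size_t chunk_size) {
  assert(chunk_size > 0);
  std::size_t total = 0;
  for (std::size_t begin = 0; begin < threads.size(); begin += chunk_size) {
    std::size_t end = std::min(begin + chunk_size, threads.size());
    total += process_chunk(threads, begin, end);
  }
  return total;
}
```

Whether splitting the walk like this pays off depends on the per-thread cost and the number of threads, which this sketch does not measure.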

Other than that it looks good to me.

-------------

PR: https://git.openjdk.org/jdk/pull/11053


More information about the hotspot-gc-dev mailing list