RFR: 8296419: [REDO] JDK-8295319: pending_cards_at_gc_start doesn't include cards in thread buffers
Thomas Schatzl
tschatzl at openjdk.org
Wed Nov 9 09:07:07 UTC 2022
On Wed, 9 Nov 2022 04:57:38 GMT, Kim Barrett <kbarrett at openjdk.org> wrote:
> Let's try this again. This is the original JDK-8295319 change, plus a couple
> of small additions to fix problems with that original change.
>
> The description of the original change was:
>
> -----
>
> Please review this change to G1 to include the per-thread buffers in the
> number of pending cards at the start of a young GC.
>
> DCQS::concatenate_logs has been renamed to concatenate_logs_and_stats, and now
> also merges the per-thread refinement stats during the thread walk to flush
> buffers. That replaces the separate thread walk to merge and record these
> stats earlier in the GC. The merged stats and related info don't seem to be
> needed until after the buffer flushing.
>
> Also, when abandoning dirty card buffers and stats because of a full GC, fixed
> to also abandon any buffers in the paused buffers lists.
>
> -----
>
> The problem was that by moving log concatenation earlier, it no longer
> followed the call to retire_tlabs(). Among other things, that function
> calls flush_deferred_card_mark_barrier() on each thread, possibly adding cards
> to the thread's dirty card queue. Doing that after the log concatenation may
> lead to unprocessed cards in thread queues, with chaos ensuing.
>
> To fix this, the call to retire_tlabs has also been moved, so that it once
> again precedes the log concatenation. Also added (debug-only) verification
> that all thread dirty card queues are empty at the end of
> pre_evacuate_collection_set.
>
> A possible followup is to refactor those two operations, which each do a
> (single-threaded) walk of all threads. We could combine that into a single
> walk. We could also parallelize it if that seems warranted, though the
> per-thread work is usually pretty small, so might not be worth parallelizing.
>
> There might be an opportunity to do some similar refactoring for fullgc and
> log abandonment, though there the two operations are done far away from each
> other.
>
> Testing:
> mach5 tier1-6
Changes requested by tschatzl (Reviewer).
src/hotspot/share/gc/g1/g1YoungCollector.cpp line 1094:
> 1092: // Flushes deferred card marks, so must precede concatenting logs.
> 1093: retire_tlabs();
> 1094:
I was wondering whether we should do something about all the methods (`retire_tlabs`, `concatenate_dirty_card_logs...`, `calculate_collection_set`) methods here that are part of the `pre_evacuate_collection_set` phase but are now located outside of that method.
Looking at the `per_thread_states` parameter passed to `pre_evacuate_collection_set`, it's only used for some verification inside `pre_evacuate_collection_set`, i.e. we could move that verification and initialization of the PSSS :) after`pre_evacuate_collection_set` instead.
Since the change touches that code, I think it is appropriate (and better) to consolidate right away.
I agree about that deferring combining and parallelizing the various phases of `pre_evacuate_collection_set` can be done extra; the reason why we want parallelization of the TLAB retiring is that there can be tens of thousands threads, and even little work per thread adds up (there is a CR for that already actually, [JDK-8211104](https://bugs.openjdk.org/browse/JDK-8211104)).
Other than that it looks good to me.
-------------
PR: https://git.openjdk.org/jdk/pull/11053
More information about the hotspot-gc-dev
mailing list