RFR: 8296419: [REDO] JDK-8295319: pending_cards_at_gc_start doesn't include cards in thread buffers
Thomas Schatzl
tschatzl at openjdk.org
Tue Nov 15 14:24:16 UTC 2022
On Wed, 9 Nov 2022 08:59:55 GMT, Thomas Schatzl <tschatzl at openjdk.org> wrote:
>> Let's try this again. This is the original JDK-8295319 change, plus a couple
>> of small additions to fix problems with that original change.
>>
>> The description of the original change was:
>>
>> -----
>>
>> Please review this change to G1 to include the per-thread buffers in the
>> number of pending cards at the start of a young GC.
>>
>> DCQS::concatenate_logs has been renamed to concatenate_logs_and_stats, and now
>> also merges the per-thread refinement stats during the thread walk to flush
>> buffers. That replaces the separate thread walk to merge and record these
>> stats earlier in the GC. The merged stats and related info don't seem to be
>> needed until after the buffer flushing.
>>
>> Also, when abandoning dirty card buffers and stats because of a full GC, fixed
>> to also abandon any buffers in the paused buffers lists.
>>
>> -----
>>
>> The problem was that by moving log concatenation earlier, it no longer
>> followed the call to retire_tlabs(). Among other things, that function
>> calls flush_deferred_card_mark_barrier() on each thread, possibly adding cards
>> to the thread's dirty card queue. Doing that after the log concatenation may
>> lead to unprocessed cards in thread queues, with chaos ensuing.
>>
>> To fix this, the call to retire_tlabs has also been moved, so that it once
>> again precedes the log concatenation. Also added (debug-only) verification
>> that all thread dirty card queues are empty at the end of
>> pre_evacuate_collection_set.
>>
>> A possible followup is to refactor those two operations, which each do a
>> (single-threaded) walk of all threads. We could combine that into a single
>> walk. We could also parallelize it if that seems warranted, though the
>> per-thread work is usually pretty small, so might not be worth parallelizing.
>>
>> There might be an opportunity to do some similar refactoring for fullgc and
>> log abandonment, though there the two operations are done far away from each
>> other.
>>
>> Testing:
>> mach5 tier1-6
>
> src/hotspot/share/gc/g1/g1YoungCollector.cpp line 1094:
>
>> 1092: // Flushes deferred card marks, so must precede concatenting logs.
>> 1093: retire_tlabs();
>> 1094:
>
> I was wondering whether we should do something about all the methods (`retire_tlabs`, `concatenate_dirty_card_logs...`, `calculate_collection_set`) methods here that are part of the `pre_evacuate_collection_set` phase but are now located outside of that method.
>
> Looking at the `per_thread_states` parameter passed to `pre_evacuate_collection_set`, it's only used for some verification inside `pre_evacuate_collection_set`, i.e. we could move that verification and initialization of the PSSS :) after`pre_evacuate_collection_set` instead.
>
> Since the change touches that code, I think it is appropriate (and better) to consolidate right away.
>
> I agree about that deferring combining and parallelizing the various phases of `pre_evacuate_collection_set` can be done extra; the reason why we want parallelization of the TLAB retiring is that there can be tens of thousands threads, and even little work per thread adds up (there is a CR for that already actually, [JDK-8211104](https://bugs.openjdk.org/browse/JDK-8211104)).
>
> Other than that it looks good to me.
I made a prototype for this suggestion: https://github.com/openjdk/jdk/commit/587660c8ee198447ea82ddd61dc9d41f2c559daa that seems to at least compile. No testing. One may also move `rdcqs` and `preserved_marks_set` completely into the `G1ParScanThreadStateSet` as neither is used elsewhere.
-------------
PR: https://git.openjdk.org/jdk/pull/11053
More information about the hotspot-gc-dev
mailing list