RFR: 8300915: G1: incomplete SATB because nmethod entry barriers don't get armed [v4]
Richard Reingruber
rrich at openjdk.org
Thu Jan 26 11:23:30 UTC 2023
On Thu, 26 Jan 2023 10:23:48 GMT, Erik Ă–sterlund <eosterlund at openjdk.org> wrote:
> Hmm. Some thoughts. The way I look at it, we have essentially 4 different
> endings to CM when we poke it to "start". 1) successful finish, 2) undone end
> (no CM needed in the end), 3) aborted by STW full GC, 4) aborted by new CM
> request. If I understand this patch correctly, it tries to recognize case 2 as
> essentiallly being like 1, in the sense that we have traversed the full heap
> and we are done. Is that correct?
Well it can be seen like this.
I wasn't aware of 4). Does it really exist? The assertion in `CodeCache::on_gc_marking_cycle_start()` should have fired here:
CodeCache::on_gc_marking_cycle_start() : void
G1CollectedHeap::start_codecache_marking_cycle() : void
G1ConcurrentMark::pre_concurrent_start(enum GCCause::Cause) : void
G1YoungCollector::pre_evacuate_collection_set(G1EvacInfo *) : void
G1YoungCollector::collect() : void
I havn't seen it so far though.
Maybe you mean the restart of marking in `G1ConcurrentMarkThread::phase_mark_loop()`?
This is not problematic I think.
> In that case, I think there is still the same issue for case 4. If a CM is
> aborted because we are interrupting an old CM in order to start a new CM, then
> we still have a case where we have an old marking cycle that has finished, but
> not successfully. We can't call
> G1CollectedHeap::finish_codecache_marking_cycle() until we have successfully
> marked through all objects in the heap. If we were to call that when an old CM
> is aborted, then is_unloading() might start returning true on nmethods that
> were on-stack in some loom continuation object in the old generation. This was
> the main point of only conditionally starting cycles - then the epoch counters
> being marked in the nmethods would be conservative and potentially keep
> nmethods around a bit longer that really could be discarded.
That's a very good point I didn't see and didn't get it from the comment I removed.
On the other hand: do we ever call is_unloading() on nmethods without a complete marking cycle?
> Naturally, what I didn't consider at that time, was that not arming all
> nmethods at the latest initiating marking operation was an SATB violation,
> when we rely on nmethod entry barriers enforcing SATB constraints. At the time
> when that code was written, we thought that G1 didn't need that, but then it
> turned out that it did. Having said all that, I think what I'm getting at is
> that this patch fixes the SATB problem, but introduces a new premature nmethod
> unloading condition for the loom code instead. It's like we want to have the
> cake and eat it too. What would of course be super nice is if we didn't abort
> CMs to start new CMs. I'm not entirely sure why that's a thing in G1. Anyway,
> if we want to keep on doing that, then I think what we need to do is to
> explicitly recognize when case 4 happens, and then walk the code cache, mark
> all nmethods as maybe_on_stack (because the truth is that we don't really know
> if the GC never got to finish successfully), and then call
> G1CollectedHeap::finish_codecache_marking_cycle(). Then we can close the gap
> and solve both the SATB issue and the loom nmethod unloading problems at the
> same time. Hope this makes sense, and feel free to ask if something isn't
> clear in my description. And thanks for looking into this.
I understand that cases 2) and 4) are might be problematic because of loom (nmethod
frames on the java heap). Adhoc though: can't we just undo the code cache cycle
be decrementing `CodeCache::_gc_epoch`
But maybe it's not really an issue because we don't do unloading without a complete cycle...?
-------------
PR: https://git.openjdk.org/jdk/pull/12194
More information about the hotspot-gc-dev
mailing list