RFR: 8306334: Handle preemption of old cycle between filling and bootstrap phases [v2]
Y. Srinivas Ramakrishna
ysr at openjdk.org
Thu Apr 20 21:44:21 UTC 2023
On Thu, 20 Apr 2023 16:38:41 GMT, William Kemper <wkemper at openjdk.org> wrote:
>> In the case when a request to run a young cycle arrives _after_ making the old generation parseable but _before_ disallowing preemption, the preemption request would cause the young bootstrap cycle to be preempted (not cancelled). This is not expected and would result in the old marking phase beginning with stale oops in the young mark queues after the mark bitmaps were cleared. This ultimately triggers assertions that such objects should be marked.
>>
>> With this change if we detect a cancellation between filling and bootstrapping we make a distinction between:
>> * A cancellation due to allocation failure. In this case the old cycle is interrupted to run a degenerated cycle.
>> * A cancellation due to a young cycle request. In this case, the old cycle is allowed to continue with the boostrap cycle, which is itself a concurrent young cycle.
>
> William Kemper has updated the pull request incrementally with one additional commit since the last revision:
>
> Re-enable TestArrayCopyStress
I can't say I fully understand the interactions of states and transitions to do justice to an actual review, but the changes look good to me. Thanks for the documentation comment noting the reason for the check and short-circuit without entering the bootstrap cycle, which may be resumed later.
I left a few comments that might be useful, but they are more in the nature of questions from this newbie reader of this code, rather than any suggestions.
src/hotspot/share/gc/shenandoah/shenandoahControlThread.cpp line 536:
> 534: }
> 535: case ShenandoahOldGeneration::BOOTSTRAPPING: {
> 536: // It is possible for a young generation request to preempt this nascent old
This comment is helpful.
There is an ASCII art of an intended state transition diagram for the old generation states in `shenandoahOldGeneration.cpp`. It would be a good idea to update that diagram with possible premptible states and also the state transitions that might occur as a result of cancellations that might occur outside of the pre-emptible states. (I assume that cancellations can only interrupt and reset us out of pre-emptible states, not from any other, i.e. non-pre-emptible state.)
Similarly, at lines 444-458 above is another, presumably for Controller thread states. Does that figure get new transitions as a result of this change?
I see this documentation comment further below at lines 607-610:
// We can only tolerate being cancelled during concurrent marking or during preparation for mixed
// evacuation. This flag here (passed by reference) is used to control precisely where the regulator
// is allowed to cancel a GC.
ShenandoahOldGC gc(generation, _allow_old_preemption);
The comment says "cancel", but uses a flag `_allow_old_preemption`. When I first saw this I was a bit confused, until I realized that cancellations will be see only in the pre-emptible states, and ignored in all other states until we have completed the work for that state and transition out of it, or if we are otherwise in a pre-emptible state.
Your description below hints at these interactions, I think, but it would be great if this is documented clearly somewhere (may be it is, somewhere in either old generation or in the control thread). It would also be nice to explain, in case this makes sense, if there any possible correspondences between states of the control thread and those of the old generation state.
src/hotspot/share/gc/shenandoah/shenandoahControlThread.cpp line 541:
> 539: // have been preempted before we start a bootstrap cycle.
> 540: if (check_cancellation_or_degen(ShenandoahGC::_degenerated_outside_cycle)) {
> 541: if (heap->cancelled_gc()) {
In most of the other states where cancellation is checked, we only check `heap->cancelled_gc()`, but here we first `check_cancellation_or_degen()` before checking `cancelled_gc()`.
Now that method is below, where I have annotated some comments in between lines of code (github doesn't allow me to comment on lines not touched in a PR).
bool ShenandoahControlThread::check_cancellation_or_degen(ShenandoahGC::ShenandoahDegenPoint point) {
ShenandoahHeap* heap = ShenandoahHeap::heap();
if (!heap->cancelled_gc()) {
return false;
}
In other words, if `check_cancellation_..` answers true, it must be be that `cancelled_gc()` is true already so the test at line 41 is redundant.
However, there are more circumstances in which `check_cancellation_..` will answer true below.
if (in_graceful_shutdown()) {
return true;
}
assert(_degen_point == ShenandoahGC::_degenerated_outside_cycle,
"Should not be set yet: %s", ShenandoahGC::degen_point_to_string(_degen_point));
if (is_alloc_failure_gc()) {
_degen_point = point;
return true;
}
The assignment of `_degen_point = point` seems to be equivalent to assigning `_degen_point = ShenandoahGC::_degenerated_outside_cycle`, because of the assertion further above. So the comment in the assertion and the assignment further below look a bit fishy to me; as if code changed and these code paths may nt have been updated? Anyway, it seemed a bit confusing; such an assignment also happens further down in this method (code below).
if (_preemption_requested.is_set()) {
assert(_requested_generation == YOUNG, "Only young GCs may preempt old.");
_preemption_requested.unset();
// Old generation marking is only cancellable during concurrent marking.
// Once final mark is complete, the code does not check again for cancellation.
// If old generation was cancelled for an allocation failure, we wouldn't
// make it to this case. The calling code is responsible for forcing a
// cancellation due to allocation failure into a degenerated cycle.
_degen_point = point;
heap->clear_cancelled_gc(false /* clear oom handler */);
return true;
}
I see that above we clear the cancellation, so I can see the case where `check_cancellation_` would answer true, but a subsequent `cancelled_gc` would not.
fatal("Cancel GC either for alloc failure GC, or gracefully exiting, or to pause old generation marking");
return false;
}
Anyway, I thought I'd point this out, although I don't understand all of the various interactions here well enough to do a real review.
-------------
PR Review: https://git.openjdk.org/shenandoah/pull/262#pullrequestreview-1394696958
PR Review Comment: https://git.openjdk.org/shenandoah/pull/262#discussion_r1173048288
PR Review Comment: https://git.openjdk.org/shenandoah/pull/262#discussion_r1173116455
More information about the shenandoah-dev
mailing list