RFR: Allow young collection to suspend marking in old generation

earthling-amzn github.com+71722661+earthling-amzn at openjdk.java.net
Mon Feb 22 16:59:05 UTC 2021


On Mon, 22 Feb 2021 10:22:49 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

>> **This is a work in progress.**
>> 
>> ## Summary of changes
>> The goal of these changes is to allow young cycles to suspend marking in the old generation. When the young cycle is complete, the concurrent old marking will resume.
>> * Evaluation of heuristics was pulled into a new `ShenandoahRegulatorThread`. Taking this evaluation out of line from the control thread allows the regulator to suspend (using the cancellation mechanism) old generation marking and start a young generation cycle.
>> * Task queues for marking have been moved from `ShenandoahMarkingContext` into `ShenandoahGeneration`. This allows the marking state for the old generation to persist across young generation cycles. The associated `is_complete` state has also been moved to `ShenandoahGeneration`.
>> * Old generation marking is bootstrapped by a complete young generation cycle. In this scenario, the mark closures for a young cycle are given reference to the young generation mark queues _and_ the old generation mark queues. Rather than ignore old references as is done for a normal young cycle, they are enqueued in the old generation mark queues. When the young cycle completes, the old generation marking continues using the task queues primed by the preceding young cycle.
>> * There is a new flag: `ShenandoahAllowOldMarkingPreemption` (defaults to true). Disabling this option will cause the regulator to schedule `young` or `global` collections (according to heuristics), but will _not_ schedule `old` collections.
>> * The `global` generation is used to support satb and incremental-update modes. The `global` generation is also used for degenerated and implicit/explicit gc requests. Degenerated cycles are not working on this branch and the root cause is understood.
>> * The `global` generation is also used for `FullGC`, but this is also broken. The `FullGC` doesn't update the remembered set during compaction. We reckon there is a non-trivial amount of work to fix this.
>> * The `MARKING` gc state has been split into `YOUNG_MARKING` and `OLD_MARKING`.
>> * Immediate garbage collection is broken in generational mode. The update references phase is used to repair the remembered set, so when this phase is skipped the remembered set scan runs into trouble. A fix for this is in progress.
>> * The remembered set is scanned on a safepoint during initial mark. Work to make this concurrent is in progress.
>
> First priority for me is that hotspot_gc_shenandoah should be passing. I posted some fixes here: https://github.com/earthling-amzn/shenandoah/pull/1 - integrate them and they should show up in this PR.
> 
> There is one remaining test failing: make run-test TEST=gc/shenandoah/TestObjItrWithHeapDump.java
> It looks like the new GC regulator thread is not scheduling a full GC on System.gc(). This probably needs some work.
> 
> In general, I am very skeptical about introducing *another* thread to schedule GC stuff. The coordination between Java threads and the control thread is already quite complicated (we had a bunch of deadlock bugs in the past), and introducing another one here seems to make things even worse.
> 
> I haven't looked into the details yet, but suspending old marking for a young cycle also seems quite complicated in terms of coordination. This would require a sort of safepointing mechanism for GC threads, or maybe it is possible to use the SuspendibleThreadSet stuff for that. Can you explain which approach you have taken there?

I'll pull the fixes into this PR shortly, but first I want to address your comments on adding a thread and coordinating young and old cycles.

The control thread is still in charge of scheduling. It still handles `System.gc` and allocation failures. The only thing in the new regulator thread is the evaluation of heuristics. If the heuristics want to start a cycle, they make a request to the control thread. In a way, this makes the heuristics a peer of the other threads that may start a gc. The motivation here is to allow the regulator to request a young gc while the control thread is busy with an old gc.
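
To make the division of labour concrete, here is a minimal toy model in Python (not the HotSpot code). The names `ControlThread`, `Regulator`, `request_gc`, and the `young_threshold` trigger are hypothetical and exist only for illustration. The point is that the regulator only evaluates a heuristic and files a request, exactly like any other requester, while the control thread remains the only component that runs cycles.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Tuple


class Cause(Enum):
    HEURISTICS = "heuristics"                  # raised by the regulator
    SYSTEM_GC = "System.gc"                    # explicit request
    ALLOCATION_FAILURE = "allocation failure"  # raised by an allocating thread


@dataclass
class ControlThread:
    """Toy stand-in for the control thread: the only component that runs cycles."""
    pending: List[Tuple[str, Cause]] = field(default_factory=list)
    log: List[str] = field(default_factory=list)

    def request_gc(self, generation: str, cause: Cause) -> None:
        # Every requester, including the regulator, uses this single entry point.
        self.pending.append((generation, cause))

    def run_pending(self) -> None:
        # Run the requested cycles in arrival order and record what was run.
        while self.pending:
            generation, cause = self.pending.pop(0)
            self.log.append(f"{generation} cycle ({cause.value})")


@dataclass
class Regulator:
    """Toy stand-in for the regulator: evaluates heuristics, never runs a cycle."""
    control: ControlThread
    young_threshold: float = 0.8  # hypothetical trigger, not a real Shenandoah heuristic

    def evaluate(self, young_occupancy: float) -> bool:
        # Ask the control thread for a young cycle once occupancy crosses the threshold.
        if young_occupancy >= self.young_threshold:
            self.control.request_gc("young", Cause.HEURISTICS)
            return True
        return False
```

In this model the regulator holds no collector state of its own, which is the property I am relying on to keep the coordination simple.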

The coordination between young and old gc cycles is very coarse, and I think the existing cancellation mechanism is largely sufficient. Old generation gc is not allowed to cancel a young gc. An old generation gc begins by first running a young gc cycle with the mark closures configured to enqueue (but not follow) old objects into the old generation mark queues. This bootstrap young gc cycle runs to completion before any of the old gen task queues are serviced.
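
As a rough illustration of "enqueue but not follow", here is a toy Python sketch. `Obj` and `mark_young` are hypothetical names, and the real mark closures, task queues, and remembered set are far more involved. Passing no `old_queue` models a normal young cycle, which ignores old references. Passing one models the bootstrap cycle, which records old objects for later without tracing through them.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Optional, Set


@dataclass
class Obj:
    name: str
    is_old: bool
    refs: List["Obj"] = field(default_factory=list)


def mark_young(roots: List[Obj], old_queue: Optional[Deque[Obj]] = None) -> Set[str]:
    """Mark young objects reachable from roots.

    old_queue is None  -> normal young cycle: old references are ignored.
    old_queue is given -> bootstrap cycle: old references are enqueued, not followed.
    """
    young_queue: Deque[Obj] = deque(roots)
    marked: Set[str] = set()
    while young_queue:
        obj = young_queue.popleft()
        if obj.is_old:
            if old_queue is not None:
                old_queue.append(obj)  # prime the old queue; do not trace obj.refs
            continue
        if obj.name in marked:
            continue
        marked.add(obj.name)
        young_queue.extend(obj.refs)
    return marked
```

The old queue is only filled here. Nothing drains it until the young cycle has finished, which matches the ordering described above.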

Once the young cycle has completed, the old generation runs the rest of the marking phase from the old gen task queues primed during the bootstrap young cycle. At this point in development, the old generation does not create a collection set (this is planned). If the regulator thread discovers that the young generation heuristic wants to start a gc cycle, it will schedule it with the control thread. The code path here is similar to that of an allocation failure. The regulator thread will request _cancellation_ of the concurrent old generation mark, except that it will _not_ clear the old generation task queues and it will _not_ deactivate the SATB barrier. Once the old mark is cancelled, the control thread runs a normal (i.e., non-bootstrapping) young cycle to completion and then resumes the concurrent old marking.
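
The difference between the two kinds of cancellation can be sketched with another toy Python model. `OldMark`, its `budget` parameter, and the `preserve_state` flag are hypothetical. The budget merely stands in for the point at which concurrent marking notices the cancellation request.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List, Set


@dataclass
class OldMark:
    """Toy model of old generation marking state that survives a young cycle."""
    graph: Dict[str, List[str]]  # old object -> old objects it references
    queue: Deque[str] = field(default_factory=deque)
    marked: Set[str] = field(default_factory=set)
    satb_active: bool = False

    def start(self, primed: List[str]) -> None:
        # Begin from the entries primed by the bootstrap young cycle.
        self.queue.extend(primed)
        self.satb_active = True

    def mark(self, budget: int) -> bool:
        """Process up to `budget` queue entries; return True once the queue is drained."""
        while self.queue and budget > 0:
            name = self.queue.popleft()
            budget -= 1
            if name in self.marked:
                continue
            self.marked.add(name)
            self.queue.extend(self.graph.get(name, []))
        done = not self.queue
        if done:
            self.satb_active = False
        return done

    def cancel(self, preserve_state: bool) -> None:
        # Suspension for a young cycle keeps the queue and the SATB barrier;
        # this model's ordinary cancellation drops both.
        if not preserve_state:
            self.queue.clear()
            self.satb_active = False
```

With `preserve_state=True`, a later call to `mark` simply continues from the queue it left behind, which is all that "resume" means in this model.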

-------------

PR: https://git.openjdk.java.net/shenandoah/pull/19


More information about the shenandoah-dev mailing list