RFR: 8351091: Shenandoah: global marking context completeness is not accurately maintained [v4]

Thu Mar 6 22:57:07 UTC 2025

On Thu, 6 Mar 2025 18:34:43 GMT, Xiaolong Peng <xpeng at openjdk.org> wrote:

>> With the JEP 404: Generational Shenandoah implementation, there are generation specific marking completeness flags introduced, and the global marking context completeness flag is not updated at all after initialization, hence the global marking context completeness is not accurate anymore. This may cause expected behavior: [ShenandoahHeap::complete_marking_context()](https://github.com/openjdk/jdk/pull/23886/files#diff-d5ddf298c36b1c91bf33f9bff7bedcc063074edd68c298817f1fdf39d2ed970fL642) should throw assert error if the global marking context completeness flag is false, but now it always return the marking context even it marking is not complete, this may hide bugs where we expect the global/generational marking to be completed.
>> 
>> This change PR fix the bug in global marking context completeness flag, and update all the places using `ShenandoahHeap::complete_marking_context()` to use proper API.
>> 
>> ### Test
>> - [x] hotspot_gc_shenandoah
>> - [x] Tier 1
>> - [x]  Tier 2
>
> Xiaolong Peng has updated the pull request incrementally with three additional commits since the last revision:
> 
>  - Remove ShenandoahHeap::complete_marking_context(ShenandoahHeapRegion* region)
>  - Revert "complete_marking_context should guarantee mark is complete"
>    
>    This reverts commit 2004973965ea0e617cf9e5fc45be24f0e06e90a1.
>  - complete_marking_context should guarantee mark is complete

A few more comments, mostly pertaining to global gen's "complete" marking context semantics and usage, as well as `SH::[*_]marking_context` delegating to its `active_generation()`'s method.

This should be my last round of comments. Thank you for your patience...

src/hotspot/share/gc/shenandoah/heuristics/shenandoahHeuristics.cpp line 123:

> 121: #ifdef ASSERT
> 122:       bool reg_live = region->has_live();
> 123:       bool bm_live = heap->active_generation()->complete_marking_context()->is_marked(cast_to_oop(region->bottom()));

Apropos of another comment, if we really want to keep a delegating method in `ShenandoahHeap`, why not use `heap->complete_marking_context()` as a synonym for `heap->active_generation()->complete_marking_context()` ?

src/hotspot/share/gc/shenandoah/shenandoahFullGC.cpp line 1:

> 1: /*

This all looks good.
One thing to think about in general about assertions in closures is whether instead of making use of knowledge of the context in which these closures are used, whether it may produce more mantianable code to embed the "active_generation" to which the closure is being applied in the closure itself and have the assertions (or other uses of context) use that instead. Nothing to be done now, but something to think about in making more maintainable code.

src/hotspot/share/gc/shenandoah/shenandoahGenerationalEvacuationTask.cpp line 172:

> 170: // contained herein.
> 171: void ShenandoahGenerationalEvacuationTask::promote_in_place(ShenandoahHeapRegion* region) {
> 172:   ShenandoahMarkingContext* const marking_context = _heap->young_generation()->complete_marking_context();

For clarity, you might assert the following before line 172:

    assert(gc_generation() == _heap->young_generation(), "Sanity check");

Even though it might seem somewhat tautological.

src/hotspot/share/gc/shenandoah/shenandoahVerifier.cpp line 1:

> 1: /*

See previous suggestion/comments on `SH::[complete_]marking_context()` as delegating to that method of its `gc_generation()`.

What you have here sounds fine too, but a uniform usage of either keeping `SH::[complete_]marking_context()` or not at all makes more sense to me, and seems cleaner to me.

src/hotspot/share/gc/shenandoah/shenandoahVerifier.cpp line 1283:

> 1281:   if (_heap->gc_generation()->is_global()) {
> 1282:     return _heap->marking_context();
> 1283:   }

Not sure I understand the point of this change in behavior. What purpose does a partial marking context serve? Why not just leave the behavior as was before and return a non-null marking context only when marking is complete and null otherwise. When the client uses the context, it does so to skip over unmarked objects (which are dead if marking is complete), which might end up being too weak if we are still in the midst of marking.

I realize that you may not be maintaining a global mark completion so you are returning the marking context irrespective of the state of completion of the marking, but I wonder if that is really the bahavior you want. I would rather, as necessary, we maintain a flag for completion of global marking for the case where we are doing a global gc?

-------------

PR Review: https://git.openjdk.org/jdk/pull/23886#pullrequestreview-2665674435
PR Review Comment: https://git.openjdk.org/jdk/pull/23886#discussion_r1984134275
PR Review Comment: https://git.openjdk.org/jdk/pull/23886#discussion_r1984075136
PR Review Comment: https://git.openjdk.org/jdk/pull/23886#discussion_r1984115881
PR Review Comment: https://git.openjdk.org/jdk/pull/23886#discussion_r1984140131
PR Review Comment: https://git.openjdk.org/jdk/pull/23886#discussion_r1984108547